PowerInfer is a new inference engine designed to run large language models quickly on a single consumer-grade GPU, generating tokens up to 11.69x faster than llama.cpp. Read the paper on Hugging Face.

via Hugging Face.