GGUF is a binary file format used by llama.cpp to store models for inference. It enables efficient CPU inference and supports quantized and mixed-precision weights (different tensors in one file can use different quantization types).
Benefits:
- Runs on CPU (no GPU needed)
- Multiple quantization options (Q4_K_M, Q5_K_M, Q8_0)
- Works on laptops and edge devices
Use GGUF when deploying to environments without a powerful GPU, or where memory is the main constraint.
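A quick way to confirm a file really is GGUF is to inspect its fixed-size header. The sketch below assumes the documented GGUF header layout (little-endian: 4-byte magic "GGUF", uint32 version, uint64 tensor count, uint64 metadata key/value count); the `read_gguf_header` helper is illustrative, not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # every GGUF file begins with these four bytes

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header from the first 24 bytes of a file.

    Layout (little-endian): 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count.
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={data[:4]!r})")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }

# Synthetic header for demonstration: version 3, 291 tensors, 24 metadata entries.
fake_header = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(fake_header))
```

In practice you would pass the first 24 bytes of a real `.gguf` file (e.g. `open(path, "rb").read(24)`); the version field tells you which revision of the format the file uses.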