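GPTQ is a popular post-training quantization method for inference, most often used to compress model weights to 4-bit. It uses a small set of calibration data to pick quantized weights that minimize each layer's output error, instead of simply rounding every weight to the nearest representable value.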
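The calibration-driven idea can be sketched in NumPy. This is a simplified, hypothetical implementation that assumes a single linear layer, a per-row min-max 4-bit grid, no grouping or act-order, and a full matrix inverse; production GPTQ code processes columns in blocks on the GPU. The names `gptq_quantize`, `minmax_params`, and `quantize` are illustrative, not a library API, and the synthetic calibration inputs (independent noise plus a shared factor, so features are correlated) are an assumption standing in for real activations.

```python
import numpy as np


def minmax_params(W, bits=4):
    """Per-row asymmetric min-max grid (assumption: no group-wise scales)."""
    maxq = 2**bits - 1
    wmin = np.minimum(W.min(axis=1), 0.0)
    wmax = np.maximum(W.max(axis=1), 0.0)
    scale = np.maximum(wmax - wmin, 1e-8) / maxq
    zero = np.round(-wmin / scale)
    return scale, zero, maxq


def quantize(w, scale, zero, maxq):
    """Round to the nearest grid point, clip to [0, maxq], then dequantize."""
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)


def gptq_quantize(W, X, bits=4, damp=0.01):
    """Quantize W (rows x cols) one column at a time, using calibration
    inputs X (samples x cols) to push each column's rounding error onto
    the columns that have not been quantized yet."""
    W = W.astype(np.float64).copy()
    scale, zero, maxq = minmax_params(W, bits)
    # Hessian of ||X W^T - X Q^T||^2 per row, up to a constant factor of 2.
    H = X.T @ X
    # Dampening keeps the inverse numerically stable.
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])
    Hinv = np.linalg.inv(H)
    U = np.linalg.cholesky(Hinv).T  # upper factor: Hinv = U^T U
    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        w = W[:, i]
        q = quantize(w, scale, zero, maxq)
        Q[:, i] = q
        err = (w - q) / U[i, i]
        # Compensate: adjust the remaining columns for this column's error.
        W[:, i:] -= np.outer(err, U[i, i:])
    return Q


rng = np.random.default_rng(0)
rows, cols, n = 32, 64, 512
W = rng.standard_normal((rows, cols))
shared = rng.standard_normal((n, 1))
X = rng.standard_normal((n, cols)) + 2.0 * shared  # correlated features

Q_gptq = gptq_quantize(W, X)
scale, zero, maxq = minmax_params(W)
Q_rtn = quantize(W, scale[:, None], zero[:, None], maxq)  # plain rounding


def output_error(Q):
    """Squared error of the layer's outputs on the calibration inputs."""
    return np.linalg.norm(X @ W.T - X @ Q.T) ** 2


rtn_err = output_error(Q_rtn)
gptq_err = output_error(Q_gptq)
print(f"round-to-nearest output error: {rtn_err:.1f}")
print(f"GPTQ-style output error:       {gptq_err:.1f}")
```

Both results use the same 16-level grid per row; the difference is that the GPTQ-style loop uses the calibration statistics in `H` to let later columns absorb earlier rounding errors, which helps most when input features are correlated.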
Advantages:
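- Works with most transformer models, since it needs only calibration data, not retraining
- Well supported for GPU inference
- Retains good output quality at 4-bit, clearly better than naive round-to-nearest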
Use GPTQ when deploying on GPU and you need 4-bit weights to fit the model in GPU memory.
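The memory argument is simple arithmetic: weights take roughly parameters × bits / 8 bytes. A minimal sketch, using a hypothetical 7-billion-parameter model; the helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores per-group scales and zero points, layers left in 16-bit,
    activations, and the KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```

At 16-bit the weights alone need about 14 GB, while 4-bit brings them down to about 3.5 GB before those overheads, which is often the difference between fitting on a single consumer GPU or not.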