Quick estimates for different methods:
- Full fine-tune FP16: ~16 bytes per parameter (2 for weights, 2 for gradients, ~12 for Adam optimizer state and FP32 master weights)
- LoRA FP16: ~2 bytes per parameter (frozen base weights) + adapter overhead
- QLoRA 4-bit: ~0.5 bytes per parameter (frozen base weights) + adapter overhead
For a 7B model:
- Full FP16: ~112GB
- LoRA: ~14-16GB
- QLoRA: ~4-6GB
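These totals are just parameter count times bytes per parameter, plus a small allowance for the adapters. Here is a minimal sketch of that arithmetic; the byte-per-parameter constants match the list above, but the flat 1GB adapter allowance and the `estimate_gb` helper are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope GPU memory estimator for the numbers above.
BYTES_PER_PARAM = {
    "full_fp16": 16.0,   # 2 weights + 2 grads + ~12 Adam state / FP32 master
    "lora_fp16": 2.0,    # frozen FP16 base weights only
    "qlora_4bit": 0.5,   # frozen 4-bit base weights only
}

def estimate_gb(n_params: float, method: str, adapter_gb: float = 1.0) -> float:
    """Rough memory estimate in GB (weights/optimizer only, no activations).

    adapter_gb is a crude allowance for LoRA adapter weights, gradients,
    and optimizer state; real overhead depends on rank and target modules.
    """
    base = n_params * BYTES_PER_PARAM[method] / 1e9
    return base if method == "full_fp16" else base + adapter_gb

if __name__ == "__main__":
    n = 7e9  # 7B parameters
    for method in BYTES_PER_PARAM:
        print(f"{method}: ~{estimate_gb(n, method):.0f} GB")
    # Prints roughly 112 / 15 / 4 GB, matching the list above.
```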
Activations add more on top of these, scaling with batch size and sequence length; gradient checkpointing can trade extra compute for a large reduction.
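For a rough feel of the activation term, one crude heuristic scales memory with batch × sequence length × hidden size × layers. The `multiplier` below is an assumed fudge factor folding together attention and MLP intermediates, not a measured constant; real values vary by architecture and implementation:

```python
def activation_gb(batch: int, seq_len: int, hidden: int, layers: int,
                  bytes_per_elem: int = 2, multiplier: int = 16) -> float:
    """Very rough activation-memory estimate in GB (assumed multiplier).

    Gradient checkpointing can cut this by roughly the layer count,
    at the cost of recomputing activations in the backward pass.
    """
    return batch * seq_len * hidden * layers * multiplier * bytes_per_elem / 1e9

# Example: a 7B-class model (hidden=4096, layers=32) at batch 4, seq 2048:
# 4 * 2048 * 4096 * 32 * 16 * 2 / 1e9 ≈ 34 GB without checkpointing.
```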