For a B model, here's where LoRA saves memory:
- Frozen parameters: Loaded once, no gradients or optimizer states
- Trainable adapters: ~-M parameters, each carrying full training overhead (gradients plus optimizer states)
- Full fine-tune: GB+ VRAM
- LoRA: -GB VRAM
The savings come primarily from not storing gradients or optimizer states for the frozen parameters; with Adam, each trainable parameter needs two extra fp32 states (momentum and variance), so shrinking the trainable set shrinks optimizer memory proportionally.
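To make the bookkeeping concrete, here is a minimal sketch of the idea in PyTorch. The `LoRALinear` class and its dimensions are illustrative assumptions, not a specific library's API: the base weight is frozen (so it accrues no gradients and gets no optimizer state), while only the two small low-rank factors are handed to the optimizer.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank adapter (illustrative sketch)."""
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Freeze the base weights: no gradients, and (below) no optimizer states.
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # Low-rank adapter factors A (rank x in) and B (out x rank); B starts at
        # zero so the adapter is a no-op at initialization.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(1024, 1024, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in layer.parameters() if not p.requires_grad)

# Only the adapter parameters are registered with the optimizer, so Adam's
# per-parameter states scale with the adapter size, not the base layer.
opt = torch.optim.Adam((p for p in layer.parameters() if p.requires_grad), lr=1e-3)
print(f"trainable: {trainable}, frozen: {frozen}")
```

For this single 1024x1024 layer the adapter adds 2 x 8 x 1024 = 16,384 trainable parameters against ~1M frozen ones; across a full model the same ratio is what turns full-fine-tune optimizer memory into the much smaller LoRA footprint.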