During LoRA training, the original model weights are frozen. Gradients only flow through the adapter matrices.
This means:
- Much less memory for optimizer states (only adapters need them)
- Faster training (fewer parameters to update)
- Original model preserved (no catastrophic forgetting of frozen weights)
You're typically training well under 1% of the total parameters.
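A minimal NumPy sketch of the idea: the pretrained weight `W` stays fixed, only the low-rank factors `A` and `B` would receive gradient updates, and the trainable fraction is tiny. The sizes `d = 4096` and `r = 8` are illustrative assumptions, as is the `alpha` scaling convention.

```python
import numpy as np

d, r = 4096, 8  # hypothetical hidden size and LoRA rank

# Frozen pretrained weight: no gradients, so no optimizer state needed.
W = np.random.randn(d, d)

# Trainable low-rank adapters. B starts at zero so the adapted layer
# initially matches the pretrained one exactly (no forgetting at step 0).
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))

def adapted_forward(x, alpha=16):
    # Effective weight is W + scale * (B @ A); only B and A are trained.
    return x @ (W + (alpha / r) * (B @ A)).T

trainable = A.size + B.size  # 2 * d * r
total = W.size + trainable
print(f"trainable fraction: {trainable / total:.2%}")
```

For this single layer, the adapters account for roughly 0.4% of the parameters, and the fraction shrinks further as `d` grows relative to `r`.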