You can freeze some transformer layers entirely:
- Early layers: Capture general language patterns. Often safe to freeze.
- Middle layers: Store factual knowledge. May need updating for domain shift.
- Later layers: Handle task-specific behavior. Usually need training.
Freezing early layers speeds up training and reduces memory use, since frozen parameters need no gradients or optimizer state. Start by freezing an initial block of layers and adjust the cutoff based on validation performance.
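As a minimal sketch of what this looks like in practice (assuming a Hugging Face BERT-style encoder; the model name, the `n_frozen` value, and the attribute paths are illustrative, not prescriptive), freezing a layer amounts to setting `requires_grad = False` on its parameters:

```python
from transformers import AutoModelForSequenceClassification

# Illustrative model; any BERT-style encoder with model.bert.encoder.layer works the same way.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

n_frozen = 6  # illustrative cutoff; tune for your task

# Freeze the embeddings and the first n_frozen encoder layers.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:n_frozen]:
    for param in layer.parameters():
        param.requires_grad = False

# Later layers and the classification head remain trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```

Because the optimizer only tracks parameters with `requires_grad=True`, filtering them when constructing it (e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`) also cuts optimizer memory, not just gradient computation.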