In full fine-tuning, every layer changes:
- Embedding layer: Token representations shift
- Attention weights (Q, K, V, O): Attention patterns adapt
- Feed-forward layers: Stored knowledge adjusts
- Layer norms: Learned scale and shift parameters adjust
- Output head: Token prediction probabilities shift
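The list above can be verified directly: in full fine-tuning, every parameter tensor is trainable and receives gradients. A minimal PyTorch sketch with a toy transformer block (layer sizes and the single-block model are illustrative, not a real checkpoint):

```python
import torch
import torch.nn as nn

# Toy stand-in for the components listed above:
# embedding, attention (Q/K/V/O), feed-forward, layer norms, output head.
model = nn.Sequential(
    nn.Embedding(100, 32),  # embedding layer
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),  # attention + FFN + norms
    nn.Linear(32, 100),  # output head
)

# Full fine-tuning: no parameters are frozen, so the optimizer
# updates every component listed above.
trainable = [p for p in model.parameters() if p.requires_grad]
assert len(trainable) == sum(1 for _ in model.parameters())
optimizer = torch.optim.AdamW(trainable, lr=2e-5)

# One illustrative step: gradients flow into every layer.
tokens = torch.randint(0, 100, (2, 8))
logits = model(tokens)
loss = nn.functional.cross_entropy(logits.reshape(-1, 100), tokens.reshape(-1))
loss.backward()
grads = [p.grad is not None for p in model.parameters()]
print(all(grads))  # every parameter received a gradient
```

Contrast this with parameter-efficient methods, which would freeze most of these tensors and leave `trainable` a small subset.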
In practice, early layers change less than later ones: most task-specific adaptation concentrates in the final third of the network, while early layers largely retain their general-purpose features.
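One way to check this claim empirically is to compare per-layer parameter drift between the base and fine-tuned checkpoints. A sketch of the measurement (the two state dicts here are fabricated stand-ins, not real model weights):

```python
import torch

def relative_drift(base_state, tuned_state):
    """Per-tensor relative change ||w_tuned - w_base|| / ||w_base||."""
    drift = {}
    for name, w0 in base_state.items():
        w1 = tuned_state[name]
        denom = w0.norm().item() or 1.0  # guard against zero-norm tensors
        drift[name] = (w1 - w0).norm().item() / denom
    return drift

# Illustrative fake checkpoints: the "later" layer moved 20x more.
base = {
    "layers.0.weight": torch.ones(4, 4),
    "layers.11.weight": torch.ones(4, 4),
}
tuned = {
    "layers.0.weight": torch.ones(4, 4) * 1.01,   # early layer: small shift
    "layers.11.weight": torch.ones(4, 4) * 1.20,  # late layer: large shift
}

d = relative_drift(base, tuned)
print(d["layers.11.weight"] > d["layers.0.weight"])  # True
```

With real checkpoints, passing `model.state_dict()` from before and after fine-tuning gives a layer-by-layer picture of where adaptation actually happened.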