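Deeper models can learn more complex patterns because each layer builds on the output of the one before it. As a rough picture, early layers capture surface patterns such as syntax, middle layers build semantic understanding, and later layers handle high-level reasoning. The boundaries between these stages are gradual rather than sharp.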
This hierarchy matters for fine-tuning. Some methods update only a subset of layers and leave the rest frozen, so understanding where a given kind of knowledge lives helps you choose which layers to tune.
As a rule of thumb, factual knowledge tends to concentrate in the middle layers, while task-specific behavior lives in the later layers.
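
To make the idea of tuning only certain layers concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that parameters belonging to block i are named "layers.<i>.<something>" and that the early, middle, and late bands are an even three-way split of the stack. Neither assumption comes from a specific library or model, and the helper names layer_band and select_trainable are hypothetical.

```python
import re

# Assumed naming scheme: parameters of block i look like "layers.<i>.<rest>".
LAYER_PATTERN = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_band(index, num_layers):
    """Classify a layer as 'early', 'middle', or 'late'.

    Uses an even three-way split of the stack, which is an illustrative
    assumption and not a measured boundary.
    """
    if not 0 <= index < num_layers:
        raise ValueError(f"layer index {index} out of range for {num_layers} layers")
    if index * 3 < num_layers:
        return "early"
    if index * 3 < 2 * num_layers:
        return "middle"
    return "late"


def select_trainable(param_names, num_layers, bands):
    """Return the parameter names whose layer falls in one of the given bands."""
    selected = []
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        if match is None:
            # Parameters outside the layer stack (embeddings, output head)
            # stay frozen in this sketch.
            continue
        if layer_band(int(match.group(1)), num_layers) in bands:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy 12-layer model with one parameter per block.
    names = ["embed.weight"]
    names += [f"layers.{i}.mlp.weight" for i in range(12)]
    names += ["head.weight"]

    # Adapting task behavior: tune only the late band.
    print(select_trainable(names, 12, {"late"}))
    # Updating factual knowledge: tune only the middle band.
    print(select_trainable(names, 12, {"middle"}))
```

In a real training setup you would then freeze every parameter that is not in the returned list, for example by turning off gradient tracking for it. The three-way split is only a starting point, since where the useful boundary falls depends on the model and the task.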