Regularization penalizes model complexity to prevent overfitting.
L2 (Ridge): Adds λ·Σ wᵢ² to the loss. Shrinks weights toward zero but rarely to exactly zero. Use when all features might matter.
L1 (Lasso): Adds λ·Σ |wᵢ| to the loss. Can shrink weights to exactly zero. Use for feature selection.
Elastic Net: Combines the L1 and L2 penalties (λ₁·Σ |wᵢ| + λ₂·Σ wᵢ²). Gives sparsity like Lasso while behaving more stably with correlated features, where Lasso alone tends to pick one feature from a group arbitrarily.
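The sparsity difference is easy to see empirically. A minimal sketch using scikit-learn on synthetic data where only the first 3 of 10 features matter (the `alpha` values here are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first 3 features actually drive the target; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge makes irrelevant coefficients small but nonzero;
# Lasso drives them to exactly zero.
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0.0)))
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0.0)))
```

Reading off `lasso.coef_ == 0` as a feature-selection mask is the usual Lasso workflow.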
Interview question: "Why does L1 produce sparse weights but L2 doesn't?"
L1's (sub)gradient, λ·sign(w), has constant magnitude regardless of weight size; L2's gradient, 2λw, shrinks as the weight shrinks. So L1 keeps pushing small weights all the way to zero, while L2's pull fades out and never quite gets there.
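A minimal numerical sketch of this answer. It prints both penalty gradients at a few weight sizes, then applies update steps: a soft-thresholding (proximal) step, the standard way L1 is handled in practice, versus a plain L2 gradient step (`lam` and `lr` are illustrative values):

```python
import numpy as np

lam, lr = 0.1, 0.1

# L1 subgradient magnitude stays at lam; L2 gradient shrinks with w.
for w in (2.0, 0.5, 0.01):
    print(f"w={w:5.2f}  L1 grad={lam * np.sign(w):.3f}  L2 grad={2 * lam * w:.4f}")

def l1_step(w):
    # Soft-thresholding: any weight with |w| <= lam*lr snaps to exactly zero.
    return np.sign(w) * max(abs(w) - lam * lr, 0.0)

def l2_step(w):
    # Multiplicative shrinkage: w * (1 - 2*lam*lr), never exactly zero.
    return w - lr * 2 * lam * w

w1 = w2 = 0.005
for _ in range(5):
    w1, w2 = l1_step(w1), l2_step(w2)
print("after 5 steps: L1 ->", w1, " L2 ->", w2)  # L1 hits 0.0, L2 stays positive
```

This is the mechanical core of the interview answer: the L1 step subtracts a fixed amount and clips at zero, while the L2 step only rescales.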