Gradient clipping caps gradient magnitudes to prevent exploding gradients: if the global gradient norm exceeds a threshold, the whole gradient vector is rescaled so its norm equals the threshold, preserving its direction.
Typical value: max gradient norm of 1.0.
This stabilizes training when you encounter bad batches or numerical edge cases. Enable it by default. The cost is negligible and the protection is valuable.
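As a minimal sketch in PyTorch (the model, data, and hyperparameters here are placeholder choices for illustration), clipping is a single call between `backward()` and `step()`:

```python
import torch

# Hypothetical model and optimizer, just to make the loop runnable.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Rescale all gradients in place if their global L2 norm exceeds 1.0.
# Returns the pre-clipping total norm, which is useful for logging.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```

Logging the returned norm is a cheap way to spot the bad batches and numerical edge cases mentioned above: a sudden spike tells you clipping just saved the step.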