FP16's narrow dynamic range means small gradients can underflow to zero during the backward pass. Loss scaling multiplies the loss by a large factor before backprop, which shifts the resulting gradients into FP16's representable range, then unscales them before the optimizer step.
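The underflow and the fix can be seen directly with NumPy's `float16` type. A gradient of 1e-8 is below FP16's smallest subnormal (about 6e-8) and rounds to zero, but scaling by 2**16 before the cast keeps it representable, and dividing the unscaled value back out in higher precision recovers it (the scale factor here is illustrative):

```python
import numpy as np

true_grad = 1e-8          # a gradient too small for FP16
scale = 2.0 ** 16         # illustrative loss-scale factor

# Without scaling: the value underflows to zero in FP16.
unscaled = np.float16(true_grad)
print(unscaled)           # 0.0

# With scaling: loss (and hence gradients) are multiplied by the
# scale before the cast, so the gradient survives in FP16 ...
scaled = np.float16(true_grad * scale)
print(scaled)             # small but nonzero

# ... and is unscaled in FP32/FP64 before the optimizer step.
recovered = float(scaled) / scale
print(recovered)          # approximately 1e-8 again
```

In a real training loop the multiplication happens on the loss, so the chain rule propagates the same factor into every gradient; only the final unscale-and-step is done in higher precision.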
Dynamic loss scaling adjusts the factor automatically: start with a high scale, halve it (and skip the optimizer step) whenever gradients overflow to Inf/NaN, and periodically increase it again after a long enough run of stable steps.
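That policy is small enough to sketch in plain Python. This is an illustrative toy, not a real library API; the class and parameter names (`growth_factor`, `backoff_factor`, `growth_interval`) are assumptions, though they mirror the knobs most frameworks expose:

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: back off on overflow, grow after
    a run of stable steps. Names are illustrative, not a real API."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads_finite):
        """Call once per step with whether all gradients were finite.
        Returns True when it is safe to apply the unscaled gradients."""
        if grads_finite:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                # Long stable run: try a larger scale for better precision.
                self.scale *= self.growth_factor
                self._good_steps = 0
        else:
            # Overflow: skip this optimizer step and reduce the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        return grads_finite
```

Each training step multiplies the loss by `scaler.scale` before backprop, checks the gradients for Inf/NaN, and only applies the (unscaled) update when `update(...)` returns True.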
BF16 rarely needs loss scaling because it keeps FP32's 8-bit exponent, so its dynamic range matches FP32 (trading away mantissa precision instead). If you train in FP16, enable dynamic loss scaling in your training config (e.g., PyTorch's `torch.cuda.amp.GradScaler`).