BF16 uses the same 8-bit exponent as FP32, so it covers the same dynamic range (up to roughly 3.4e38). FP16 has only a 5-bit exponent, capping it at 65504, so large activations or gradients can overflow to inf during training. The trade-off is precision: BF16 keeps just 7 mantissa bits versus FP16's 10.
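You can see both effects directly in PyTorch; a minimal sketch (the exact rounded value may vary by build, but the overflow behavior won't):

```python
import torch

# FP16: 5 exponent bits, max representable value is 65504.
x = torch.tensor(70000.0, dtype=torch.float16)
print(x)  # tensor(inf, dtype=torch.float16) -- overflows

# BF16: 8 exponent bits (same as FP32), max is ~3.4e38.
y = torch.tensor(70000.0, dtype=torch.bfloat16)
print(y)  # tensor(70144., dtype=torch.bfloat16) -- in range,
          # but rounded: only 7 mantissa bits of precision
```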
With BF16, you rarely need the loss scaling and underflow workarounds that FP16 mixed precision requires. Training "just works" at half the memory of FP32.
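In PyTorch terms, this is the difference between wrapping everything in a GradScaler and simply not needing one. A minimal BF16 training step might look like this (the model, loss, and batch below are placeholders for illustration):

```python
import torch

model = torch.nn.Linear(512, 512).cuda()      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 512, device="cuda")       # placeholder batch
target = torch.randn(32, 512, device="cuda")

# BF16 autocast: no GradScaler needed, unlike FP16 mixed precision.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = loss_fn(model(x), target)

loss.backward()        # parameters and gradients stay in FP32
optimizer.step()
optimizer.zero_grad()
```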
NVIDIA GPUs from Ampere onward (A100, RTX 30-series, and newer) support BF16 natively. Use it as your default dtype for fine-tuning.
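If your code may run on older hardware, you can check support at runtime with PyTorch's `torch.cuda.is_bf16_supported()` and fall back to FP16:

```python
import torch

# Prefer BF16 where the GPU supports it; otherwise fall back to FP16
# (which then needs loss scaling, e.g. torch.amp.GradScaler).
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Using {dtype} for mixed-precision training")
```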