FP8 uses just one byte per value, and NVIDIA H100 GPUs support FP8 natively.
FP8 is emerging for training but remains experimental: quality can suffer without careful implementation. For now, most teams use FP8 for inference only.
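To make the idea concrete, here is a minimal sketch of "fake quantization" to the FP8 E4M3 format, the variant commonly used for inference: a per-tensor scale maps the tensor's largest magnitude onto the E4M3 range, values are rounded to 3 mantissa bits, and the result is rescaled. This is an illustrative simulation in float32, not a hardware FP8 path; it ignores subnormals and E4M3's NaN encoding, and the function name and scaling scheme are assumptions for the example.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fake_quantize_e4m3(x: np.ndarray) -> np.ndarray:
    """Simulate FP8 E4M3 rounding in float32 ("fake quantization").

    A per-tensor scale maps max |x| to the E4M3 range, the scaled
    values are rounded to a 4-bit significand (1 implicit + 3 stored
    mantissa bits), clipped to the finite range, and rescaled.
    Subnormals and the E4M3 NaN encoding are ignored for simplicity.
    """
    scale = np.max(np.abs(x)) / E4M3_MAX
    scaled = x / scale
    # frexp splits v = m * 2**e with m in [0.5, 1); rounding m to
    # multiples of 1/16 keeps 4 significant bits of the significand.
    m, e = np.frexp(scaled)
    m = np.round(m * 16) / 16
    q = np.clip(np.ldexp(m, e), -E4M3_MAX, E4M3_MAX)
    return q * scale
```

Because only 3 mantissa bits survive, the worst-case relative rounding error is about 2^-4 (6.25%), which is why careful scaling, and often per-channel rather than per-tensor scales, matters so much in practice.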
Expect FP8 training to mature over the next year. Since FP8 is half the width of BF16, it promises another 2× memory and speed improvement.