A model's parameters directly consume memory. In FP32 (full precision), each parameter uses 4 bytes.
- 7B parameters × 4 bytes = 28 GB just for the model
- 13B parameters × 4 bytes = 52 GB
- 70B parameters × 4 bytes = 280 GB
This is before optimizer states, gradients, and activations, which typically multiply the footprint several times over. FP32 training of even 7B models exceeds most consumer GPUs.
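The arithmetic above can be sketched as a small estimator. This is a rough back-of-the-envelope sketch, not an exact accounting: it assumes an Adam-style optimizer (two extra FP32 states per parameter) and ignores activations, which vary with batch size and sequence length.

```python
def param_memory_gb(n_params_billion: float, bytes_per_param: int = 4) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    FP32 uses 4 bytes per parameter, so an N-billion-parameter
    model needs 4N GB just to hold the weights.
    """
    return n_params_billion * bytes_per_param


def training_memory_gb(n_params_billion: float) -> float:
    """Rough FP32 training footprint, excluding activations.

    Weights (4) + gradients (4) + Adam first and second moments
    (4 + 4) = 16 bytes per parameter.
    """
    return n_params_billion * 16


for n in (7, 13, 70):
    print(f"{n}B params: {param_memory_gb(n):.0f} GB weights, "
          f"~{training_memory_gb(n):.0f} GB to train (before activations)")
```

Running this reproduces the figures above (28, 52, and 280 GB of weights) and shows why training is so much worse: a 7B model already needs roughly 112 GB before activations, far beyond a typical 24 GB consumer GPU.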