Key GPU concepts for training:
- CUDA cores: Handle general parallel computation
- Tensor cores: Specialized for matrix operations in reduced precision
- VRAM: On-board GPU memory that holds the model weights, gradients, optimizer state, and activations during training
- Memory bandwidth: How fast data moves between VRAM and the compute units
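
To make the VRAM bullet concrete, here is a rough back-of-the-envelope sketch of how much memory training state can take. The function name `training_vram_gb` and the 16 bytes-per-parameter figure are assumptions for illustration: the figure corresponds to a common mixed-precision Adam setup, and it ignores activations, which depend on batch size and sequence length.

```python
# Assumed per-parameter cost for mixed-precision training with Adam
# (illustrative breakdown, not a measurement):
#   2 bytes  fp16 weights
#   2 bytes  fp16 gradients
#   4 bytes  fp32 master copy of the weights
#   8 bytes  fp32 Adam first and second moments
BYTES_PER_PARAM = 2 + 2 + 4 + 8


def training_vram_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Rough lower bound on VRAM (in GB) for weights, gradients, and optimizer state.

    Activations and framework overhead are NOT included.
    """
    return num_params * bytes_per_param / 1e9


# Hypothetical 7-billion-parameter model
estimate = training_vram_gb(7e9)
print(f"~{estimate:.0f} GB before activations")
```

An estimate like this mainly shows why a model that fits comfortably for inference may not fit for training on the same GPU.
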
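
Tensor cores are a major reason modern GPUs train so fast. They accelerate the reduced-precision matrix multiplications that make up most of a transformer's compute, in both the attention and feed-forward layers.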
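
Fast matrix units only help if memory can keep them fed, which is where memory bandwidth comes in. The sketch below uses a simple roofline-style check to estimate whether a matrix multiplication is limited by compute or by bandwidth. The function names and the hardware numbers (`PEAK_FLOPS`, `MEM_BANDWIDTH`) are hypothetical placeholders, not the specifications of any particular GPU, and the byte count assumes each matrix is read or written exactly once.

```python
def matmul_arithmetic_intensity(m, n, k, bytes_per_element=2):
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul.

    Assumes each input and the output cross the memory bus exactly once
    (an idealized model; real kernels may move more data).
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_compute_bound(intensity, peak_flops, mem_bandwidth):
    """True if the operation does enough math per byte to saturate compute."""
    ridge_point = peak_flops / mem_bandwidth  # FLOPs per byte at the roofline knee
    return intensity > ridge_point


# Hypothetical hardware numbers, chosen only for illustration
PEAK_FLOPS = 300e12    # 300 TFLOP/s of reduced-precision matmul throughput
MEM_BANDWIDTH = 2e12   # 2 TB/s of VRAM bandwidth

# Large square matmul, typical of training with big batches
big = matmul_arithmetic_intensity(4096, 4096, 4096)
# Matrix-vector product, typical of single-token work
small = matmul_arithmetic_intensity(1, 4096, 4096)

print(is_compute_bound(big, PEAK_FLOPS, MEM_BANDWIDTH))    # True
print(is_compute_bound(small, PEAK_FLOPS, MEM_BANDWIDTH))  # False
```

Under these assumptions the large matmul is compute-bound, so faster tensor cores shorten it, while the matrix-vector case is bandwidth-bound and gains little from extra compute.
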