Batch size affects training dynamics and memory usage.
Small batches: Noisier gradient estimates, often better generalization, slower wall-clock time per epoch (more optimizer steps).
Large batches: Less gradient noise, faster per epoch, but may generalize worse unless the learning rate is scaled up.
Linear scaling rule: When multiplying the batch size by k, multiply the learning rate by k (commonly paired with an LR warmup phase).
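A minimal sketch of the rule in code, using illustrative baseline values (batch 256, LR 0.1) that are assumptions, not from the source:

```python
base_batch_size = 256                  # assumed baseline configuration
base_lr = 0.1

new_batch_size = 1024                  # scaled-up batch
k = new_batch_size / base_batch_size   # k = 4
new_lr = base_lr * k                   # linear scaling: 0.1 * 4 = 0.4
print(new_lr)
```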
Gradient accumulation: Simulate large batches on limited memory by accumulating gradients over multiple forward/backward passes before a single optimizer step (see the sketch below).
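A minimal PyTorch-style sketch of gradient accumulation, using a toy model and synthetic data; all names, sizes, and values below are illustrative assumptions, not from the source:

```python
import torch
from torch import nn

model = nn.Linear(10, 2)                                   # toy model (illustrative)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Toy data: 32 micro-batches of 8 samples each (illustrative)
dataloader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(32)]

accum_steps = 4  # effective batch size = micro-batch size * accum_steps = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient averages over the effective batch
    loss = loss_fn(outputs, targets) / accum_steps
    loss.backward()  # gradients accumulate in param.grad across micro-batches

    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one weight update per simulated large batch
        optimizer.zero_grad()  # reset gradients for the next accumulation window
```

Dividing the loss by accum_steps keeps the accumulated gradient comparable in magnitude to a true large-batch step, so the same learning rate schedule applies.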
Interview tip: Batch size is often constrained by GPU memory. Know how to work around it with gradient accumulation.