Every training iteration follows this pattern:
1. Load a batch of examples
2. Forward pass: the model predicts the next tokens
3. Compute the loss against the actual tokens
4. Backward pass: compute gradients
5. Optimizer step: update the weights
One complete pass through all of the training data is one epoch. You typically train for multiple epochs, with the exact count depending on dataset size.
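The iteration pattern and the epoch loop above can be sketched end to end with a toy model. This is a minimal illustration, not any framework's API: it uses a one-weight linear model with a squared-error loss so that the forward pass, loss, gradient, and update are each a single explicit line. All names (`data`, `w`, `lr`, `EPOCHS`) are made up for the example.

```python
# Toy dataset: learn y = 3x from four (input, target) pairs.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0       # the model: a single weight
lr = 0.05     # learning rate
EPOCHS = 200  # one epoch = one full pass over the data

for epoch in range(EPOCHS):
    for x, y_true in data:                # 1. load a batch (here, one example)
        y_pred = w * x                    # 2. forward pass: model predicts
        loss = (y_pred - y_true) ** 2     # 3. compute loss against the target
        grad = 2 * (y_pred - y_true) * x  # 4. backward pass: d(loss)/dw
        w -= lr * grad                    # 5. optimizer step: update the weight

print(round(w, 2))  # w has converged toward 3.0
```

A real training loop swaps the hand-derived gradient for automatic differentiation and the single weight for millions of parameters, but the five-step skeleton per batch, wrapped in an outer epoch loop, is the same.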