LLMs generate text one token at a time. At each step:
1. Process all previous tokens through the model.
2. Get a probability distribution over the vocabulary for the next token.
3. Sample from that distribution (or pick the most likely token).
4. Append the chosen token to the sequence and repeat.
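The loop above can be sketched in a few lines. Here `toy_model` is a stand-in for a real LLM forward pass (it just deterministically favors the token after the last one), and decoding is greedy; both are simplifying assumptions for illustration.

```python
VOCAB_SIZE = 8

def toy_model(tokens: list[int]) -> list[float]:
    """Fake forward pass: returns a score per vocabulary entry,
    favoring the token after the last one (mod vocab size)."""
    target = (tokens[-1] + 1) % VOCAB_SIZE
    return [1.0 if t == target else 0.0 for t in range(VOCAB_SIZE)]

def generate(prompt: list[int], max_new_tokens: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_model(tokens)  # full forward pass over all tokens so far
        next_token = max(range(VOCAB_SIZE), key=lambda t: logits[t])  # greedy pick
        tokens.append(next_token)   # extend the sequence and repeat
    return tokens

print(generate([3], 4))  # → [3, 4, 5, 6, 7]
```

A real model would return logits from a neural network and typically sample with temperature or top-p rather than always taking the argmax, but the control flow is the same.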
This is slow because every new token requires a forward pass over the entire sequence so far. Techniques like KV caching speed this up by storing each layer's attention keys and values from earlier steps, so each subsequent step only needs to process the newly generated token.
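The bookkeeping behind KV caching can be sketched with a single toy attention head. The `project` function below is a hypothetical stand-in for the learned query/key/value projections of a real transformer; the point is that each step projects only the new token and appends its key and value to the cache, rather than re-projecting the whole sequence.

```python
import math

def project(token: int, scale: float) -> float:
    """Toy stand-in for a learned Q/K/V projection of a token embedding."""
    return token * scale

def attend(query: float, keys: list[float], values: list[float]) -> float:
    """Softmax-weighted attention over scalar keys and values."""
    scores = [query * k for k in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

def decode_with_cache(tokens: list[int]) -> list[float]:
    k_cache: list[float] = []  # keys from all previous steps
    v_cache: list[float] = []  # values from all previous steps
    outputs = []
    for tok in tokens:
        # Only the NEW token is projected; past keys/values come from the cache.
        k_cache.append(project(tok, 0.5))
        v_cache.append(project(tok, 2.0))
        query = project(tok, 1.0)
        outputs.append(attend(query, k_cache, v_cache))
    return outputs

print(decode_with_cache([1, 2, 3]))
```

In a real model the cache holds key and value tensors per layer and per head, and its memory footprint grows linearly with sequence length, which is why long-context serving is often memory-bound.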