The training objective is simple: predict the next token given all previous tokens.
For the sentence "The cat sat," the model learns to predict:
- "cat" given "The"
- "sat" given "The cat"
This simple objective, applied to billions of tokens, produces powerful capabilities. Fine-tuning continues this same objective on your specific data.