"Attention Is All You Need" (2017): Introduced the transformer architecture, replacing recurrent networks with self-attention for machine translation.
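The paper's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (shapes and names are illustrative, not the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # numerically stable softmax over keys
    return weights @ V                                # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, dimension 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one output vector per query: (4, 8)
```

Each query attends over all keys at once, which is what lets transformers drop the sequential recurrence of RNNs.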
BERT (2018): Bidirectional pretraining via masked language modeling. Dominated NLP benchmarks on release.
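Masked language modeling replaces a fraction of input tokens with a [MASK] token and trains the model to recover the originals. A toy sketch of just the masking step (the ~15% rate and [MASK] token follow the paper; the function and seed are illustrative):

```python
import random

MASK, RATE = "[MASK]", 0.15   # BERT masks roughly 15% of input tokens

def mask_tokens(tokens, rate=RATE, seed=1):
    # Returns the corrupted sequence plus a map of position -> original token,
    # which serves as the prediction target during pretraining.
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            targets[i] = tok
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split())
print(masked, targets)
```

Because masked positions can attend to context on both sides, the objective is inherently bidirectional, unlike left-to-right language modeling.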
GPT-2 (2019): Showed emergent zero-shot task transfer from pure language modeling. Initially withheld as "too dangerous to release" (the full model was released later that year).
GPT-3 (2020): 175B parameters. In-context (few-shot) learning works. API-based access to AI models begins.
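In-context learning means the task is specified entirely in the prompt: a few input/output demonstrations followed by a new input, with no gradient updates. A sketch of the prompt format (the Q:/A: layout and the demonstrations are invented for illustration):

```python
def few_shot_prompt(examples, query):
    # Demonstrations are concatenated into one string; the model is asked
    # to continue the pattern for the final, unanswered query.
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],   # invented demonstrations
    "7 + 6",
)
print(prompt)
```

The model's weights never change; the demonstrations condition its next-token predictions, which is why this only became reliable at GPT-3 scale.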
InstructGPT (2022): RLHF to align models with user instructions. Foundation of ChatGPT.
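The middle stage of RLHF trains a reward model on human preference pairs with a pairwise logistic loss, -log σ(r_chosen − r_rejected). A minimal sketch of that loss (function name and scores are illustrative):

```python
import math

def preference_loss(r_chosen, r_rejected):
    # -log sigmoid(r_chosen - r_rejected): small when the human-preferred
    # response already scores higher, large when the ranking is inverted.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

print(preference_loss(2.0, 0.0))  # reward model agrees with the human: low loss
print(preference_loss(0.0, 2.0))  # reward model disagrees: high loss
```

The trained reward model then scores sampled responses, and the policy is fine-tuned (with PPO in the paper) to maximize that reward.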
Interview tip: Know the progression. Each paper built on previous work. Understand what each contributed.