Transformers: "Attention Is All You Need" (Vaswani et al., 2017)
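The core idea of that paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. A minimal NumPy sketch (shapes and the seeded random inputs are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, head dim 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of the value rows, weighted by query-key similarity.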
LLMs: GPT-3 (Brown et al., 2020), LLaMA (Touvron et al., 2023)
Alignment: InstructGPT (Ouyang et al., 2022) - applied RLHF to instruction following
Scaling: Chinchilla (Hoffmann et al., 2022), "Scaling Laws for Neural Language Models" (Kaplan et al., 2020)
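Chinchilla's headline result reduces to a rule of thumb: for compute-optimal training, scale parameters and training tokens together, at roughly 20 tokens per parameter. A one-liner to remember it by (the factor of 20 is the commonly cited approximation, not an exact constant from the paper):

```python
def chinchilla_optimal_tokens(n_params):
    # Hoffmann et al. (2022), approximate rule of thumb:
    # compute-optimal training uses ~20 tokens per model parameter.
    return 20 * n_params

print(chinchilla_optimal_tokens(70e9))  # 1.4e12 -- Chinchilla itself: 70B params, 1.4T tokens
```

Useful interview check: GPT-3 (175B params, ~300B tokens) is far below this ratio, which is the paper's point — earlier large models were undertrained for their size.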
RAG: "Retrieval-Augmented Generation" (Lewis et al., 2020)
Efficient training: LoRA (Hu et al., 2021), FlashAttention (Dao et al., 2022)
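LoRA's mechanism is small enough to sketch: freeze the pretrained weight W and learn only a low-rank update BA, giving an effective weight W + (alpha/r)·BA. A minimal NumPy version (dimensions, alpha=16, and the init scales are illustrative assumptions; the zero-init of B matches the paper's scheme so training starts from the frozen model):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    # LoRA (Hu et al., 2021): effective weight = W + (alpha / r) * B @ A,
    # where W is frozen and only the low-rank factors A, B are trained.
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init
x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B)
# With B zero-initialized, the LoRA layer reproduces the frozen layer exactly.
```

The win: trainable parameters drop from d_out·d_in to r·(d_in + d_out), and the BA term can be merged into W after training, so inference cost is unchanged.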
Interview tip: Know these papers well. Be ready to explain each one's contributions and limitations.