The original transformer had both an encoder and a decoder. The encoder processes the input with bidirectional attention (each position can attend to every other position). The decoder generates output with causal attention (each position can attend only to itself and earlier positions).
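The difference between the two attention patterns comes down to a mask applied to the raw attention scores before the softmax. A minimal sketch (the function name and uniform scores are illustrative, not from any library):

```python
import numpy as np

def attention_weights(scores: np.ndarray, causal: bool) -> np.ndarray:
    """Row-wise softmax over optionally masked attention scores.

    scores: (seq_len, seq_len) raw query-key dot products.
    causal=True hides future positions (decoder-style);
    causal=False lets every position see all others (encoder-style).
    """
    n = scores.shape[0]
    if causal:
        # Upper triangle (column j > row i) is the "future": set to -inf
        # so it gets zero weight after the softmax.
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))  # uniform scores, for illustration only
enc = attention_weights(scores, causal=False)  # each row uniform over all 4
dec = attention_weights(scores, causal=True)   # row i uniform over first i+1
```

With uniform scores, every encoder row is `[0.25, 0.25, 0.25, 0.25]`, while the first decoder row is `[1, 0, 0, 0]`: position 0 can only attend to itself.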
BERT uses an encoder-only architecture, well suited to understanding tasks such as classification.
T5 uses the full encoder-decoder architecture, a good fit for sequence-to-sequence tasks like translation and summarization.
GPT uses a decoder-only architecture, now dominant for modern LLMs and generation tasks.