Think of attention like a search engine. Each position in the sequence issues a query ("What other positions should I pay attention to?"), every position offers a key to be matched against, and the match scores determine how much of each position's value gets retrieved.
When processing "The cat sat on the mat because it was tired," the model needs to figure out what "it" refers to. Attention lets it look back at all previous words and determine that "cat" is most relevant. High attention weight means high relevance.
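This lookup can be sketched as scaled dot-product attention. The sketch below uses random toy vectors rather than embeddings from a trained model, so the particular weights are illustrative; what matters is the mechanics: scores from query-key dot products, a softmax turning each row into a distribution over positions, and a weighted sum of values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scores: how well each query matches each key, scaled by sqrt(d).
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Each row of `weights` is a probability distribution over positions.
    weights = softmax(scores, axis=-1)
    # Output: for each position, a weighted blend of all value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 4
# Toy vectors standing in for 3 token embeddings (hypothetical values).
X = rng.normal(size=(3, d))
out, weights = attention(X, X, X)
print(weights.shape)       # (3, 3): one row of weights per position
print(weights.sum(axis=-1))  # each row sums to 1
```

In the "it was tired" example, the row of `weights` for the position of "it" would place most of its mass on the position of "cat", so the output vector at "it" is dominated by the value vector of "cat".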