ALiBi (Attention with Linear Biases) adds a penalty to attention scores that grows linearly with the distance between query and key: the farther apart two tokens are, the larger the bias subtracted from their score, with each attention head using a different slope.
This simple bias replaces learned position embeddings entirely. ALiBi was designed to extrapolate to sequences longer than the training length, and it does so more gracefully than RoPE, which typically needs adjustments such as position interpolation to handle longer contexts.
MPT and BLOOM use ALiBi. It's simpler than RoPE and works well in practice. Because the bias is fixed rather than learned, fine-tuning an ALiBi model involves no position-embedding parameters at all.
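A minimal NumPy sketch of how the bias is constructed. The per-head slopes follow the geometric sequence from the ALiBi paper (assuming the head count is a power of two); the function names here are illustrative, not from any particular library:

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric sequence from the ALiBi paper: 2^(-8/n), 2^(-16/n), ...
    # (assumes n_heads is a power of two)
    return np.array([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

def alibi_bias(seq_len, n_heads):
    # distance[i, j] = j - i (non-positive in the causal lower triangle)
    pos = np.arange(seq_len)
    distance = pos[None, :] - pos[:, None]            # (seq, seq)
    slopes = alibi_slopes(n_heads)                    # (heads,)
    # bias[h, i, j] = slope_h * (j - i): penalty grows linearly with distance
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(seq_len=5, n_heads=4)
# Added to raw attention scores before the causal mask and softmax:
#   scores = q @ k.T / sqrt(d) + bias[h]
```

Note that the bias on the diagonal is zero (a token attending to itself pays no penalty) and becomes more negative the further back a key lies, which is exactly the linear distance penalty described above.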