MQA takes key-value sharing to the extreme: all query heads share a single key-value head.
This shrinks the KV cache by a factor equal to the number of query heads (for a 32-head model, a 32x reduction). The trade-off is a slight quality loss compared to full multi-head attention.
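The memory saving follows directly from the cache shape: K and V are stored per KV head, so fewer KV heads means a proportionally smaller cache. A minimal sketch of the arithmetic, using hypothetical model dimensions (32 layers, 32 query heads, head dim 128, fp16) chosen only for illustration:

```python
def kv_cache_bytes(num_layers, seq_len, num_kv_heads, head_dim, dtype_bytes=2):
    # K and V each store a [seq_len, num_kv_heads, head_dim] tensor per layer,
    # hence the leading factor of 2.
    return 2 * num_layers * seq_len * num_kv_heads * head_dim * dtype_bytes

# Hypothetical 32-query-head model at a 4096-token context, fp16
layers, seq, head_dim = 32, 4096, 128
mha = kv_cache_bytes(layers, seq, num_kv_heads=32, head_dim=head_dim)  # full multi-head
gqa = kv_cache_bytes(layers, seq, num_kv_heads=8,  head_dim=head_dim)  # GQA, 8 KV heads
mqa = kv_cache_bytes(layers, seq, num_kv_heads=1,  head_dim=head_dim)  # MQA, 1 KV head

print(f"MHA: {mha / 2**30:.2f} GiB")  # 2.00 GiB
print(f"GQA: {gqa / 2**30:.2f} GiB")  # 0.50 GiB
print(f"MQA: {mqa / 2**30:.4f} GiB")  # 0.0625 GiB
```

With these (illustrative) dimensions, MQA cuts the cache by exactly the 32x head count, while GQA with 8 KV heads gives a 4x reduction.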
PaLM and Falcon use MQA. It's most valuable when inference is memory-constrained. GQA is the middle ground between MQA and full multi-head attention: query heads are split into groups, and each group shares one key-value head, offering a better quality-memory trade-off for most use cases.
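The grouping can be expressed as a simple mapping from query heads to KV heads, with MHA and MQA as the two endpoints. A sketch of that mapping (the helper name is hypothetical):

```python
def kv_head_for_query(q_head, num_q_heads, num_kv_heads):
    # Each contiguous block of (num_q_heads // num_kv_heads) query heads
    # attends against the same shared KV head.
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size

# GQA with 8 query heads and 2 KV heads: two groups of four
print([kv_head_for_query(q, 8, 2) for q in range(8)])   # [0, 0, 0, 0, 1, 1, 1, 1]

# MHA endpoint: one KV head per query head (identity mapping)
print([kv_head_for_query(q, 8, 8) for q in range(8)])   # [0, 1, 2, 3, 4, 5, 6, 7]

# MQA endpoint: every query head maps to the single KV head 0
print([kv_head_for_query(q, 8, 1) for q in range(8)])   # [0, 0, 0, 0, 0, 0, 0, 0]
```

Setting `num_kv_heads` equal to `num_q_heads` recovers MHA and setting it to 1 recovers MQA, which is why GQA interpolates between the two.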