Multi-head attention (MHA) uses a separate key and value head for every query head. Grouped-query attention (GQA) instead shares each key-value head across a group of query heads. With H query heads and G key-value heads, each group of H/G query heads shares one key-value pair. This shrinks the KV cache by a factor of H/G during inference.
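To make the sharing concrete, here is a minimal sketch in PyTorch. The function name `gqa_attention` and the toy shapes are illustrative, not from any particular library; the key step is repeating each KV head so that every query head in a group attends against the same shared key-value pair:

```python
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v):
    # q: (batch, H, seq, head_dim); k, v: (batch, G, seq, head_dim), H % G == 0
    group_size = q.shape[1] // k.shape[1]
    # Repeat each KV head group_size times so every group of query heads
    # shares one key-value pair.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

# Toy example: H = 8 query heads, G = 2 KV heads, so groups of 4.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(gqa_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

Note that only `k` and `v` are cached during generation, so the cache stores G heads per layer instead of H.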
Llama 2 70B and Llama 3 use GQA. You get most of multi-head attention's quality at significantly lower memory cost, and when fine-tuning these models GQA is already built in; no architectural changes are needed.
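As a rough back-of-the-envelope check, here is the cache arithmetic using Llama 3 8B's configuration (32 layers, 32 query heads, 8 KV heads, head dimension 128); the fp16 cache assumption and the variable names are ours:

```python
# KV cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value
n_layers, head_dim, fp16_bytes = 32, 128, 2
mha = 2 * n_layers * 32 * head_dim * fp16_bytes  # hypothetical MHA: 32 KV heads
gqa = 2 * n_layers * 8 * head_dim * fp16_bytes   # actual GQA config: 8 KV heads
print(mha // 1024, "KiB vs", gqa // 1024, "KiB per token")  # 512 vs 128: a 4x reduction
```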