With multi-head attention, the model runs several independent attention operations in parallel. Each head has its own Q, K, V weight matrices.
The model dimension is split across the heads: if the model dimension is d_model and there are h heads, each head works with vectors of dimension d_model / h. This keeps the total computation roughly constant while allowing the heads to learn different attention patterns.
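The split described above can be sketched in NumPy. This is a minimal illustration, not any particular model's implementation: the token count, d_model = 512, h = 8, and the random per-head weight matrices are all assumptions chosen for the example, and real implementations usually fuse the per-head projections into one matrix multiply.

```python
import numpy as np

d_model, n_heads = 512, 8          # assumed sizes for illustration
d_head = d_model // n_heads        # 64 dimensions per head

rng = np.random.default_rng(0)
x = rng.standard_normal((10, d_model))  # 10 tokens, each a d_model vector

# Hypothetical per-head projection weights (one Q, K, V matrix per head).
Wq = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)
Wk = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)
Wv = rng.standard_normal((n_heads, d_model, d_head)) / np.sqrt(d_model)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

heads = []
for h in range(n_heads):
    Q, K, V = x @ Wq[h], x @ Wk[h], x @ Wv[h]     # each (10, d_head)
    scores = softmax(Q @ K.T / np.sqrt(d_head))    # (10, 10) attention pattern
    heads.append(scores @ V)                       # (10, d_head)

# Concatenating the heads restores the full model dimension.
out = np.concatenate(heads, axis=-1)               # (10, d_model)
print(out.shape)
```

Each head attends over only a 64-dimensional slice, so the eight heads together cost about the same as one full-width attention, while each can form its own (10, 10) attention pattern.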