Prefix tuning prepends learnable vectors to the key and value sequences in attention.
Instead of modifying weights, you're adding "soft prompts" that influence how attention operates. The base model is completely frozen.
Prefix tuning trains comparatively few parameters, but it can underperform LoRA on complex tasks; it works best when steering attention alone is enough to adapt the model.
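The mechanism above can be sketched in a few lines of PyTorch. This is a minimal single-head illustration, not any library's actual implementation: the base query/key/value projections are frozen, and only the learnable prefix vectors (prepended to the key and value sequences) receive gradients. The class name, dimensions, and initialization scale are illustrative choices.

```python
import torch
import torch.nn as nn


class PrefixAttention(nn.Module):
    """Single-head attention with learnable prefix key/value vectors.

    The base projections are frozen; only the soft prompts train.
    """

    def __init__(self, d_model: int = 64, prefix_len: int = 8):
        super().__init__()
        # Base model weights: frozen, exactly as in the full pretrained model.
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        for p in self.parameters():
            p.requires_grad = False

        # Learnable "soft prompts" prepended to keys and values only.
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch = x.size(0)
        q = self.q(x)
        # Prepend the prefixes: every query can now attend to them,
        # steering attention without touching any base weight.
        k = torch.cat([self.prefix_k.expand(batch, -1, -1), self.k(x)], dim=1)
        v = torch.cat([self.prefix_v.expand(batch, -1, -1), self.v(x)], dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v


layer = PrefixAttention()
out = layer(torch.randn(2, 10, 64))
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
```

Note that the output keeps the input's sequence length: the prefixes lengthen only the key/value side of attention, so they influence every position's output without adding tokens to the sequence. Checking `trainable` confirms only `prefix_k` and `prefix_v` would be updated by an optimizer.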