After training, you can merge LoRA adapters back into the base model:
W_merged = W_original + (α/r) × BA
Because the low-rank update is folded into the weight matrix itself, the merged model is a single set of weights that runs at exactly the same speed as the original: no extra adapter computation at inference time.
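The equivalence can be checked directly with a small numerical sketch (dimensions here are illustrative, not from any particular model): the separate-adapter forward pass and the merged forward pass produce the same output up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 4, 8   # illustrative sizes

W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01     # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01    # LoRA up-projection
x = rng.normal(size=(d_in,))              # sample input

# Adapter kept separate: y = Wx + (alpha/r) * B(Ax)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merged: fold the update into one matrix, then a single matmul.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapter, y_merged)
```

Note that merging B @ A materializes a full d_out × d_in matrix, so the parameter savings of LoRA apply to training and storage of the adapter, not to the merged weights themselves.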
Alternatively, keep the adapters separate. This lets you swap adapters for different tasks without reloading the base model, at the cost of two extra low-rank matmuls per adapted layer at inference time.
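A minimal sketch of that serving pattern, under the assumption of one (B, A) pair per task (the task names and sizes here are hypothetical): the large base weight is shared, and only the small adapter matrices are selected per request.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r, alpha = 8, 16, 4, 8   # illustrative sizes
W = rng.normal(size=(d_out, d_in))    # shared frozen base weight

# One small (B, A) pair per task; only these differ between tasks.
adapters = {
    task: (rng.normal(size=(d_out, r)) * 0.1,   # B
           rng.normal(size=(r, d_in)) * 0.1)    # A
    for task in ("summarize", "translate")      # hypothetical tasks
}

def forward(x, task):
    # Base path plus the task-specific low-rank update.
    B, A = adapters[task]
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
assert not np.allclose(forward(x, "summarize"), forward(x, "translate"))
```

Swapping tasks is just a dictionary lookup; the base weights never move, which is why this pattern is attractive when one deployment must serve many fine-tuned variants.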