Beta (β) controls how much the model can deviate from the reference:
- Low β (): Model can change significantly. Faster learning but risk of degradation.
- High β (+): Model stays close to reference. Safer but slower learning.
Typical values: to . Start with and increase if the model degrades. Beta acts as a regularizer keeping the model sensible.