Strategies to preserve pre-trained knowledge:
Use lower learning rates to minimize weight changes
Train for fewer epochs to reduce drift
Mix in general-purpose data with your specialized data
Use parameter-efficient methods (e.g., LoRA) that freeze most of the base weights
Use regularization to penalize deviation from original weights
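The parameter-efficient idea above can be sketched in a few lines. This is a toy illustration of LoRA's core mechanism, not a real training loop: the frozen weight matrix `W` is left untouched, and only a small low-rank update `B @ A` (rank r much smaller than the weight dimensions) would be trained. The function and variable names here are illustrative, not from any library.

```python
# Toy LoRA sketch with plain Python lists (matrices as lists of rows).
# Assumption: names (lora_forward, matmul) are invented for illustration.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha / r) * B @ A); W is frozen, only A and B train."""
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update with the same shape as W
    W_eff = [[w + scale * d for w, d in zip(wr, dr)]
             for wr, dr in zip(W, delta)]
    return matmul([x], W_eff)[0]

# 2x2 frozen weight, rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0], [0.0]]   # B starts at zero, so the adapter is a no-op at init
A = [[0.5, 0.5]]
x = [2.0, 3.0]

print(lora_forward(x, W, A, B, alpha=8, r=1))  # → [2.0, 3.0], same as x @ W
```

Because `B` is initialized to zero, the model's behavior is exactly the pre-trained behavior at the start of fine-tuning, which is one reason the scheme preserves prior knowledge so well.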
Balance is key: adapt enough to be useful on the target task, but not so much that you erase the model's general capabilities.
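The regularization strategy from the list can also be made concrete. A common form (in the spirit of L2-SP) adds a penalty proportional to the squared distance between the current weights and the pre-trained ones, so the optimizer pays a cost for drifting. The sketch below uses flat lists of scalar weights and invented names; a real implementation would operate on framework tensors.

```python
# Weight-anchoring regularization sketch: task_loss + lam * ||w - w0||^2.
# Assumption: regularized_loss and the toy values are illustrative only.

def regularized_loss(task_loss, weights, pretrained, lam):
    """Penalize deviation from the pre-trained weights w0."""
    penalty = sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))
    return task_loss + lam * penalty

pretrained = [0.8, -0.2, 1.5]   # frozen snapshot of the original weights
current    = [1.0, -0.2, 1.0]   # weights partway through fine-tuning

# penalty = 0.2**2 + 0.0 + 0.5**2 = 0.29
print(regularized_loss(2.0, current, pretrained, lam=0.1))  # ≈ 2.029
```

The coefficient `lam` sets the trade-off directly: larger values keep the model closer to its pre-trained state, smaller values allow more adaptation, which mirrors the balance point above.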