Continued pre-training continues the next-token-prediction objective on raw text, with no instruction formatting. It adapts the model to a domain without teaching specific tasks.
Use continued pre-training when:
- You have abundant domain text but few labeled task examples
- You want to inject domain knowledge broadly
Then follow with supervised fine-tuning (SFT) for the specific tasks. This two-stage approach can outperform SFT alone in specialized domains.
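The data preparation for the first stage is what distinguishes it from SFT: documents are tokenized as-is and packed into fixed-length blocks, with no chat template or prompt/response masking, and labels equal the input ids for the causal LM loss. Below is a minimal sketch of that packing step; `pack_sequences` is a hypothetical helper (not from any particular library), and it assumes the tokenizer has already produced token-id lists.

```python
def pack_sequences(token_lists, block_size):
    """Concatenate tokenized documents and split into fixed-length blocks.

    Unlike SFT formatting, no special structure is imposed: documents are
    simply joined end to end. A trailing remainder shorter than block_size
    is dropped, as is common in pre-training pipelines.
    """
    flat = [tok for seq in token_lists for tok in seq]
    n_blocks = len(flat) // block_size
    return [flat[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


# Example: three tokenized domain documents packed into blocks of 4.
docs = [[101, 7, 8, 9], [101, 12, 13], [101, 21, 22, 23, 24]]
blocks = pack_sequences(docs, block_size=4)
# For causal-LM training, labels are just a copy of the inputs;
# the framework shifts them by one position internally.
batch = [{"input_ids": b, "labels": list(b)} for b in blocks]
```

Each block is then fed to a standard causal-LM training loop; only in the second (SFT) stage would you switch to instruction-formatted examples with loss masking on the prompt.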