Pre-training creates the foundation. Models learn grammar, facts, and reasoning from massive text corpora. This costs millions of dollars and takes months on thousands of GPUs.
Fine-tuning adapts this foundation. You take those learned capabilities and shape them for your task. This costs hundreds to thousands of dollars and takes hours to days on a single GPU. The pre-trained knowledge transfers to your domain.