Web pages change, so crawlers must re-fetch them periodically.
Change detection:
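- HTTP Last-Modified / ETag headers: send them back as If-Modified-Since / If-None-Match; a 304 Not Modified response means the page is unchanged
- Content hash comparison: hash the fetched body and compare it with the stored hash (useful when the server sends no validators)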
- Sitemaps with lastmod dates
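A minimal sketch of the first two signals, assuming the crawler stores one small record per URL. The names PageRecord, conditional_headers, and has_changed are hypothetical, and the actual HTTP fetch is left out so the logic stays self-contained:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PageRecord:
    """What the crawler remembers about a URL from the previous fetch (hypothetical)."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    content_hash: Optional[str] = None


def conditional_headers(record: PageRecord) -> Dict[str, str]:
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {}
    if record.etag:
        headers["If-None-Match"] = record.etag
    if record.last_modified:
        headers["If-Modified-Since"] = record.last_modified
    return headers


def has_changed(record: PageRecord, status: int, body: bytes) -> bool:
    """Decide whether the page changed, then update the stored hash."""
    if status == 304:
        # The server confirmed the stored copy is still current.
        return False
    new_hash = hashlib.sha256(body).hexdigest()
    changed = new_hash != record.content_hash
    record.content_hash = new_hash
    return changed
```

Hashing the raw body flags any byte-level difference, so pages with rotating ads or timestamps would look changed on every fetch; a real crawler would likely hash extracted main content instead.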
Re-crawl scheduling:
- Static pages: Weekly or monthly
- News sites: Every few minutes
- Priority based on importance and change frequency
Adaptive scheduling: Track change history per URL. Frequently changing pages get crawled more often.
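One simple way to sketch adaptive scheduling is a multiplicative rule: halve the re-crawl interval when a fetch finds a change, double it when it does not, and clamp the result. This is an illustrative heuristic, not a prescribed algorithm; UrlSchedule, record_fetch, and the interval bounds are all assumptions:

```python
from dataclasses import dataclass

# Assumed bounds: never faster than 5 minutes, never slower than 30 days.
MIN_INTERVAL = 5 * 60
MAX_INTERVAL = 30 * 24 * 3600


@dataclass
class UrlSchedule:
    """Per-URL scheduling state (hypothetical)."""
    interval: float = 24 * 3600  # start with a daily re-crawl
    next_fetch: float = 0.0      # Unix timestamp of the next planned fetch


def record_fetch(schedule: UrlSchedule, changed: bool, now: float) -> UrlSchedule:
    """Shorten the interval after a change, lengthen it otherwise."""
    factor = 0.5 if changed else 2.0
    schedule.interval = min(MAX_INTERVAL, max(MIN_INTERVAL, schedule.interval * factor))
    schedule.next_fetch = now + schedule.interval
    return schedule
```

With this rule a news front page converges toward the minimum interval after a few changed fetches, while a static page drifts toward the monthly cap.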
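Incremental updates: Re-crawl only the changed pages, not the entire site. Use sitemap deltas: compare the current sitemap with the previous snapshot and fetch only URLs that are new or have a newer lastmod.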
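A sitemap delta can be sketched as a diff between two sitemap snapshots. The example below assumes standard sitemaps.org XML and uses only the standard library; parse_sitemap and sitemap_delta are hypothetical helper names:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> Dict[str, Optional[str]]:
    """Map each <loc> to its <lastmod> string (None when absent)."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if loc:
            entries[loc] = lastmod.strip() if lastmod else None
    return entries


def sitemap_delta(old: Dict[str, Optional[str]],
                  new: Dict[str, Optional[str]]) -> List[str]:
    """Return URLs that are new or whose lastmod differs from the old snapshot."""
    return sorted(
        loc for loc, lastmod in new.items()
        if loc not in old or old[loc] != lastmod
    )
```

Only the URLs returned by sitemap_delta need to be queued for re-crawl. Since lastmod values are supplied by the site and may be inaccurate, this is best treated as a hint alongside header and hash checks.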
Freshness vs coverage trade-off: The crawl budget is limited. Spend it on important, frequently changing pages. Accept staleness for less important content.
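As a rough illustration of spending a fixed budget, the sketch below scores each URL as importance times estimated change rate and keeps the top entries. The scoring formula, the Candidate fields, and select_for_crawl are assumptions made for this example; real crawlers typically use richer signals:

```python
import heapq
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A URL competing for crawl budget (hypothetical fields)."""
    url: str
    importance: float   # e.g. a link-based score in [0, 1]
    change_rate: float  # estimated probability the page changed since last fetch


def select_for_crawl(candidates: List[Candidate], budget: int) -> List[Candidate]:
    """Pick the `budget` highest-scoring candidates, best first."""
    return heapq.nlargest(
        budget,
        candidates,
        key=lambda c: c.importance * c.change_rate,
    )
```

URLs that miss the cut are simply left stale until a later cycle, which is the trade-off described above.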