Crawlers run across many machines, so coordination has three core needs: URL assignment (which worker fetches which URL), duplicate detection (never fetch the same URL twice), and per-domain rate limiting (politeness).
Consistent hashing assigns each domain to one crawler, so a domain's politeness state stays local and assignments shift minimally when workers join or leave. A shared Bloom filter (replicated to each worker, or centralized in one service) handles dedup at the cost of rare false positives. Per-domain crawl delay is tracked in Redis so all workers respect the same politeness window. Frontier state is checkpointed so a crashed worker resumes instead of restarting. Monitor crawl rate, success rate, and coverage. Sketches of each piece follow.
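A minimal consistent-hash ring for domain assignment, assuming hypothetical worker names like "crawler-0"; virtual nodes are used to smooth the distribution across workers:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, workers, vnodes=100):
        # Each worker appears at vnodes positions on the ring.
        self.ring = sorted(
            (self._hash(f"{w}#{i}"), w)
            for w in workers
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def worker_for(self, domain):
        # First ring position clockwise from the domain's hash.
        i = bisect.bisect(self.keys, self._hash(domain)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["crawler-0", "crawler-1", "crawler-2"])
print(ring.worker_for("example.com"))  # same domain -> same worker, always
```

Removing one worker only reassigns the domains that hashed to its positions; the rest stay put, which is the point of using a ring instead of `hash(domain) % num_workers`.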
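A Bloom filter sketch for URL dedup; the size (m bits) and hash count (k) below are illustrative, not tuned. A false positive means a URL is wrongly skipped; the filter never allows a re-fetch:

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 24, k=7):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, url):
        # Double hashing: derive k bit positions from two base hashes.
        d = hashlib.sha256(url.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd, so positions differ
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, url):
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, url):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(url))

bf = BloomFilter()
bf.add("https://example.com/a")
print("https://example.com/a" in bf)  # True
print("https://example.com/b" in bf)  # almost certainly False
```

In the replicated variant each worker holds a copy and syncs additions; in the centralized variant one service owns the filter and workers query it.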
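A sketch of per-domain politeness via Redis, assuming a reachable Redis at localhost and the redis-py client. The key acts as a token: while it exists (TTL = crawl delay), the domain is off limits to every worker:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def try_acquire(domain, delay_ms=2000):
    # SET with NX + PX is atomic: at most one worker wins per delay window,
    # and the key expires on its own after delay_ms.
    return bool(r.set(f"crawl-delay:{domain}", "1", nx=True, px=delay_ms))

if try_acquire("example.com"):
    pass  # safe to fetch from example.com now
else:
    pass  # another worker hit this domain recently; requeue the URL
```

Because the check-and-set is a single atomic command, no separate lock is needed even with many workers contending on the same domain.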
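A frontier-checkpointing sketch under simple assumptions: the frontier is an in-memory queue of URLs and the checkpoint path is illustrative. Writing to a temp file and renaming avoids a torn checkpoint if the process dies mid-write:

```python
import json
import os
from collections import deque

CHECKPOINT = "frontier.json"

def save_frontier(frontier: deque):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(list(frontier), f)
    os.replace(tmp, CHECKPOINT)  # atomic rename on the same filesystem

def load_frontier() -> deque:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return deque(json.load(f))
    return deque()

frontier = load_frontier()          # on startup: resume if a snapshot exists
frontier.append("https://example.com/start")
save_frontier(frontier)             # call every N URLs or on a timer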
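Finally, a minimal in-process view of the three metrics named above; in a real deployment these counters would be exported to an external monitoring system (an assumption here, not specified in the notes):

```python
import time

class CrawlStats:
    def __init__(self):
        self.start = time.time()
        self.fetched = 0     # attempted fetches
        self.succeeded = 0   # 2xx responses
        self.discovered = 0  # unique URLs seen so far

    def record(self, ok: bool):
        self.fetched += 1
        self.succeeded += ok  # bool counts as 0 or 1

    def snapshot(self):
        elapsed = max(time.time() - self.start, 1e-9)
        return {
            "crawl_rate_per_s": self.fetched / elapsed,
            "success_rate": self.succeeded / self.fetched if self.fetched else 0.0,
            "coverage": self.fetched / self.discovered if self.discovered else 0.0,
        }
```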