Batch ingestion: Process data in chunks at scheduled intervals, such as daily, hourly, or every few minutes. Simpler to implement and debug.
Streaming ingestion: Process data as it arrives, with sub-second latency. More complex but required for real-time use cases.
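A minimal sketch of the difference, assuming records are plain dicts held in an in-memory iterable. The names `batch_ingest` and `stream_ingest` are hypothetical and chosen for illustration only; a real pipeline would read from a file store, a queue, or a message broker instead.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def batch_ingest(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into fixed-size chunks, as a scheduled job would process them."""
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    # Emit the final, possibly smaller, chunk.
    if chunk:
        yield chunk


def stream_ingest(records: Iterable[Record], handler: Callable[[Record], None]) -> int:
    """Hand each record to the handler as soon as it arrives; return the count handled."""
    count = 0
    for record in records:
        handler(record)
        count += 1
    return count
```

In the batch version, nothing downstream sees a record until its chunk is complete, which is where the added latency comes from. In the streaming version, each record is handled on its own, so failure handling and ordering have to be dealt with per record.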
When to use each:
- Batch: Reports, analytics, ML training data
- Streaming: Fraud detection, live dashboards, alerting
Most companies use both. Batch handles the majority of workloads. Stream where latency matters. Don't over-engineer with streaming when an hourly batch suffices.
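The rule of thumb above can be sketched as a small decision helper. This is an illustration under stated assumptions, not a definitive policy: the function name `choose_ingestion_mode` and the 300-second default for the shortest practical batch interval are both hypothetical.

```python
def choose_ingestion_mode(max_staleness_s: float, min_batch_interval_s: float = 300.0) -> str:
    """Pick "batch" when the tolerated data staleness covers a batch interval.

    max_staleness_s: how old the data may be before the use case suffers.
    min_batch_interval_s: shortest interval at which a batch job can
        reasonably run (assumed default, adjust for your scheduler).
    """
    if max_staleness_s >= min_batch_interval_s:
        return "batch"
    return "streaming"


# A daily report tolerates hours of staleness; fraud detection does not.
report_mode = choose_ingestion_mode(max_staleness_s=3600)
fraud_mode = choose_ingestion_mode(max_staleness_s=0.5)
```

Here `report_mode` evaluates to `"batch"` and `fraud_mode` to `"streaming"`, matching the use cases listed above.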