At scale, how data is physically organized often matters more than query logic, because the layout determines how much data each query has to read.
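Partitioning: Divides a table into segments by column value (usually a date). Queries that filter on the partition column scan only the relevant partitions.
Clustering: Sorts data within each partition by the specified columns. Speeds up queries that filter on those columns, because rows with similar values are stored together and non-matching ranges can be skipped.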
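To make the two ideas concrete, here is a minimal in-memory sketch. It is a toy model, not how any particular warehouse implements storage: the row layout `(event_date, user_id, amount)` and the helper names `build_partitions` and `query` are hypothetical, and clustering is simplified to a sorted list with an early exit, where real engines typically rely on per-block metadata instead.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (event_date, user_id, amount)
rows = [
    (date(2024, 1, 1), 7, 10.0),
    (date(2024, 1, 1), 3, 5.0),
    (date(2024, 1, 2), 3, 8.0),
    (date(2024, 1, 2), 9, 2.0),
    (date(2024, 1, 3), 7, 4.0),
]


def build_partitions(rows):
    """Partition rows by event_date, then cluster each partition by user_id."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[1])  # clustering: sort within the partition
    return dict(partitions)


def query(partitions, day, user_id):
    """Return (matching rows, number of rows scanned) for one day and user."""
    # Partition pruning: only the partition for `day` is read at all.
    part = partitions.get(day, [])
    matches = []
    scanned = 0
    for row in part:
        scanned += 1
        if row[1] > user_id:
            # Rows are sorted by user_id, so nothing later can match.
            break
        if row[1] == user_id:
            matches.append(row)
    return matches, scanned


partitions = build_partitions(rows)
matches, scanned = query(partitions, date(2024, 1, 2), 3)
print(matches, scanned)
```

In this sketch the query for one date and one user touches 2 of the 5 rows: the other two dates are never read (partitioning), and the scan inside the chosen partition stops as soon as the sorted `user_id` values pass the target (clustering).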
When to use:
- Partition: Terabyte-scale tables whose queries filter by the partition column
- Cluster: Large tables whose queries repeatedly filter on the same columns
Interview tip: When designing schemas, always mention partitioning strategy for large tables.