Deduplication reduces storage costs dramatically:
Block-level dedup:
- Split files into fixed- or variable-size blocks (megabyte-scale is typical)
- Hash each block with a SHA-family hash
- Store only unique blocks
- Represent each file as an ordered list of block hashes
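The block-level steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: `BlockStore`, `put_file`, `get_file`, and the tiny `BLOCK_SIZE` are hypothetical names and values chosen for the example, and SHA-256 is an assumed choice of hash.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo (assumption, real blocks are far larger)


class BlockStore:
    """Hypothetical in-memory stand-in for a content-addressed block store."""

    def __init__(self):
        self.blocks = {}  # hex digest -> block bytes

    def put_file(self, data: bytes) -> list[str]:
        """Store a file and return its manifest: the ordered list of block hashes."""
        manifest = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()  # SHA-256 is an assumed choice
            self.blocks.setdefault(digest, block)  # keep only the first copy of a block
            manifest.append(digest)
        return manifest

    def get_file(self, manifest: list[str]) -> bytes:
        """Rebuild a file by concatenating its blocks in manifest order."""
        return b"".join(self.blocks[digest] for digest in manifest)
```

Storing `b"aaaabbbbcccc"` and then `b"aaaabbbbdddd"` in the same store leaves four unique blocks rather than six, because the two files share their first two blocks.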
Benefits:
- The same file uploaded by multiple users is stored once
- Editing a file uploads only the changed blocks
- Version history shares unchanged blocks
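One way a client might decide which blocks to send after an edit is to compare the new version's block hashes with the set the server already holds. The sketch below assumes fixed-size blocks and SHA-256; `block_hashes` and `blocks_to_upload` are hypothetical helper names, not part of any real sync API.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """Hash each fixed-size block of data (SHA-256 is an assumed choice)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(new_hashes: list[str], known_hashes: set[str]) -> list[str]:
    """Return the hashes the server lacks, in file order, each listed once."""
    seen = set(known_hashes)
    missing = []
    for digest in new_hashes:
        if digest not in seen:
            seen.add(digest)
            missing.append(digest)
    return missing
```

For example, if version 1 is `b"AAAABBBBCCCC"` and version 2 is `b"AAAAXXXXCCCC"` with a block size of 4, only the hash of `b"XXXX"` is reported as missing; the other two blocks are shared between the versions.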
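Variable chunking: Content-defined chunking (Rabin fingerprinting) handles insertions better. With fixed-size blocks, an insertion shifts every later block boundary, so all following blocks hash differently. Content-defined boundaries depend only on nearby bytes, so chunks away from the edit stay stable.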
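To make the boundary behaviour concrete, here is a small sketch of content-defined chunking. It uses a simple polynomial rolling hash over a sliding window as a stand-in for Rabin fingerprinting (it is not a true Rabin fingerprint), and `WINDOW`, `MASK`, `BASE`, `MOD`, and `cdc_chunks` are illustrative assumptions. It also omits the minimum and maximum chunk sizes a real system would likely enforce.

```python
WINDOW = 16           # bytes covered by the rolling hash (assumed value)
MASK = (1 << 6) - 1   # cut when the low 6 bits are zero: roughly 64-byte chunks on average
BASE = 257
MOD = (1 << 31) - 1


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at positions where the rolling hash of the last WINDOW bytes matches MASK."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the contribution of the byte that slides out of the window.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Only cut once the window is full, so a boundary depends on exactly WINDOW bytes.
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks
```

Because each boundary depends only on the preceding `WINDOW` bytes, prepending a byte to a file changes the first chunk but leaves every later chunk byte-for-byte identical, so their hashes still match blocks already stored. With fixed-size blocks the same edit would shift every block.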
Trade-off: Dedup has to check every incoming block against the blocks already stored, and chunking and hashing add CPU cost at upload time.