
Design a File Storage System

Upload, sync, and share files across every device, like Dropbox
Scope: Real-World System | Difficulty: Advanced

Understanding the Problem

Think Dropbox, Google Drive, or OneDrive. Users upload files from one device and expect them to appear on every other device almost instantly. They share files with colleagues, revert to previous versions, and never want to lose data.

Functional Requirements:

  • Upload & download: Users can upload files and download them from any device.
  • Sync across devices: Changes on one device propagate to all others automatically.
  • File sharing: Share files or folders with other users via links or permissions.
  • Version history: Keep previous versions so users can revert changes.

Non-Functional Requirements:

  • Reliability: Files must never be lost. Data durability is paramount: think 99.999999999% (11 nines).
  • Low sync latency: When a user saves a file, other devices should see the update within seconds, not minutes.
  • Bandwidth efficiency: Use delta sync, which uploads only the changed parts of a file rather than the entire file every time. This is the key engineering challenge.

Estimation

Let's size this system:

  • 500M registered users, 100M daily active users
  • 200 files per user on average
  • Average file size: 100 KB (many small docs, some large media files)
  • Total storage: 500M × 200 × 100 KB = 10 PB
  • Sync operations: 100M DAU × 1 sync/day avg = 100M sync operations/day (~1,150/s)
  • Upload bandwidth: If 10% of syncs are uploads of ~100 KB, that's ~115 uploads/s × 100 KB ≈ 11.5 MB/s sustained
  • Peak traffic: 3-5× average during business hours

The real challenge isn't raw throughput. It's minimizing bandwidth via chunking and delta sync while keeping sync latency low.
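The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-envelope sketch: every input is one of the assumptions listed above, decimal units are assumed (1 KB = 1,000 bytes), and the variable names are illustrative.

```python
# Back-of-envelope sizing. All inputs are the assumptions from the list above.
KB = 1_000
MB = 1_000_000
PB = 10**15
SECONDS_PER_DAY = 86_400

registered_users = 500_000_000
daily_active_users = 100_000_000
files_per_user = 200
avg_file_size_bytes = 100 * KB
syncs_per_user_per_day = 1
upload_fraction = 0.10        # share of syncs that actually upload data
peak_multipliers = (3, 5)     # business-hours peak relative to average

total_storage_pb = registered_users * files_per_user * avg_file_size_bytes / PB
syncs_per_second = daily_active_users * syncs_per_user_per_day / SECONDS_PER_DAY
upload_mb_per_second = syncs_per_second * upload_fraction * avg_file_size_bytes / MB
peak_syncs_per_second = tuple(syncs_per_second * m for m in peak_multipliers)

print(f"total storage:    {total_storage_pb:.0f} PB")
print(f"sync operations:  {syncs_per_second:,.0f}/s")
print(f"upload bandwidth: {upload_mb_per_second:.1f} MB/s")
print(f"peak syncs:       {peak_syncs_per_second[0]:,.0f}-{peak_syncs_per_second[1]:,.0f}/s")
```

Even at a 5× peak, the upload rate stays in the tens of MB/s, which is why the design effort goes into sync efficiency rather than raw ingest capacity.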

API Design

File operations are chunked for reliability and resumability:

Upload File (Chunked)

Endpoint: POST /api/v1/files/upload
Request: {"filename": "report.pdf", "chunks": [{"hash": "a1b2c3", "index": 0, "data": "..."}], "total_chunks": 5}
Response: {"file_id": "f123", "version": 1, "status": "uploaded"}
Status: 201 Created

Download File

Endpoint: GET /api/v1/files/:id
Response: File binary stream with Content-Disposition header
Status: 200 OK

Update Metadata

Endpoint: PUT /api/v1/files/:id/metadata
Request: {"name": "new-name.pdf", "shared_with": ["user456"]}
Status: 200 OK

Version History

Endpoint: GET /api/v1/files/:id/history
Response: {"versions": [{"version": 3, "modified_at": "...", "size": 102400}, ...]}

Why chunked uploads? Large uploads (100 MB+) often fail partway over unreliable networks. Chunking lets the client resume from the last chunk that succeeded instead of starting over. Each chunk is typically 4 MB: small enough to retry quickly, large enough to avoid too many round trips.
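A minimal sketch of the resume behavior, under stated assumptions: `FakeUploadServer` is a hypothetical in-memory stand-in for the upload endpoint (no real HTTP call is made), and `upload_resumable` and `iter_chunks` are illustrative names, not part of any real client library.

```python
import hashlib
from typing import Iterator, Optional

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size used in this article


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, bytes]]:
    """Yield (index, chunk) pairs covering the data in order."""
    for index, start in enumerate(range(0, len(data), chunk_size)):
        yield index, data[start:start + chunk_size]


class FakeUploadServer:
    """Hypothetical in-memory stand-in for the chunk upload endpoint."""

    def __init__(self, fail_on_attempts: Optional[set[int]] = None) -> None:
        self.received: dict[int, bytes] = {}
        self.attempts = 0
        self._fail_on = fail_on_attempts or set()

    def put_chunk(self, index: int, sha256_hex: str, chunk: bytes) -> None:
        self.attempts += 1
        if self.attempts in self._fail_on:
            raise ConnectionError("simulated network drop")
        if hashlib.sha256(chunk).hexdigest() != sha256_hex:
            raise ValueError("chunk corrupted in transit")
        self.received[index] = chunk


def upload_resumable(data: bytes, server: FakeUploadServer,
                     chunk_size: int = CHUNK_SIZE, max_retries: int = 3) -> int:
    """Send every chunk the server does not hold yet; return how many were sent."""
    sent = 0
    for index, chunk in iter_chunks(data, chunk_size):
        if index in server.received:
            continue  # uploaded in an earlier run, so skip it
        digest = hashlib.sha256(chunk).hexdigest()
        for attempt in range(1, max_retries + 1):
            try:
                server.put_chunk(index, digest, chunk)
                sent += 1
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up for now; a later call resumes here
    return sent
```

If a call fails partway, calling `upload_resumable` again with the same data and server skips the chunks already received and sends only the rest.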

[Figure: Sync flow. Files are split into blocks, deduped, stored, and synced to other devices via notifications.]

Block-Level Dedup & Delta Sync

This is the most important engineering decision in a file storage system.

How it works:

  1. Split files into blocks: Each file is divided into fixed-size blocks (typically 4 MB). Each block is hashed (SHA-256).
  2. Hash comparison: Before uploading, the client computes hashes of all blocks and sends them to the server. The server checks which blocks it already has.
  3. Upload only new blocks: Only blocks whose hashes the server has not seen are uploaded. If you change one paragraph in place in a 100 MB document, you might upload a single 4 MB block instead of the whole file. With fixed-size blocks, an insertion that shifts later bytes also changes every block after it, so the savings depend on the kind of edit.
  4. Reconstruct on download: The file is a list of block hashes. To download, fetch each block and reassemble.
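The four steps above can be sketched in Python. This is an illustration under assumptions, not a production design: `BlockStore` is a hypothetical in-memory server, `sync_file` is an illustrative client routine, and the example uses a tiny block size so the effect is easy to see.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size blocks, as in step 1


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Step 1: divide the file into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical server: blocks keyed by SHA-256, files stored as hash lists."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.manifests: dict[str, list[str]] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        """Step 2: report which of the offered hashes the server lacks."""
        return {h for h in hashes if h not in self.blocks}

    def put_block(self, block: bytes) -> None:
        self.blocks[block_hash(block)] = block

    def commit(self, file_id: str, hashes: list[str]) -> None:
        self.manifests[file_id] = list(hashes)

    def read(self, file_id: str) -> bytes:
        """Step 4: reassemble the file from its list of block hashes."""
        return b"".join(self.blocks[h] for h in self.manifests[file_id])


def sync_file(store: BlockStore, file_id: str, data: bytes,
              block_size: int = BLOCK_SIZE) -> int:
    """Step 3: upload only blocks the store lacks; return the bytes sent."""
    blocks = split_blocks(data, block_size)
    hashes = [block_hash(b) for b in blocks]
    needed = store.missing(hashes)
    sent = 0
    for block, h in zip(blocks, hashes):
        if h in needed:
            store.put_block(block)
            needed.discard(h)  # identical blocks within one file are sent once
            sent += len(block)
    store.commit(file_id, hashes)
    return sent


store = BlockStore()
v1 = b"a" * 1024 + b"b" * 1024 + b"c" * 1024
v2 = b"a" * 1024 + b"X" * 1024 + b"c" * 1024   # middle block edited in place
first = sync_file(store, "f123", v1, block_size=1024)
second = sync_file(store, "f123", v2, block_size=1024)
print(first, second)
```

The old block stays in the store after the second sync, which is also what makes version history cheap: an earlier version is just an older list of hashes.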

Bandwidth savings: Delta sync can reduce bandwidth by 90%+ for typical edits. This is why Dropbox feels so fast: it's not uploading your entire file every time you hit save.

Conflict resolution: When two devices edit the same file simultaneously:

  • Last-write-wins: Simple, but the earlier edit is silently lost. Acceptable only when concurrent edits are rare or low-stakes.
  • Keep both copies: Create a "conflicted copy" and let the user merge manually. Dropbox uses this approach.
  • Operational transform: Used by Google Docs for real-time collaborative editing (much more complex).
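A minimal sketch of the "keep both copies" strategy, assuming the server tracks a version number per file and each client sends the version its edit was based on. `MetadataStore`, `commit`, and the conflicted-copy naming format are all hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    version: int = 0
    content: bytes = b""


def conflicted_name(path: str, device: str) -> str:
    """Illustrative naming scheme for a conflicted copy."""
    stem, ext = os.path.splitext(path)
    return f"{stem} ({device}'s conflicted copy){ext}"


@dataclass
class MetadataStore:
    files: dict[str, FileRecord] = field(default_factory=dict)

    def commit(self, path: str, base_version: int, content: bytes, device: str) -> str:
        """Apply an edit made against base_version; return the path written."""
        record = self.files.setdefault(path, FileRecord())
        if base_version == record.version:
            # The edit was based on the latest version: accept it.
            record.version += 1
            record.content = content
            return path
        # The base is stale, so another device committed first.
        # Keep both: leave the winner in place and save this edit alongside it.
        conflict_path = conflicted_name(path, device)
        self.files[conflict_path] = FileRecord(version=1, content=content)
        return conflict_path
```

Last-write-wins would be the same check with the stale branch overwriting `record.content` instead of writing a second file.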

Real-Time Sync & Storage Tiers

Notification Service: To achieve near-instant sync, the server must push updates to clients. Two approaches:

  • Long polling: Client holds an HTTP connection open. Server responds when there's a change. Simple but uses more connections.
  • WebSocket: Persistent bidirectional connection. More efficient for frequent updates. Preferred for desktop sync clients.

When a file changes, the notification service tells all connected devices: "File X has new blocks; here are the hashes." Each device then downloads only the blocks it doesn't have.
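The fan-out and the per-device block comparison can be sketched as follows. `NotificationHub` is a hypothetical in-process stand-in for the notification service: it uses one queue per device in place of a long-poll or WebSocket connection, and `ChangeEvent` is an assumed message shape.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    file_id: str
    block_hashes: tuple[str, ...]  # full ordered hash list of the new version


class NotificationHub:
    """Hypothetical stand-in for the notification service (one inbox per device)."""

    def __init__(self) -> None:
        self._inboxes: dict[str, queue.Queue] = {}

    def connect(self, device_id: str) -> queue.Queue:
        self._inboxes[device_id] = queue.Queue()
        return self._inboxes[device_id]

    def publish(self, event: ChangeEvent, source_device: str) -> None:
        """Push the change to every connected device except the one that made it."""
        for device_id, inbox in self._inboxes.items():
            if device_id != source_device:
                inbox.put(event)


def blocks_to_fetch(event: ChangeEvent, local_blocks: set[str]) -> list[str]:
    """Return the hashes this device must download, in order, without repeats."""
    seen = set(local_blocks)
    needed = []
    for h in event.block_hashes:
        if h not in seen:
            needed.append(h)
            seen.add(h)
    return needed


hub = NotificationHub()
laptop_inbox = hub.connect("laptop")
phone_inbox = hub.connect("phone")
hub.publish(ChangeEvent("f123", ("h1", "h2", "h3")), source_device="laptop")
event = phone_inbox.get_nowait()
print(blocks_to_fetch(event, local_blocks={"h1", "h3"}))
```

Here the phone already holds two of the three blocks, so it fetches one block rather than the whole file.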

Storage Tiers:

  • Hot storage: Frequently accessed files (last 30 days). Stored on fast SSDs with high IOPS. Think S3 Standard.
  • Cold storage: Old versions, rarely accessed files. Stored on cheaper storage (S3 Glacier, tape). Access latency: minutes to hours, but cost is 10-20× lower.
  • Lifecycle policies: Automatically move files from hot to cold based on access patterns. For example, version history older than 90 days moves to cold storage.
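The tiering rules above could be expressed as a small decision function. This is a sketch under assumptions: the 30-day and 90-day thresholds come from the list, while `choose_tier`, `Tier`, and the rule ordering are illustrative. A real deployment would more likely configure this in the object store's own lifecycle rules than in application code.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

HOT_WINDOW = timedelta(days=30)          # "last 30 days" from the hot tier
VERSION_COLD_AFTER = timedelta(days=90)  # old versions move to cold after this


class Tier(Enum):
    HOT = "hot"
    COLD = "cold"


def choose_tier(last_accessed: datetime, created: datetime,
                is_current_version: bool, now: datetime) -> Tier:
    """Pick a storage tier for one stored file version."""
    # Superseded versions older than 90 days go cold regardless of access.
    if not is_current_version and now - created > VERSION_COLD_AFTER:
        return Tier.COLD
    # Anything touched within the hot window stays on fast storage.
    if now - last_accessed <= HOT_WINDOW:
        return Tier.HOT
    return Tier.COLD


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(choose_tier(now - timedelta(days=2), now - timedelta(days=400), True, now))
```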
Interview tip: Delta sync (block-level deduplication) is the key differentiator in a file storage system design. It's what separates a naive "upload the whole file" approach from a production-grade system like Dropbox. Always bring it up. It shows you understand the real engineering challenge: minimizing bandwidth while keeping sync fast.

Key Metrics

  • Upload (chunked): ~200-500 ms, \(O(n/B)\). n = file size, B = block size (4 MB). Only changed blocks are uploaded.
  • Download: ~50-200 ms, \(O(k)\). k = number of blocks. Parallel block fetches from the object store.
  • Sync latency: ~1-5 sec. Notification push plus delta download of changed blocks.
  • Total storage: ~10 PB. 500M users × 200 files × 100 KB avg.
  • Dedup savings: 40-60%. Block-level dedup across users and versions.
  • Sync operations: ~1,150/s. 100M daily syncs, peaking at 3-5× during business hours.
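The download figure assumes blocks are fetched in parallel. A minimal sketch of that idea, assuming a caller-supplied `fetch_block` function (hypothetical; in practice it would read from the object store) and using a thread pool because block fetches are I/O-bound:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def download_file(manifest: list[str], fetch_block: Callable[[str], bytes],
                  max_workers: int = 8) -> bytes:
    """Fetch each distinct block in parallel, verify it, and reassemble the file."""
    unique = list(dict.fromkeys(manifest))  # fetch repeated blocks only once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        blocks = dict(zip(unique, pool.map(fetch_block, unique)))
    for expected, block in blocks.items():
        if hashlib.sha256(block).hexdigest() != expected:
            raise ValueError(f"block {expected[:8]} failed its hash check")
    # The manifest order, not the fetch order, defines the file contents.
    return b"".join(blocks[h] for h in manifest)


parts = [b"alpha", b"beta", b"alpha"]
object_store = {hashlib.sha256(p).hexdigest(): p for p in parts}
manifest = [hashlib.sha256(p).hexdigest() for p in parts]
print(download_file(manifest, object_store.__getitem__))
```

Verifying each block against its hash costs little here, because the hash is already the block's key in the manifest.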

Quick check

Why do file storage systems split files into fixed-size blocks (e.g., 4 MB) instead of uploading whole files?
