Design a File Storage System
Understanding the Problem
Think Dropbox, Google Drive, or OneDrive. Users upload files from one device and expect them to appear on every other device instantly. They share files with colleagues, revert to previous versions, and never want to lose data.
Functional Requirements:
- Upload & download: Users can upload files and download them from any device.
- Sync across devices: Changes on one device propagate to all others automatically.
- File sharing: Share files or folders with other users via links or permissions.
- Version history: Keep previous versions so users can revert changes.
Non-Functional Requirements:
- Reliability: Files must never be lost. Data durability is paramount: think 99.999999999% (11 nines) annual durability per object.
- Low sync latency: When a user saves a file, other devices should see the update within seconds, not minutes.
- Bandwidth efficiency: Use delta sync, uploading only the changed parts of a file rather than the entire file on every save. This is the key engineering challenge.
Estimation
Let's size this system:
- 500M registered users, 100M daily active users
- 200 files per user on average
- Average file size: 100 KB (many small docs, some large media files)
- Total storage: 500M × 200 × 100 KB = 10 PB
- Sync operations: 100M DAU × 1 sync/day avg = 100M sync operations/day (~1,160/s)
- Upload bandwidth: If 10% of syncs are uploads of ~100 KB, that's ~116 uploads/s × 100 KB ≈ 11.6 MB/s sustained
- Peak traffic: 3-5× average during business hours
The real challenge isn't raw throughput; it's minimizing bandwidth via chunking and delta sync while keeping sync latency low.
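A quick way to sanity-check these figures is to recompute each one from the stated assumptions. The sketch below assumes decimal units (1 KB = 1,000 bytes) and spreads syncs evenly over 24 hours, so the results are order-of-magnitude figures, not capacity targets.

```python
# Back-of-envelope sizing from the assumptions listed above.
# Decimal units are an assumption: 1 KB = 1,000 bytes.
KB = 1_000
MB = 1_000 * KB
PB = 10**15

registered_users = 500_000_000
daily_active_users = 100_000_000
files_per_user = 200
avg_file_size = 100 * KB
seconds_per_day = 86_400
upload_fraction = 0.10  # "10% of syncs are uploads"

total_storage = registered_users * files_per_user * avg_file_size   # bytes
syncs_per_second = daily_active_users * 1 / seconds_per_day          # 1 sync/day each
upload_bandwidth = upload_fraction * syncs_per_second * avg_file_size  # bytes/s

print(f"storage:     {total_storage / PB:.0f} PB")
print(f"syncs:       {syncs_per_second:,.0f}/s")
print(f"upload:      {upload_bandwidth / MB:.1f} MB/s")
print(f"peak upload: {3 * upload_bandwidth / MB:.0f}-{5 * upload_bandwidth / MB:.0f} MB/s")
```

Even at 5× peak the upload stream is only tens of MB/s, which is why the design effort goes into bandwidth per edit and sync latency rather than raw throughput.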
API Design
File operations are chunked for reliability and resumability:
Upload File (Chunked)
| Endpoint | POST /api/v1/files/upload |
| Request | {"filename": "report.pdf", "chunks": [{"hash": "a1b2c3", "index": 0, "data": "..."}], "total_chunks": 5} |
| Response | {"file_id": "f123", "version": 1, "status": "uploaded"} |
| Status | 201 Created |
Download File
| Endpoint | GET /api/v1/files/:id |
| Response | File binary stream with Content-Disposition header |
| Status | 200 OK |
Update Metadata
| Endpoint | PUT /api/v1/files/:id/metadata |
| Request | {"name": "new-name.pdf", "shared_with": ["user456"]} |
| Status | 200 OK |
Version History
| Endpoint | GET /api/v1/files/:id/history |
| Response | {"versions": [{"version": 3, "modified_at": "...", "size": 102400}, ...]} |
| Status | 200 OK |
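One possible backing for the history endpoint assumes (as the block-level section below develops) that each version is stored as an ordered list of block hashes with their sizes. Under that assumption, keeping history is cheap. Reverting copies no file data: it re-commits an old version's block list as a new version, so history stays linear. `Version`, `VersionedFile`, `commit`, `revert`, and `history` are hypothetical names. The response shape follows the table above, and the newest-first ordering is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int
    modified_at: datetime
    blocks: list[tuple[str, int]]  # (block hash, block size in bytes)


@dataclass
class VersionedFile:
    file_id: str
    versions: list[Version] = field(default_factory=list)

    def commit(self, blocks: list[tuple[str, int]], when: datetime) -> Version:
        """Append a new version whose content is the given block list."""
        version = Version(len(self.versions) + 1, when, blocks)
        self.versions.append(version)
        return version

    def revert(self, number: int, when: datetime) -> Version:
        """Restore an old version by re-committing its block list (no data copied)."""
        return self.commit(self.versions[number - 1].blocks, when)

    def history(self) -> dict:
        """Build a GET /api/v1/files/:id/history body, newest version first."""
        return {"versions": [
            {"version": v.number,
             "modified_at": v.modified_at.isoformat(),
             "size": sum(size for _, size in v.blocks)}
            for v in reversed(self.versions)
        ]}


t = datetime(2024, 1, 1, tzinfo=timezone.utc)
f = VersionedFile("f123")
f.commit([("a1", 51200), ("b2", 51200)], t)  # v1: 102400 bytes
f.commit([("a1", 51200), ("c3", 20000)], t)  # v2: 71200 bytes
f.revert(1, t)                               # v3: same blocks as v1
sizes = [v["size"] for v in f.history()["versions"]]
print(sizes)  # [102400, 71200, 102400]
```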
Why chunked uploads? A large file (100 MB+) sent in a single request is likely to fail partway through on an unreliable network, forcing a restart from zero. Chunking lets the client resume from the last chunk the server acknowledged. Each chunk is typically 4 MB: small enough to retry quickly, large enough to avoid too many round trips.
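The sketch below shows the client side of resuming. It assumes the server can report which chunk indices it already holds for an upload session, and that the client sends one chunk per request. `FakeUploadServer`, `upload_file`, and the `upload_id` session key are hypothetical stand-ins, not part of the API above. The request fields (`filename`, `chunks`, `hash`, `index`, `data`, `total_chunks`) mirror the documented upload body, with chunk bytes base64-encoded for JSON.

```python
import base64
import hashlib
from typing import Optional

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the typical chunk size cited above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class FakeUploadServer:
    """Hypothetical in-memory server that remembers chunks per upload session."""

    def __init__(self, fail_at_index: Optional[int] = None):
        self.received: dict[str, dict[int, bytes]] = {}
        self.fail_at_index = fail_at_index  # simulate one dropped request

    def received_indices(self, upload_id: str) -> set[int]:
        return set(self.received.get(upload_id, {}))

    def put_chunk(self, upload_id: str, request: dict) -> None:
        if request["chunks"][0]["index"] == self.fail_at_index:
            self.fail_at_index = None  # fail only the first time
            raise ConnectionError("network dropped mid-upload")
        for chunk in request["chunks"]:
            data = base64.b64decode(chunk["data"])
            if hashlib.sha256(data).hexdigest() != chunk["hash"]:
                raise ValueError(f"chunk {chunk['index']} corrupted in transit")
            self.received.setdefault(upload_id, {})[chunk["index"]] = data


def upload_file(server, upload_id: str, filename: str, data: bytes,
                chunk_size: int = CHUNK_SIZE) -> int:
    """Send only the chunks the server lacks; returns how many were sent."""
    chunks = split_into_chunks(data, chunk_size)
    already_have = server.received_indices(upload_id)
    sent = 0
    for index, chunk in enumerate(chunks):
        if index in already_have:
            continue  # resume: skip chunks acknowledged before the failure
        server.put_chunk(upload_id, {
            "filename": filename,
            "chunks": [{
                "hash": hashlib.sha256(chunk).hexdigest(),
                "index": index,
                "data": base64.b64encode(chunk).decode("ascii"),
            }],
            "total_chunks": len(chunks),
        })
        sent += 1
    return sent


server = FakeUploadServer(fail_at_index=2)
payload = bytes(range(256)) * 40  # 10,240 bytes -> ten 1 KB chunks
try:
    upload_file(server, "u1", "report.pdf", payload, chunk_size=1024)
except ConnectionError:
    pass  # first attempt dies while sending chunk 2
resent = upload_file(server, "u1", "report.pdf", payload, chunk_size=1024)
print(resent)  # 8: chunks 0 and 1 were not sent again
```

The per-chunk hash lets the server reject a corrupted chunk and ask for just that one again, instead of re-requesting the whole file.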
Block-Level Dedup & Delta Sync
This is the most important engineering decision in a file storage system.
How it works:
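- Split files into blocks: Each file is divided into fixed-size blocks (typically 4 MB), and each block is hashed with SHA-256.
- Hash comparison: Before uploading, the client computes the hashes of all blocks and sends them to the server, which replies with the hashes it doesn't already have.
- Upload only new blocks: Only blocks with new hashes are uploaded. If an edit to a 100 MB document only touches bytes within one block, you upload a single 4 MB block instead of the whole file. With fixed-size blocks, though, an insertion or deletion shifts every later block boundary, so all subsequent blocks hash differently; content-defined chunking (block boundaries chosen by a rolling hash over the content) is the usual fix.
- Reconstruct on download: A file is stored as an ordered list of block hashes. To download, fetch each block and reassemble them in order.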
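The four steps above can be sketched with an in-memory, content-addressed block store keyed by SHA-256 hash. `BlockStore`, `missing`, `put`, `reconstruct`, and `sync_file` are illustrative names. A real service would keep blocks in an object store and manifests (the per-file hash lists) in a metadata database. The usage example shrinks blocks to 4 KB only to keep the demo small.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size blocks


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Step 1: cut the file into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical server-side store: each distinct block is kept once."""

    def __init__(self):
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> list[str]:
        """Step 2: report which hashes the server does not have yet."""
        return [h for h in dict.fromkeys(hashes) if h not in self.blocks]

    def put(self, block: bytes) -> None:
        self.blocks[block_hash(block)] = block

    def reconstruct(self, manifest: list[str]) -> bytes:
        """Step 4: a file is an ordered list of hashes; join its blocks."""
        return b"".join(self.blocks[h] for h in manifest)


def sync_file(store: BlockStore, data: bytes, block_size: int = BLOCK_SIZE):
    """Steps 1-3: hash every block, ask the server, upload only new blocks."""
    blocks = split_blocks(data, block_size)
    manifest = [block_hash(b) for b in blocks]
    needed = set(store.missing(manifest))
    uploaded = 0
    for block, h in zip(blocks, manifest):
        if h in needed:
            store.put(block)
            needed.discard(h)  # a block repeated within one file is sent once
            uploaded += 1
    return manifest, uploaded


store = BlockStore()
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096  # three 4 KB blocks
manifest_v1, sent_v1 = sync_file(store, original, block_size=4096)
edited = b"A" * 4096 + b"X" * 4096 + b"C" * 4096    # overwrite the middle block
manifest_v2, sent_v2 = sync_file(store, edited, block_size=4096)
print(sent_v1, sent_v2)  # 3 1
shifted = b"Z" + original                            # insert one byte at the start
_, sent_v3 = sync_file(store, shifted, block_size=4096)
print(sent_v3)  # 4: every block boundary moved, so every block is new
```

The last case shows the fixed-size-block limitation in miniature: a one-byte insertion invalidates every block after it.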
Bandwidth savings: For small edits to large files, delta sync cuts upload volume dramatically. In the example above, sending 4 MB instead of 100 MB is a 96% reduction. This is why Dropbox feels so fast: it doesn't re-upload your entire file every time you hit save.
Conflict resolution: When two devices edit the same file concurrently (each starting from the same version before the other's change has synced):
- Last-write-wins: Simple, but the earlier save is silently overwritten. Acceptable when conflicts are rare and losing an edit is tolerable.
- Keep both copies: Create a "conflicted copy" and let the user merge manually. Dropbox uses this approach.
- Operational transformation (OT): Used by Google Docs for real-time collaborative editing; much more complex.
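Here is a sketch of conflict detection under the keep-both strategy. It assumes each commit carries the version number the device started editing from (`base_version`). If that no longer matches the server's latest version, another device committed first, and the incoming change is saved as a conflicted copy instead of overwriting. `FileRecord`, `commit`, and the copy-naming scheme are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name: str
    versions: list[list[str]] = field(default_factory=list)  # each = block hashes

    @property
    def current_version(self) -> int:
        return len(self.versions)


def commit(files: dict[str, FileRecord], name: str, base_version: int,
           manifest: list[str], device: str) -> str:
    """Apply an edit; if base_version is stale, keep both copies."""
    record = files.setdefault(name, FileRecord(name))
    if base_version == record.current_version:
        record.versions.append(manifest)  # no concurrent edit: fast-forward
        return name
    # Another device committed since this one last synced: keep both.
    stem, dot, ext = name.rpartition(".")
    suffix = f"({device}'s conflicted copy)"
    conflict_name = f"{stem} {suffix}.{ext}" if dot else f"{name} {suffix}"
    files[conflict_name] = FileRecord(conflict_name, [manifest])
    return conflict_name


files: dict[str, FileRecord] = {}
commit(files, "report.pdf", 0, ["h1"], "laptop")      # creates version 1
a = commit(files, "report.pdf", 1, ["h2"], "laptop")  # laptop edits v1 -> v2
b = commit(files, "report.pdf", 1, ["h3"], "phone")   # phone also edited v1
print(a, "|", b)  # report.pdf | report (phone's conflicted copy).pdf
```

Replacing the conflict branch with an unconditional append turns this into last-write-wins, which makes the data-loss trade-off between the two strategies easy to see.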
Real-Time Sync & Storage Tiers
Notification Service: To achieve near-instant sync, the server must push updates to clients. Two approaches:
- Long polling: The client holds an HTTP request open, and the server responds when there's a change; the client then immediately opens a new request. Simple, but every notification costs a full request/response cycle.
- WebSocket: A persistent bidirectional connection. More efficient for frequent updates, and preferred for desktop sync clients.
When a file changes, the notification service tells all connected devices: "File X has new blocks; here are the hashes." Each device then downloads only the blocks it doesn't have.
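The fan-out described above can be sketched with `asyncio`, using one queue per connected device as a stand-in for a WebSocket or long-poll connection. `NotificationHub`, the event shape, and the per-user device registry are assumptions for illustration. A production service would also have to handle reconnects and devices that were offline when the event fired.

```python
import asyncio


class NotificationHub:
    """Hypothetical in-process hub: user id -> {device id: event queue}."""

    def __init__(self):
        self.devices: dict[str, dict[str, asyncio.Queue]] = {}

    def connect(self, user: str, device: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.devices.setdefault(user, {})[device] = queue
        return queue

    async def publish(self, user: str, origin: str, file_id: str,
                      hashes: list[str]) -> None:
        """Tell every other device of this user that file_id has new blocks."""
        event = {"file_id": file_id, "block_hashes": hashes}
        for device, queue in self.devices.get(user, {}).items():
            if device != origin:  # the editing device already has the blocks
                await queue.put(event)


async def handle_event(queue: asyncio.Queue, local_blocks: set[str]) -> list[str]:
    """Device side: wait for an event and return only the blocks to download."""
    event = await queue.get()
    return [h for h in event["block_hashes"] if h not in local_blocks]


async def main() -> list[str]:
    hub = NotificationHub()
    hub.connect("alice", "laptop")
    phone = hub.connect("alice", "phone")
    await hub.publish("alice", "laptop", "f123", ["h1", "h2", "h3"])
    return await handle_event(phone, local_blocks={"h1", "h3"})


to_download = asyncio.run(main())
print(to_download)  # ['h2']
```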
Storage Tiers:
- Hot storage: Frequently accessed files (last 30 days). Kept on fast, high-IOPS storage such as SSDs or a standard object-store tier (think S3 Standard).
- Cold storage: Old versions and rarely accessed files. Kept on cheaper media (S3 Glacier, tape). Access latency can be minutes to hours, but cost is roughly 10-20× lower.
- Lifecycle policies: Automatically move files from hot to cold based on access patterns; for example, move version history older than 90 days to cold storage.
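A lifecycle rule like this reduces to a small classification function run periodically over file metadata. The 30- and 90-day thresholds come from the text. The `Tier` enum, the `choose_tier` name, and the precedence (an old superseded version goes cold even if it was read recently) are illustrative assumptions. Managed object stores can express age-based transitions declaratively instead, for example through an S3 lifecycle configuration.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    COLD = "cold"


HOT_ACCESS_WINDOW = timedelta(days=30)   # "frequently accessed (last 30 days)"
VERSION_COLD_AFTER = timedelta(days=90)  # "version history older than 90 days"


def choose_tier(last_accessed: datetime, is_latest_version: bool,
                version_created: datetime, now: datetime) -> Tier:
    """Pick a storage tier for one file version (hypothetical policy)."""
    if not is_latest_version and now - version_created > VERSION_COLD_AFTER:
        return Tier.COLD  # old, superseded versions go to cold storage
    if now - last_accessed <= HOT_ACCESS_WINDOW:
        return Tier.HOT   # recently accessed content stays on fast storage
    return Tier.COLD      # rarely accessed content moves to cheaper storage


now = datetime(2024, 6, 1)
recent = choose_tier(now - timedelta(days=3), True, now - timedelta(days=400), now)
idle = choose_tier(now - timedelta(days=60), True, now - timedelta(days=60), now)
old_version = choose_tier(now - timedelta(days=1), False, now - timedelta(days=120), now)
print(recent, idle, old_version)  # Tier.HOT Tier.COLD Tier.COLD
```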