
Design a File Storage System

Upload, sync, and share files across every device — like Dropbox

Scope: Real-World System | Difficulty: Advanced

Understanding the Problem

Think Dropbox, Google Drive, or OneDrive. Users upload files from one device and expect them to appear on every other device — instantly. They share files with colleagues, revert to previous versions, and never want to lose data.

Functional Requirements:

Upload & download — Users can upload files and download them from any device.
Sync across devices — Changes on one device propagate to all others automatically.
File sharing — Share files or folders with other users via links or permissions.
Version history — Keep previous versions so users can revert changes.

Non-Functional Requirements:

Reliability: Files must never be lost. Data durability is paramount — think 99.999999999% (11 nines).
Low sync latency: When a user saves a file, other devices should see the update within seconds, not minutes.
Bandwidth efficiency: Use delta sync — only upload the changed parts of a file, not the entire file every time. This is the key engineering challenge.

▸ The idea: sync files across devices

Estimation

Let's size this system:

500M registered users, 100M daily active users
200 files per user on average
Average file size: 100 KB (many small docs, some large media files)
Total storage: 500M × 200 × 100 KB = 10 PB
Sync operations: 100M DAU × 1 sync/day avg = 100M sync operations/day (~1,150/s)
Upload bandwidth: If 10% of syncs are uploads of ~100 KB, that's ~115 uploads/s × 100 KB ≈ 11.5 MB/s sustained
Peak traffic: 3-5× average during business hours
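A short script makes the arithmetic above easy to re-check when an assumption changes. It uses decimal units (1 KB = 1,000 bytes) and takes 5 from the upper end of the 3-5× peak range; the variable names are illustrative only.

```python
# Back-of-envelope sizing from the assumptions above (decimal units: 1 KB = 1,000 bytes).
SECONDS_PER_DAY = 86_400

registered_users = 500_000_000
daily_active_users = 100_000_000
files_per_user = 200
avg_file_size_bytes = 100 * 1_000   # 100 KB average file
syncs_per_dau_per_day = 1
upload_fraction = 0.10              # 10% of syncs upload a ~100 KB file
peak_multiplier = 5                 # upper end of the 3-5x peak range

total_storage_bytes = registered_users * files_per_user * avg_file_size_bytes
syncs_per_second = daily_active_users * syncs_per_dau_per_day / SECONDS_PER_DAY
upload_bytes_per_second = syncs_per_second * upload_fraction * avg_file_size_bytes

print(f"Total storage:    {total_storage_bytes / 1e15:.0f} PB")
print(f"Sync rate:        {syncs_per_second:,.0f} ops/s")
print(f"Upload bandwidth: {upload_bytes_per_second / 1e6:.1f} MB/s average")
print(f"Peak upload:      {upload_bytes_per_second * peak_multiplier / 1e6:.1f} MB/s")
```

Running it prints 10 PB, about 1,157 syncs/s, and about 11.6 MB/s of average upload traffic (57.9 MB/s at a 5× peak). The small gap from rounder hand estimates comes from using the unrounded sync rate.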

The real challenge isn't raw throughput — it's minimizing bandwidth via chunking and delta sync, while keeping sync latency low.

API Design

File operations are chunked for reliability and resumability:

Upload File (Chunked)

Endpoint	`POST /api/v1/files/upload`
Request	`{"filename": "report.pdf", "chunks": [{"hash": "a1b2c3", "index": 0, "data": "..."}], "total_chunks": 5}`
Response	`{"file_id": "f123", "version": 1, "status": "uploaded"}`
Status	`201 Created`
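A minimal client-side sketch of assembling the upload request shown above. It assumes 4 MB chunks hashed with SHA-256 (the sizes and hash used elsewhere in this design) and assumes the elided `data` field carries base64-encoded bytes; `build_upload_request` is a hypothetical helper, not part of any real client.

```python
import base64
import hashlib
import json

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MB chunks


def build_upload_request(filename: str, content: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Split content into chunks and describe each one by SHA-256 hash, index, and data."""
    chunks = []
    for index, offset in enumerate(range(0, len(content), chunk_size)):
        piece = content[offset:offset + chunk_size]
        chunks.append({
            "hash": hashlib.sha256(piece).hexdigest(),
            "index": index,
            # Assumption: the elided "data" field carries base64-encoded bytes.
            "data": base64.b64encode(piece).decode("ascii"),
        })
    return {"filename": filename, "chunks": chunks, "total_chunks": len(chunks)}


# A 10 MB file becomes three chunks: 4 MB, 4 MB, and 2 MB.
payload = build_upload_request("report.pdf", b"x" * (10 * 1024 * 1024))
body = json.dumps(payload)  # request body for POST /api/v1/files/upload
print(payload["total_chunks"])  # 3
```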

Download File

Endpoint	`GET /api/v1/files/:id`
Response	File binary stream with `Content-Disposition` header
Status	`200 OK`

Update Metadata

Endpoint	`PUT /api/v1/files/:id/metadata`
Request	`{"name": "new-name.pdf", "shared_with": ["user456"]}`
Status	`200 OK`

Version History

Endpoint	`GET /api/v1/files/:id/history`
Response	`{"versions": [{"version": 3, "modified_at": "...", "size": 102400}, ...]}`
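One possible way to back the history endpoint, sketched under stated assumptions: each version is stored as an ordered list of block hashes (matching the block-level design described in this article), history is returned newest first, and reverting appends a new version that reuses an old block list instead of deleting later versions. `FileRecord`, `add_version`, and `revert_to` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileVersion:
    version: int
    block_hashes: list[str]  # ordered block hashes that make up this version
    size: int                # bytes
    modified_at: str         # ISO 8601 timestamp


@dataclass
class FileRecord:
    file_id: str
    versions: list[FileVersion] = field(default_factory=list)

    def add_version(self, block_hashes: list[str], size: int) -> FileVersion:
        v = FileVersion(
            version=len(self.versions) + 1,
            block_hashes=list(block_hashes),
            size=size,
            modified_at=datetime.now(timezone.utc).isoformat(),
        )
        self.versions.append(v)
        return v

    def revert_to(self, version: int) -> FileVersion:
        # Reverting appends a new version that points at the old block list:
        # later versions stay in history and no block data is copied.
        old = self.versions[version - 1]
        return self.add_version(old.block_hashes, old.size)

    def history(self) -> dict:
        # Same shape as the /history response, newest version first.
        return {"versions": [
            {"version": v.version, "modified_at": v.modified_at, "size": v.size}
            for v in reversed(self.versions)
        ]}


# Usage with hypothetical block hashes "h1".."h3".
record = FileRecord("f123")
record.add_version(["h1", "h2"], 102400)
record.add_version(["h1", "h3"], 102400)
record.revert_to(1)
print(record.history()["versions"][0]["version"])  # 3
```

Because versions share blocks by hash, keeping many versions of a mostly unchanged file costs little extra storage.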

Why chunked uploads? Large files (100 MB+) often fail partway through on unreliable networks. Chunking lets you resume from where you left off. Each chunk is typically 4 MB: small enough to retry quickly, large enough to avoid too many round trips.
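A minimal sketch of resuming an interrupted chunked upload, assuming the server can report which chunk indices it has already received. `FlakyServer`, `received_indices`, and `upload_resumable` are hypothetical; the in-memory server simply drops the connection once to simulate a network failure.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks


class FlakyServer:
    """Hypothetical in-memory upload endpoint that drops the connection once."""

    def __init__(self, fail_on_call: int):
        self.chunks: dict[int, bytes] = {}
        self.calls = 0
        self.fail_on_call = fail_on_call

    def put_chunk(self, index: int, data: bytes, digest: str) -> None:
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise ConnectionError("network dropped")
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("chunk corrupted in transit")
        self.chunks[index] = data

    def received_indices(self) -> set[int]:
        return set(self.chunks)


def upload_resumable(server: FlakyServer, content: bytes, max_attempts: int = 3) -> int:
    """Send content chunk by chunk, skipping chunks the server already holds.

    Returns the number of chunk sends attempted (including the failed one)."""
    chunks = [content[i:i + CHUNK_SIZE] for i in range(0, len(content), CHUNK_SIZE)]
    attempts = 0
    for _ in range(max_attempts):
        done = server.received_indices()  # resume point: what already arrived
        try:
            for index, data in enumerate(chunks):
                if index in done:
                    continue
                attempts += 1
                server.put_chunk(index, data, hashlib.sha256(data).hexdigest())
            return attempts
        except ConnectionError:
            continue  # reconnect and resume instead of restarting from chunk 0
    raise RuntimeError("upload failed after retries")


# A 20 MB file is 5 chunks; the connection drops while sending the 4th one.
server = FlakyServer(fail_on_call=4)
sends = upload_resumable(server, bytes(20 * 1024 * 1024))
print(sends, sorted(server.received_indices()))  # 6 [0, 1, 2, 3, 4]
```

Resuming costs 6 chunk sends here; restarting from scratch after the failure would have cost 4 + 5 = 9.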

▸ Chunked upload and delta sync


Sync flow: files are split into blocks, deduped, stored, and synced to other devices via notifications

▸ Conflict resolution and versioning

Block-Level Dedup & Delta Sync

This is the most important engineering decision in a file storage system.

How it works:

Split files into blocks: Each file is divided into fixed-size blocks (typically 4 MB), and each block is hashed (SHA-256).
Hash comparison: Before uploading, the client computes hashes of all blocks and sends them to the server. The server checks which blocks it already has.
Upload only new blocks: Only blocks with new hashes are uploaded. If you overwrite one paragraph in place in a 100 MB document, you might upload only a single 4 MB block instead of the whole file. Insertions and deletions are harder: with fixed-size blocks they shift every later block boundary, which is why some systems use content-defined chunking instead.
Reconstruct on download: The file is an ordered list of block hashes. To download, fetch each block and reassemble them in order.

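Bandwidth savings: For small edits to large files, delta sync cuts traffic dramatically. In the example above, uploading one 4 MB block instead of the full 100 MB file saves 96% of the upload. This is why Dropbox feels so fast: it's not uploading your entire file every time you hit save.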
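The four steps can be sketched in a few dozen lines. This is a minimal in-memory illustration, assuming fixed-size 4 MB blocks and SHA-256 as described; `BlockStore`, `sync_upload`, and the other names are hypothetical, and "MB" in the demo means 2^20 bytes.

```python
import hashlib
import os

BLOCK_SIZE = 4 * 1024 * 1024  # fixed-size 4 MB blocks


def split_blocks(content: bytes) -> list[bytes]:
    return [content[i:i + BLOCK_SIZE] for i in range(0, len(content), BLOCK_SIZE)]


class BlockStore:
    """Hypothetical server: content-addressed blocks plus path -> ordered hash list."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.files: dict[str, list[str]] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        return {h for h in hashes if h not in self.blocks}

    def put_block(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block

    def commit(self, path: str, hashes: list[str]) -> None:
        self.files[path] = hashes

    def download(self, path: str) -> bytes:
        # Step 4: reconstruct the file by concatenating its blocks in order.
        return b"".join(self.blocks[h] for h in self.files[path])


def sync_upload(store: BlockStore, path: str, content: bytes) -> int:
    """Upload only the blocks the server lacks; return the number of bytes sent."""
    blocks = split_blocks(content)                            # step 1: split
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]  # step 1: hash
    need = store.missing(hashes)                              # step 2: compare
    sent = 0
    for digest, block in zip(hashes, blocks):
        if digest in need:                                    # step 3: new blocks only
            store.put_block(digest, block)
            need.discard(digest)  # a block repeated within the file is sent once
            sent += len(block)
    store.commit(path, hashes)    # the file is just an ordered list of hashes
    return sent


store = BlockStore()
original = os.urandom(12 * 1024 * 1024)  # 3 blocks of random data
first_sent = sync_upload(store, "report.pdf", original)

edited = bytearray(original)
edited[5 * 1024 * 1024] ^= 0xFF           # overwrite one byte inside block 1
second_sent = sync_upload(store, "report.pdf", bytes(edited))

print(first_sent // 2**20, "MB then", second_sent // 2**20, "MB")  # 12 MB then 4 MB
```

The demo overwrites a byte rather than inserting one, so block boundaries stay put and only the changed block is re-sent.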

Conflict resolution: When two devices edit the same file simultaneously:

Last-write-wins: Simple, but the losing edit is silently discarded, so it is acceptable only where an occasionally lost edit is tolerable.
Keep both copies: Create a "conflicted copy" and let the user merge manually. Dropbox uses this approach.
Operational transformation (OT): Used by Google Docs for real-time collaborative editing (much more complex).
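Detecting a conflict requires knowing which version each edit was based on. A minimal sketch of the "keep both copies" strategy, assuming every commit carries the version number the client last synced: if the server has moved on, the edit is saved under a new name instead of overwriting. `MetadataServer` and `conflicted_name` are hypothetical, and the copy's name format only approximates Dropbox's real naming.

```python
def conflicted_name(path: str, device: str) -> str:
    """Build a sibling filename for the losing edit, keeping the extension."""
    stem, dot, ext = path.rpartition(".")
    if not dot:
        return f"{path} ({device}'s conflicted copy)"
    return f"{stem} ({device}'s conflicted copy).{ext}"


class MetadataServer:
    """Hypothetical metadata server: path -> (version number, ordered block hashes)."""

    def __init__(self) -> None:
        self.files: dict[str, tuple[int, list[str]]] = {}

    def commit(self, path: str, base_version: int, hashes: list[str], device: str) -> str:
        """Commit an edit made on top of base_version; return the path it was saved under."""
        current_version = self.files.get(path, (0, []))[0]
        if base_version == current_version:
            # Nobody else committed since this device last synced: accept the edit.
            self.files[path] = (current_version + 1, hashes)
            return path
        # Another device got there first: keep both by saving a conflicted copy.
        copy_path = conflicted_name(path, device)
        self.files[copy_path] = (1, hashes)
        return copy_path


server = MetadataServer()
server.commit("report.pdf", 0, ["h1"], "laptop")         # creates version 1
# Both devices last synced version 1, then edit concurrently.
print(server.commit("report.pdf", 1, ["h2"], "laptop"))  # report.pdf
print(server.commit("report.pdf", 1, ["h3"], "phone"))   # report (phone's conflicted copy).pdf
```

The same version check is what a last-write-wins system skips: it would simply overwrite version 2 with the phone's edit.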

▸ Full architecture

Real-Time Sync & Storage Tiers

Notification Service: To achieve near-instant sync, the server must push updates to clients. Two approaches:

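Long polling: The client sends an HTTP request that the server holds open until there's a change (or a timeout), and the client re-issues it after every response. Simple and usually works through proxies and firewalls, but each notification costs a new request.
WebSocket: Persistent bidirectional connection. More efficient for frequent updates, and a common choice for desktop sync clients.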

When a file changes, the notification service tells all connected devices: "File X has new blocks — here are the hashes." Each device then downloads only the blocks it doesn't have.
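A minimal in-process sketch of that fan-out, assuming a shared content-addressed block store and treating each device connection as a direct method call in place of a real WebSocket or long-poll channel. `NotificationHub`, `Device`, and `on_file_changed` are hypothetical names.

```python
class Device:
    """Hypothetical sync client holding a local block cache."""

    def __init__(self, name: str, block_server: dict[str, bytes]) -> None:
        self.name = name
        self.block_server = block_server         # shared content-addressed block store
        self.local_blocks: dict[str, bytes] = {}
        self.files: dict[str, bytes] = {}
        self.downloaded = 0                      # blocks fetched from the server

    def on_file_changed(self, file_id: str, hashes: list[str]) -> None:
        for digest in hashes:
            if digest not in self.local_blocks:  # fetch only blocks this device lacks
                self.local_blocks[digest] = self.block_server[digest]
                self.downloaded += 1
        self.files[file_id] = b"".join(self.local_blocks[d] for d in hashes)


class NotificationHub:
    """Hypothetical push channel standing in for WebSocket or long-poll connections."""

    def __init__(self) -> None:
        self.devices: dict[str, Device] = {}

    def connect(self, device: Device) -> None:
        self.devices[device.name] = device

    def publish(self, origin: str, file_id: str, hashes: list[str]) -> None:
        # "File X has new blocks, here are the hashes" goes to every other device.
        for name, device in self.devices.items():
            if name != origin:
                device.on_file_changed(file_id, hashes)


blocks = {"h1": b"hello ", "h2": b"world", "h3": b"there"}  # hypothetical hashes
hub = NotificationHub()
laptop, phone = Device("laptop", blocks), Device("phone", blocks)
hub.connect(laptop)
hub.connect(phone)

hub.publish("laptop", "f123", ["h1", "h2"])  # phone fetches h1 and h2
hub.publish("laptop", "f123", ["h1", "h3"])  # phone fetches only h3
print(phone.files["f123"], phone.downloaded)  # b'hello there' 3
```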

Storage Tiers:

Hot storage: Frequently accessed files (last 30 days), kept on low-latency, high-throughput storage. Think S3 Standard.
Cold storage: Old versions and rarely accessed files, kept on cheaper storage (S3 Glacier, tape). Retrieval can take minutes to hours depending on the tier, but storage costs roughly an order of magnitude less.
Lifecycle policies: Automatically move files from hot to cold based on access patterns; for example, version history older than 90 days moves to cold storage.
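A minimal sketch of one reading of these rules, assuming the current version of a file is tiered by how recently it was accessed (30-day window) and older versions are tiered by age (90-day rule). `StoredVersion` and `target_tier` are hypothetical names; a real system would run this as a periodic background job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

HOT_ACCESS_WINDOW = timedelta(days=30)  # hot = accessed within the last 30 days
VERSION_COLD_AGE = timedelta(days=90)   # old versions move to cold after 90 days


@dataclass
class StoredVersion:
    is_current: bool
    created_at: datetime
    last_accessed: datetime


def target_tier(v: StoredVersion, now: datetime) -> str:
    """Return "hot" or "cold" for one stored file version."""
    if not v.is_current:
        # Version history: tier by age, per the 90-day rule.
        return "cold" if now - v.created_at > VERSION_COLD_AGE else "hot"
    # Current version: tier by recency of access, per the 30-day window.
    return "hot" if now - v.last_accessed <= HOT_ACCESS_WINDOW else "cold"


now = datetime(2024, 6, 1)
print(target_tier(StoredVersion(True, datetime(2023, 1, 1), datetime(2024, 5, 20)), now))   # hot
print(target_tier(StoredVersion(False, datetime(2024, 1, 1), datetime(2024, 5, 30)), now))  # cold
print(target_tier(StoredVersion(True, datetime(2023, 1, 1), datetime(2024, 3, 1)), now))    # cold
```

On S3, the version-age rule can also be expressed declaratively as a bucket lifecycle rule for noncurrent object versions rather than in application code.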

Interview tip: Delta sync, built on block-level deduplication, is the key differentiator in a file storage system design. It's what separates a naive "upload the whole file" approach from a production-grade system like Dropbox. Always bring it up: it shows you understand the real engineering challenge of minimizing bandwidth while keeping sync fast.