Social & Communication · 12 min read

Design an Image Hosting Service

Upload, store, and serve images at scale, like Imgur
Scope: Real-World System · Difficulty: Intermediate

Understanding the Problem

Image hosting services let users upload images, generate shareable links, and serve those images fast to millions of viewers. Think Imgur, Flickr, or the image backends behind social media platforms.

Let's define what we need:

Functional Requirements:

  • Upload: Users can upload images (JPEG, PNG, GIF, WebP) up to 10 MB.
  • View: Anyone with the link can view the image; no authentication is required for public images.
  • Delete: Image owners can delete their uploads.
  • Resize/Thumbnail: Automatically generate thumbnails (150×150, 300×300, 600×600) for different contexts (previews, embeds, galleries).

Non-Functional Requirements:

  • Low latency serving: Images should load in under 50 ms for most users worldwide. A CDN is essential.
  • High availability: The service must be up 99.99% of the time. A broken image link is a terrible user experience.
  • Handle large files: Uploads up to 10 MB. The system must handle multipart uploads and not time out on slow connections.
  • Durability: Once uploaded, images must never be lost. We need redundant storage.
Figure: The idea (upload → store → serve via CDN)

Estimation

Let's size this system:

  • 10M uploads/day (~115 uploads/second, peak ~350/s)
  • Average image size: 2 MB
  • Read:write ratio of 10:1: ~1,150 reads/second on average, with bursts up to ~1.2M reads/second when an image goes viral
  • Daily storage: 10M × 2 MB = 20 TB/day of new images
  • 5-year storage: 20 TB × 365 × 5 = ~36 PB (originals only; thumbnails add ~30% more)
  • Bandwidth: At peak, 1.2M reads/s × 200 KB average served size = ~240 GB/s outbound, which is why a CDN is non-negotiable
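
The arithmetic above can be reproduced with a short script. This is only a back-of-envelope sketch: it assumes decimal units (1 TB = 1,000,000 MB), and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

uploads_per_day = 10_000_000
avg_image_mb = 2
read_write_ratio = 10
peak_reads_per_s = 1_200_000   # viral-burst figure taken from the estimate above
avg_served_kb = 200

uploads_per_s = uploads_per_day / SECONDS_PER_DAY
avg_reads_per_s = uploads_per_s * read_write_ratio
daily_storage_tb = uploads_per_day * avg_image_mb / 1_000_000        # MB -> TB
five_year_pb = daily_storage_tb * 365 * 5 / 1_000                    # TB -> PB
peak_bandwidth_gb_s = peak_reads_per_s * avg_served_kb / 1_000_000   # KB -> GB

print(f"uploads/s:        {uploads_per_s:.0f}")
print(f"avg reads/s:      {avg_reads_per_s:.0f}")
print(f"daily storage:    {daily_storage_tb:.0f} TB")
print(f"5-year storage:   {five_year_pb:.1f} PB")
print(f"peak bandwidth:   {peak_bandwidth_gb_s:.0f} GB/s")
```

The unrounded results differ slightly from the rounded figures in the list (for example, the 5-year total comes out at 36.5 PB), which is fine for sizing purposes.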

This is a storage-heavy, read-heavy system. The main challenges are efficient storage, fast serving via CDN, and an async image processing pipeline.

API Design

Upload Image

Endpoint: POST /api/v1/images
Content-Type: multipart/form-data
Body: file (binary), title (optional), is_public (boolean)
Response: {"id": "img_abc123", "url": "https://cdn.imghost.com/abc123.jpg", "thumbnails": {...}}
Status: 201 Created

View Image

Endpoint: GET /api/v1/images/:id
Query params: ?size=thumb|medium|large|original
Response: 302 redirect to CDN URL, or image binary
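
One way to picture the redirect variant of this endpoint is a small function that turns the request URL into a 302 response. This is a sketch under several assumptions that the article does not specify: the `_150`/`_300`/`_600` file suffixes, the mapping of `thumb`/`medium`/`large` onto those sizes, stripping the `img_` prefix to form the file name, and returning 400 for an unknown size. Only the CDN host comes from the upload response example.

```python
from urllib.parse import parse_qs, urlsplit

CDN_BASE = "https://cdn.imghost.com"  # host from the upload response example

# Hypothetical naming scheme for stored variants (an assumption).
SIZE_SUFFIX = {"thumb": "_150", "medium": "_300", "large": "_600", "original": ""}


def cdn_redirect(request_url: str, ext: str = "jpg") -> tuple[int, dict[str, str]]:
    """Return (status, headers) for GET /api/v1/images/:id?size=..."""
    parts = urlsplit(request_url)
    image_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Default to the original when no size is given (an assumption).
    size = parse_qs(parts.query).get("size", ["original"])[0]
    if size not in SIZE_SUFFIX:
        return 400, {}
    name = image_id.removeprefix("img_")
    return 302, {"Location": f"{CDN_BASE}/{name}{SIZE_SUFFIX[size]}.{ext}"}


status, headers = cdn_redirect("/api/v1/images/img_abc123?size=thumb")
print(status, headers["Location"])
```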

Delete Image

Endpoint: DELETE /api/v1/images/:id
Auth: Bearer token (owner only)
Response: 204 No Content
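
The owner-only rule can be sketched as a simple check before the record is removed. The 404 and 403 status codes, the `ImageRecord` type, and the in-memory `images` dict are assumptions for illustration; the article only specifies the 204 success response. Verifying the bearer token and resolving it to `caller_id` is assumed to happen before this function is called.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    id: str
    owner_id: str


def delete_image(images: dict[str, ImageRecord], image_id: str, caller_id: str) -> int:
    """Return the HTTP status code for DELETE /api/v1/images/:id."""
    record = images.get(image_id)
    if record is None:
        return 404  # assumed status for an unknown image
    if record.owner_id != caller_id:
        return 403  # assumed status when the caller is not the owner
    del images[image_id]
    return 204


images = {"img_abc123": ImageRecord(id="img_abc123", owner_id="user_1")}
print(delete_image(images, "img_abc123", "user_2"))  # not the owner
print(delete_image(images, "img_abc123", "user_1"))  # owner
```

A real deletion would also need to purge the cached copies from the CDN, which this sketch leaves out.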

In practice, most image reads bypass the API entirely: the client hits the CDN URL directly. The API is mainly for upload, metadata, and deletion.

Figure: Upload flow (image processing pipeline). Upload path: the original is stored immediately, and thumbnails are generated asynchronously via a message queue.

Figure: Serving flow (CDN-first delivery). Read path: the CDN handles 95%+ of reads. Cache misses fall back to the origin object store.

Image Processing Pipeline

When an image is uploaded, it doesn't just get stored; it goes through a processing pipeline:

  1. Validation: Check file type, size (≤10 MB), and dimensions. Reject malformed files.
  2. Deduplication: Compute a content hash (SHA-256) of the file. If the same hash already exists, return the existing image instead of storing a duplicate. This can save 20-30% storage.
  3. Thumbnail generation: Create multiple sizes: 150×150 (avatar/preview), 300×300 (gallery), 600×600 (medium). This runs async via a worker queue.
  4. Format conversion: Convert to WebP for browsers that support it (often cited as roughly 25-35% smaller than JPEG at comparable quality). Store both formats.
  5. EXIF stripping: Remove metadata (GPS location, camera info) for privacy. Users don't expect their location to be embedded in shared images.
  6. Content moderation: Run through an ML model or third-party API to detect inappropriate content. Flag or reject as needed.
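
Steps 1 and 2 can be sketched with the standard library alone. The function names are hypothetical, and the check here only looks at file signatures (magic bytes) and size; a real validator would also decode the image to verify dimensions and catch truncated files. Treating 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
import hashlib
from typing import Optional

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit (binary megabytes assumed)


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the image format from its leading signature bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def validate_and_hash(data: bytes) -> tuple[str, str]:
    """Return (format, sha256 hex digest) or raise ValueError."""
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds 10 MB")
    fmt = sniff_format(data)
    if fmt is None:
        raise ValueError("unsupported or malformed image")
    return fmt, hashlib.sha256(data).hexdigest()


fmt, digest = validate_and_hash(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
print(fmt, digest[:12])
```

Sniffing the signature rather than trusting the file extension or the client-supplied Content-Type is the design choice being illustrated.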

The key insight: only validating, hashing, and storing the original happen synchronously. Everything else (thumbnails, format conversion, moderation) happens asynchronously via a message queue. This keeps upload latency low (~200 ms).
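
The split between the synchronous upload and the asynchronous work can be sketched with an in-process queue and a worker thread. Everything here is a stand-in: `queue.Queue` plays the role of the message queue, the worker only computes target thumbnail dimensions instead of resizing pixels, and the `"status": "processing"` field is hypothetical. Scaling to fit inside the box while preserving aspect ratio is also an assumption; the article does not say whether thumbnails are fitted or cropped to a square.

```python
import queue
import threading

THUMBNAIL_SIZES = (150, 300, 600)


def fit_within(width: int, height: int, box: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside box x box, keeping aspect ratio."""
    scale = min(box / width, box / height, 1.0)  # never upscale small images
    return max(1, round(width * scale)), max(1, round(height * scale))


jobs: queue.Queue = queue.Queue()
results: dict[str, list[tuple[int, int]]] = {}


def handle_upload(image_id: str, width: int, height: int) -> dict:
    # Synchronous part: validation and storing the original would happen here.
    jobs.put((image_id, width, height))  # hand the slow work to the worker
    return {"id": image_id, "status": "processing"}


def worker() -> None:
    while True:
        image_id, width, height = jobs.get()
        results[image_id] = [fit_within(width, height, s) for s in THUMBNAIL_SIZES]
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

response = handle_upload("img_abc123", 1200, 800)  # returns immediately
jobs.join()                                        # wait here only for the demo
print(response)
print(results["img_abc123"])
```

The upload handler returns as soon as the job is enqueued, so its latency does not depend on how long thumbnail generation takes.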

Figure: Full architecture

Storage Strategy

Storage is the biggest cost and design challenge:

Object Store (S3): Store all image files β€” originals and thumbnails. S3 gives us 11 nines of durability, automatic replication, and virtually unlimited capacity. Organize by content hash: s3://images/{hash_prefix}/{hash}.{ext}
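
The key layout above can be sketched as a small helper. The article does not say how long `hash_prefix` is, so the two-character default here is an assumption, as is the function name.

```python
import hashlib


def object_key(data: bytes, ext: str, prefix_len: int = 2) -> str:
    """Build s3://images/{hash_prefix}/{hash}.{ext} from the file contents."""
    digest = hashlib.sha256(data).hexdigest()
    return f"s3://images/{digest[:prefix_len]}/{digest}.{ext}"


print(object_key(b"example image bytes", "jpg"))
```

Because the key is derived from the content, two uploads of the same bytes map to the same object.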

Metadata Database: Store image metadata β€” ID, owner, upload time, dimensions, content hash, thumbnail URLs, view count. A relational database (PostgreSQL) works well here since the data is structured and we need indexes on owner, hash, and creation time.
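
A possible shape for this table is sketched below. It uses Python's built-in `sqlite3` as a stand-in so the example runs without a server; the article's choice is PostgreSQL, where the column types would differ. The table name, column names, and the decision to derive thumbnail URLs from the hash rather than store them are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE images (
    id           TEXT PRIMARY KEY,
    owner_id     TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    width        INTEGER NOT NULL,
    height       INTEGER NOT NULL,
    is_public    INTEGER NOT NULL DEFAULT 1,
    view_count   INTEGER NOT NULL DEFAULT 0,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Indexes on owner, hash, and creation time, as described above.
CREATE INDEX idx_images_owner   ON images (owner_id, created_at);
CREATE INDEX idx_images_hash    ON images (content_hash);
CREATE INDEX idx_images_created ON images (created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO images (id, owner_id, content_hash, width, height) "
    "VALUES (?, ?, ?, ?, ?)",
    ("img_abc123", "user_1", "placeholder-hash", 1200, 800),
)

rows = conn.execute(
    "SELECT id FROM images WHERE owner_id = ? ORDER BY created_at DESC",
    ("user_1",),
).fetchall()
print(rows)
```

The hash index is deliberately not unique, since several image records may point at the same stored object.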

CDN: All image serving goes through a CDN (CloudFront, Fastly). The CDN caches images at edge locations worldwide, so users get images from the nearest server. Use a cache TTL of 30 days, since images rarely change.
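
One common way to express a 30-day TTL is a `Cache-Control` header on the origin response, which CDNs generally honor. The helper name is hypothetical, and the `public` directive assumes the image is a public one.

```python
CDN_TTL_DAYS = 30


def cache_headers(ttl_days: int = CDN_TTL_DAYS) -> dict[str, str]:
    """Headers the origin could attach so edges cache the image for ttl_days."""
    max_age = ttl_days * 24 * 60 * 60  # seconds
    return {"Cache-Control": f"public, max-age={max_age}"}


print(cache_headers())
```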

Deduplication: Before storing, compute the SHA-256 hash of the image content and check the metadata DB. If the hash exists, point the new image record to the existing S3 object. This saves enormous storage when the same meme or image gets uploaded thousands of times.
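
The lookup-then-point flow can be sketched with two dictionaries standing in for S3 and the metadata DB. The `DedupStore` class and its method names are hypothetical.

```python
import hashlib


class DedupStore:
    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # content hash -> bytes (stands in for S3)
        self.records: dict[str, str] = {}    # image id -> content hash (stands in for the DB)

    def upload(self, image_id: str, data: bytes) -> bool:
        """Record the image; return True only if a new object was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.objects
        if is_new:
            self.objects[digest] = data
        # Every upload gets its own record, even when the bytes are shared.
        self.records[image_id] = digest
        return is_new


store = DedupStore()
print(store.upload("img_1", b"popular meme"))  # first copy is written
print(store.upload("img_2", b"popular meme"))  # duplicate reuses the object
print(len(store.objects), len(store.records))
```

One consequence worth noting: once records share an object, deleting an image should remove the stored object only when no other record still references it.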

Interview tip: Always mention a CDN when discussing image or static content serving. Then go deeper: talk about cache invalidation on delete, WebP format optimization, and how thumbnails reduce bandwidth by 10x compared to serving originals. These details show you understand the real-world economics of image hosting.

Key Metrics

  • Upload (original): ~200 ms, \(O(n)\). Validate + S3 PUT + metadata write (n = file size).
  • Serve via CDN (cache hit): ~20 ms, \(O(1)\). Edge server → client, no origin fetch.
  • Serve via CDN (cache miss): ~100 ms, \(O(1)\). S3 fetch → CDN cache → client.
  • Storage (5 years): ~36 PB. 10M images/day × 2 MB × 5 years (originals).
  • CDN hit rate: ~95%+. Popular images cached at edge; long TTL.
  • Thumbnail generation: ~2-5 s async, \(O(n)\). Resize + WebP conversion per image.
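
Combining the hit rate with the two serving latencies gives a rough expected latency and origin load. This is a simple weighted average using the figures above and the 1.2M reads/s peak from the estimation section; it ignores tail latency and regional variation.

```python
hit_rate = 0.95
hit_ms, miss_ms = 20, 100
peak_reads_per_s = 1_200_000

# Weighted average of the cache-hit and cache-miss latencies.
expected_ms = hit_rate * hit_ms + (1 - hit_rate) * miss_ms
# Requests that miss the cache and reach the origin object store at peak.
origin_reads_per_s = (1 - hit_rate) * peak_reads_per_s

print(f"expected latency: {expected_ms:.0f} ms")
print(f"origin reads at peak: {origin_reads_per_s:,.0f}/s")
```

At a 95% hit rate the average works out to about 24 ms, and the origin still sees roughly 60,000 reads/s during a peak, so the hit rate matters as much for origin capacity as for latency.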

Quick check

Why is image processing (thumbnails, format conversion) done asynchronously rather than during the upload request?
