
Design an E-Commerce Platform

Build Amazon-scale shopping: from catalog to checkout to delivery
Scope: Real-World System · Difficulty: Advanced

Understanding the Problem

E-commerce platforms like Amazon handle an extraordinary range of challenges: massive product catalogs, lightning-fast search, real-time inventory tracking, shopping carts, checkout with payment processing, and order fulfillment.

Let's define our requirements:

Functional Requirements:

  • Browse a product catalog with categories, filters, and detail pages.
  • Search products by keyword with faceted filtering (price, brand, rating).
  • Add items to a shopping cart and manage quantities.
  • Checkout: validate inventory, process payment, create order.
  • Track order status from placement through delivery.

Non-Functional Requirements:

  • 99.99% uptime: Downtime means lost revenue. Every minute of downtime during a flash sale costs millions.
  • Sub-second search: Product search must return results in under 200ms. Users abandon after 3 seconds.
  • Flash sale capacity: Handle 100K orders/min during peak events (Black Friday, Prime Day).
  • Strong consistency for inventory: Never sell more than what's in stock. Overselling is costly and damages trust.
Diagram: The idea (browse → cart → checkout → deliver)

Estimation

Let's size this system:

  • 500M products in the catalog across millions of sellers.
  • 100M daily active users (DAU) browsing, searching, and buying.
  • 10M orders/day on average (about 115 orders/second).
  • Peak: 100K orders/min during flash sales (~1,700 orders/second).
  • 1B search queries/day (~11,500 queries/second, peak ~30K/s).
  • Product data: 500M × ~5 KB (title, description, image metadata) = ~2.5 TB for the catalog.
  • Order data: 10M/day × 365 × ~2 KB = ~7.3 TB/year.
  • Image storage: 500M products × 5 images × 500 KB = ~1.25 PB (served via CDN).

This is a massive system. The key challenges are inventory consistency during high-throughput checkout and keeping search fast across 500M products.
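
The arithmetic behind these estimates can be checked with a short script. This is only a back-of-the-envelope sketch: it assumes decimal units (1 KB = 1,000 bytes) and a flat average rate across the day, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Traffic, using the daily figures from the estimation list.
avg_orders_per_s = per_second(10_000_000)        # ~115.7
peak_orders_per_s = 100_000 / 60                 # ~1,666.7
avg_searches_per_s = per_second(1_000_000_000)   # ~11,574

# Storage, in decimal units (assumption): 1 KB = 1e3 B, 1 TB = 1e12 B, 1 PB = 1e15 B.
KB, TB, PB = 1e3, 1e12, 1e15
catalog_tb = 500e6 * 5 * KB / TB                 # 500M products x 5 KB
orders_tb_per_year = 10e6 * 365 * 2 * KB / TB    # 10M orders/day x 365 x 2 KB
images_pb = 500e6 * 5 * 500 * KB / PB            # 500M products x 5 images x 500 KB

print(f"orders/s avg={avg_orders_per_s:.1f} peak={peak_orders_per_s:.1f}")
print(f"searches/s avg={avg_searches_per_s:.0f}")
print(f"catalog={catalog_tb} TB, orders={orders_tb_per_year} TB/yr, images={images_pb} PB")
```

The rounded figures in the list (115, 1,700, 11,500) are these values rounded for readability.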

Data Models

Core entities that power the platform:

Product:

  • SKU (unique identifier), title, description, price, category, brand
  • images[] (URLs to CDN), attributes (size, color, weight)
  • seller_id, rating, review_count, created_at

Cart:

  • user_id, items[] (each with SKU, quantity, price_snapshot)
  • updated_at (carts are ephemeral and stored in Redis for speed)

Order:

  • order_id, user_id, items[], total_amount
  • status (pending → paid → shipped → delivered), payment_id
  • shipping_address, created_at, updated_at

Inventory:

  • SKU, warehouse_id, quantity_available, quantity_reserved
  • last_updated (this is the critical table that needs strong consistency)
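
As a rough illustration, a few of these entities could be sketched as Python dataclasses. The field names follow the lists above, but several details are assumptions made for this sketch: money is held in integer cents, total_amount is computed rather than stored, and the order status only moves forward (no cancel or refund states, since the text does not list any).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


# Allowed forward transitions: pending -> paid -> shipped -> delivered.
NEXT_STATUS = {
    OrderStatus.PENDING: OrderStatus.PAID,
    OrderStatus.PAID: OrderStatus.SHIPPED,
    OrderStatus.SHIPPED: OrderStatus.DELIVERED,
}


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class CartItem:
    sku: str
    quantity: int
    price_snapshot_cents: int  # price at the time the item was added (assumed unit)


@dataclass
class InventoryRow:
    sku: str
    warehouse_id: str
    quantity_available: int
    quantity_reserved: int = 0


@dataclass
class Order:
    order_id: str
    user_id: str
    items: list[CartItem]
    status: OrderStatus = OrderStatus.PENDING
    updated_at: datetime = field(default_factory=utc_now)

    @property
    def total_amount_cents(self) -> int:
        """Sum of quantity x snapshot price over all items."""
        return sum(i.quantity * i.price_snapshot_cents for i in self.items)

    def advance(self) -> None:
        """Move to the next status; raise if the order is already delivered."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"no transition from {self.status.value}")
        self.status = NEXT_STATUS[self.status]
        self.updated_at = utc_now()
```

Keeping the transitions in one table makes illegal jumps (for example, pending straight to shipped) impossible to express through advance().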
Diagram: Product catalog and search
Diagram: Checkout flow. The reserve-then-commit pattern ensures we never oversell, even during flash sales.

Inventory Management

Inventory consistency is the hardest part of e-commerce. The core pattern is reserve-then-commit:

  1. Reserve: When checkout starts, atomically decrement quantity_available and increment quantity_reserved. Use distributed locking (Redis SETNX) or optimistic concurrency (version column with CAS).
  2. Commit: After payment succeeds, move from reserved to sold.
  3. Release: If payment fails or reservation TTL expires (10 min), release the reserved stock back to available.
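
The three steps can be sketched as a small in-memory model. This is a single-process illustration of the bookkeeping only, not a distributed implementation: the Stock class, its method names, and the explicit `now` parameter (used instead of a real clock so the TTL path is easy to exercise) are all assumptions of this sketch.

```python
from dataclasses import dataclass, field

RESERVATION_TTL_S = 10 * 60  # the 10-minute TTL mentioned in the text


@dataclass
class Stock:
    available: int
    reserved: int = 0
    sold: int = 0
    # reservation_id -> (quantity, expiry timestamp in seconds)
    holds: dict[str, tuple[int, float]] = field(default_factory=dict)

    def reserve(self, res_id: str, qty: int, now: float) -> bool:
        """Step 1: move stock from available to reserved, or refuse."""
        self.release_expired(now)
        if qty <= 0 or qty > self.available or res_id in self.holds:
            return False
        self.available -= qty
        self.reserved += qty
        self.holds[res_id] = (qty, now + RESERVATION_TTL_S)
        return True

    def commit(self, res_id: str, now: float) -> bool:
        """Step 2: payment succeeded, so reserved stock becomes sold."""
        self.release_expired(now)
        hold = self.holds.pop(res_id, None)
        if hold is None:
            return False  # unknown or already expired reservation
        self.reserved -= hold[0]
        self.sold += hold[0]
        return True

    def release(self, res_id: str) -> bool:
        """Step 3: payment failed, so reserved stock returns to available."""
        hold = self.holds.pop(res_id, None)
        if hold is None:
            return False
        self.reserved -= hold[0]
        self.available += hold[0]
        return True

    def release_expired(self, now: float) -> None:
        """Step 3 (TTL path): release every hold whose expiry has passed."""
        expired = [r for r, (_, exp) in self.holds.items() if exp <= now]
        for res_id in expired:
            self.release(res_id)
```

For example, with `Stock(available=2)`, two reservations succeed and a third is refused; once an uncommitted reservation passes its TTL, its unit becomes available to the next buyer. In a real deployment the same state changes would need to happen atomically in the database or behind a lock.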

Consistency model:

  • Catalog data: Eventual consistency is fine. A product description updating a few seconds late is acceptable.
  • Inventory/checkout: Strong consistency is required. Use database transactions with row-level locks on the inventory table. For example, UPDATE inventory SET quantity_available = quantity_available - 1 WHERE sku = ? AND quantity_available > 0 checks and decrements stock in a single atomic statement. If it updates zero rows, the item is out of stock and the checkout is rejected.
  • Cart data: Eventual consistency. Carts are stored in Redis with a TTL; if a Redis node fails, the cart can be rebuilt from a persistent backup.
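
The guarded update can be tried end to end with SQLite from the Python standard library. This is a minimal sketch, not a production schema: it uses the column names from the Inventory data model, assumes one row per SKU (no warehouse_id), and reserve_one is a hypothetical helper name.

```python
import sqlite3


def reserve_one(conn: sqlite3.Connection, sku: str) -> bool:
    """Reserve one unit. Returns True if a row was updated, False if out of stock."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE inventory "
            "SET quantity_available = quantity_available - 1, "
            "    quantity_reserved = quantity_reserved + 1 "
            "WHERE sku = ? AND quantity_available > 0",
            (sku,),
        )
    # The WHERE guard means zero rows match once stock hits zero.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    " sku TEXT PRIMARY KEY,"
    " quantity_available INTEGER NOT NULL CHECK (quantity_available >= 0),"
    " quantity_reserved INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO inventory (sku, quantity_available) VALUES ('SKU-1', 2)")
conn.commit()

# Three buyers compete for two units: the third attempt updates no rows.
results = [reserve_one(conn, "SKU-1") for _ in range(3)]
print(results)  # [True, True, False]
```

The CHECK constraint is a second line of defense: even if some other code path forgets the WHERE guard, the database refuses to store a negative quantity.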
Diagram: Checkout (inventory reservation and payment)

Search and Flash Sales

Search Architecture:

  • Elasticsearch cluster indexes all 500M products. Product updates flow from the Product DB via a change data capture (CDC) pipeline to keep the search index in near-real-time sync.
  • Faceted search: Elasticsearch handles this well. Filter by price range, brand, category, rating, and more, all in a single query.
  • Personalized ranking: A ranking service re-orders search results based on user purchase history, browsing behavior, and seller relevance scores.
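
A faceted query of this kind might be assembled as an Elasticsearch Query DSL request body. The sketch below only builds the JSON-style dictionary and does not contact a cluster. The field names (title, brand, category, price, rating) are assumptions about the index mapping, with brand and category assumed to be keyword fields; build_search_body is a hypothetical helper.

```python
from typing import Any, Optional


def build_search_body(
    keyword: str,
    brand: Optional[str] = None,
    min_price: Optional[float] = None,
    max_price: Optional[float] = None,
    min_rating: Optional[float] = None,
    size: int = 20,
) -> dict[str, Any]:
    """Build a bool query: full-text match on title plus non-scoring filters."""
    filters: list[dict[str, Any]] = []
    if brand is not None:
        filters.append({"term": {"brand": brand}})

    price_range: dict[str, float] = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})

    if min_rating is not None:
        filters.append({"range": {"rating": {"gte": min_rating}}})

    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"title": keyword}}],
                "filter": filters,
            }
        },
        # Terms aggregations return the facet counts shown next to results.
        "aggs": {
            "brands": {"terms": {"field": "brand"}},
            "categories": {"terms": {"field": "category"}},
        },
    }


body = build_search_body("headphones", brand="Acme", max_price=100.0)
```

Clauses under filter do not affect relevance scoring and can be cached by Elasticsearch, which is why the facet constraints go there rather than under must.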

Flash Sale Handling:

  • Queue-based checkout: During flash sales, put checkout requests into a message queue (Kafka). Kafka preserves order within a partition, so keying requests by SKU lets a single worker process each SKU's requests sequentially, preventing inventory race conditions.
  • Inventory pre-allocation: Before the sale, pre-allocate inventory into shards. Each checkout worker owns a shard, so no cross-shard locking is needed.
  • Rate limiting: Cap checkout requests per user to prevent bot abuse. Use a token bucket rate limiter at the API gateway.
  • CDN + static pages: Product pages for flash sale items are pre-rendered and served from CDN. Only the "Buy" button hits the backend.
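
The token bucket mentioned for rate limiting can be sketched in a few lines. This is an in-process illustration under stated assumptions, not gateway configuration: the TokenBucket class and its parameters are hypothetical, and the caller passes `now` explicitly so the behavior is deterministic. A gateway would keep one bucket per user, for example in a dictionary keyed by user_id.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    refill_per_s: float  # sustained request rate
    tokens: float        # tokens currently in the bucket
    last_refill: float   # timestamp of the last refill, in seconds

    @classmethod
    def full(cls, capacity: float, refill_per_s: float, now: float) -> "TokenBucket":
        """Create a bucket that starts full."""
        return cls(capacity, refill_per_s, capacity, now)

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if possible."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last_refill = max(self.last_refill, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# A burst of 3 is allowed, then requests are limited to 1 per second.
bucket = TokenBucket.full(capacity=3, refill_per_s=1.0, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]
print(burst)  # [True, True, True, False]
```

The capacity controls how large a burst a user may send, while the refill rate caps the sustained checkout rate, which is what blunts bot traffic during a sale.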
Diagram: Full architecture
Interview tip: Inventory consistency is the hardest part of e-commerce design. Always discuss the reserve-then-commit pattern, TTL-based reservation expiry, and the difference between eventual consistency (catalog) vs. strong consistency (checkout). This shows you understand where to make trade-offs.

Key Metrics

  • Search latency (p99): <200 ms, \(O(\log n)\), via Elasticsearch inverted index + caching
  • Checkout latency (p99): <500 ms, \(O(1)\), via inventory lock + payment + order creation
  • Inventory accuracy: 99.99%, via reserve-then-commit with TTL expiry
  • Order throughput (peak): 100K/min, via queue-based checkout during flash sales
  • Catalog size: 500M products, via sharded product DB + Elasticsearch
  • Image storage: ~1.25 PB, CDN-served, multi-resolution

Quick check

Why is the reserve-then-commit pattern critical for e-commerce checkout?
