Business Systems13 min read

Design a Ride Sharing Service

Match riders with nearby drivers in real-time β€” at massive scale
scope:Real-World Systemdifficulty:Advanced

Understanding the Problem

Think Uber or Lyft β€” a rider opens the app, requests a ride, and within seconds a nearby driver is matched and en route. Let's design the backend that makes this happen.

Functional Requirements:

  • Request a ride: Rider specifies pickup and dropoff locations.
  • Match a driver: Find the nearest available driver and assign them to the ride.
  • Track the ride: Real-time location tracking for both rider and driver during the trip.
  • Process payment: Calculate fare and charge the rider upon trip completion.

Non-Functional Requirements:

  • Low matching latency: Driver must be matched within <5 seconds of the request. Riders won't wait.
  • Real-time location tracking: Drivers report GPS every 4 seconds. Updates must be reflected instantly on rider's map.
  • Handle peak demand: Friday nights, concerts, rainstorms β€” demand can spike 5-10x. The system must handle surge gracefully.
  • High availability: 99.99% uptime. A down ride service means stranded riders and lost revenue.
β–Έ The idea: rider request β†’ match nearby driver β†’ ride

Estimation

Let's size this system with Uber-scale numbers:

  • 20 million rides per day β†’ ~230 rides/second on average, peak ~1,000/s
  • 5 million concurrent drivers reporting location every 4 seconds
  • Location update throughput: 5M / 4 = 1.25 million location updates per second
  • Matching latency target: Find and assign a driver within 5 seconds
  • Location data per update: driver_id (8B) + lat/lng (16B) + timestamp (8B) + metadata (32B) β‰ˆ 64 bytes
  • Location bandwidth: 1.25M Γ— 64B = ~80 MB/s of raw location data

The killer challenge here is the geospatial indexing β€” efficiently finding which drivers are near a given pickup point among millions of moving objects.

API Design

Request a Ride

EndpointPOST /api/v1/rides
Request{"rider_id": "r123", "pickup": {"lat": 37.77, "lng": -122.42}, "dropoff": {"lat": 37.79, "lng": -122.40}, "ride_type": "standard"}
Response{"ride_id": "ride_456", "status": "matching", "eta": 180}

Update Driver Location

EndpointPUT /api/v1/drivers/:id/location
Request{"lat": 37.78, "lng": -122.41, "heading": 90, "speed": 25}
NoteSent via WebSocket every 4 seconds for active drivers

Track a Ride

EndpointGET /api/v1/rides/:id
Response{"ride_id": "ride_456", "status": "in_progress", "driver_location": {"lat": 37.78, "lng": -122.41}, "eta": 120}

Complete a Ride

EndpointPOST /api/v1/rides/:id/complete
Response{"ride_id": "ride_456", "fare": 24.50, "distance_km": 8.2, "duration_min": 18}
β–Έ Driver matching: geospatial search
Click chart to zoom
Ride request flow: the Ride Service queries the Location Service for nearby drivers, picks the best match, and notifies the driver
β–Έ Location tracking: real-time GPS updates

Geospatial Indexing

This is the heart of the system. How do you find which of 5 million drivers are within 3 km of a pickup point?

Approach: Geohash / QuadTree

  • Divide the Earth's surface into a grid of cells using geohash encoding. Each cell gets a string prefix (e.g., 9q8yy for San Francisco).
  • Drivers are indexed by their geohash prefix. Finding nearby drivers = query the current cell + 8 adjacent cells (to handle edge cases near cell borders).
  • A QuadTree is an alternative: recursively subdivide the map into quadrants. Dense areas (cities) get subdivided more, rural areas stay coarse. Efficient for non-uniform driver distribution.

Location Updates:

  • Drivers push GPS coordinates every 4 seconds via persistent WebSocket connections.
  • Each update recalculates the driver's geohash. If the geohash changed, move the driver to the new cell in the index.
  • The location index lives in-memory (Redis or a custom in-memory store) for sub-millisecond lookups. No disk I/O on the hot path.

Why not a traditional database query? Running SELECT * WHERE distance(lat, lng, pickup_lat, pickup_lng) < 3km on 5M rows every second would melt any database. Geohash turns a radius query into a simple prefix lookup β€” orders of magnitude faster.

β–Έ Full architecture

Surge Pricing, ETA, and Analytics

Surge Pricing:

  • Divide the city into zones. Track the supply/demand ratio per zone in real-time.
  • When demand (ride requests) exceeds supply (available drivers) in a zone, apply a surge multiplier (1.5x, 2x, etc.).
  • This serves two purposes: discourages low-priority rides and incentivizes drivers to move into high-demand zones.

ETA Estimation:

  • Model the road network as a weighted graph (intersections = nodes, roads = edges, weights = travel time).
  • Use Dijkstra's algorithm or A* to compute shortest path. Pre-compute common routes.
  • Factor in real-time traffic data to adjust edge weights dynamically.

Trip Store:

  • Every completed trip is persisted for billing, dispute resolution, driver ratings, and analytics.
  • High write throughput β€” use a write-optimized store (Cassandra, DynamoDB) partitioned by city and date.
Note: Interview tip: Geospatial indexing (geohash or QuadTree) is the key challenge in a ride-sharing system. Interviewers want to see that you understand why naive distance queries don't scale, and how spatial partitioning turns an O(n) scan into an O(1) prefix lookup. Mention the 8-neighbor problem with geohash boundaries β€” it's a detail that shows depth.

Key Metrics

Driver matching latency
Geohash prefix query + rank nearby drivers
<5 seconds \(O(1)\) geohash lookup
Location update throughput
5M drivers Γ— 1 update every 4s
1.25M/s \(O(1)\) per update
Concurrent trips
20M rides/day with ~35 min avg trip
~500K β€”
ETA accuracy
Road graph + real-time traffic weights
Β±2 min \(O(E \log V)\) A*
WebSocket connections
Each active driver holds a persistent connection
5M concurrent β€”
Availability target
Multi-region with failover
99.99% β€”

Quick check

Why is geohash preferred over naive distance calculations for finding nearby drivers?

Continue reading