Design a Payment System
Understanding the Problem
Every time you tap "Pay" on an app, a complex chain of events fires behind the scenes. A payment system (like Stripe) must move money between accounts reliably, securely, and exactly once, even when networks fail, services crash, or users double-click the pay button.
Functional Requirements:
- Process payments: Accept a payment request with amount, currency, payment method, and merchant details. Charge the customer and credit the merchant.
- Refunds: Reverse a completed payment, in full or in part, and record offsetting ledger entries for both accounts.
- Track payment status: Every payment moves through states: created → processing → succeeded/failed. Clients can poll or receive webhooks for status updates.
- Webhooks: Notify merchants of payment events (payment succeeded, refund issued, dispute opened) via HTTP callbacks with retry logic.
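The payment states listed above can be enforced with a small transition table. This is a minimal sketch, assuming succeeded and failed are terminal states; the names PaymentStatus, ALLOWED_TRANSITIONS, and transition are hypothetical and not part of any real API.

```python
from enum import Enum


class PaymentStatus(Enum):
    CREATED = "created"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed moves between states. Assumption: succeeded and failed are terminal.
ALLOWED_TRANSITIONS = {
    PaymentStatus.CREATED: {PaymentStatus.PROCESSING},
    PaymentStatus.PROCESSING: {PaymentStatus.SUCCEEDED, PaymentStatus.FAILED},
    PaymentStatus.SUCCEEDED: set(),
    PaymentStatus.FAILED: set(),
}


def transition(current: PaymentStatus, target: PaymentStatus) -> PaymentStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves at one choke point keeps a late or duplicated processor callback from flipping a finished payment back to an earlier state.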
Non-Functional Requirements:
- Exactly-once processing: The most critical requirement. If a network timeout causes the client to retry, the system must not charge the customer twice. This is achieved through idempotency keys.
- High availability: 99.999% uptime. At large scale, even minutes of downtime can mean millions in lost transactions.
- Data consistency: Money must never be created or destroyed. Every debit must have a matching credit (double-entry bookkeeping).
- PCI-DSS compliance: Card numbers must be tokenized, encrypted at rest, and never logged in plaintext.
- Audit trail: Every action must be recorded immutably for regulatory compliance and dispute resolution.
Back-of-the-Envelope Estimation
Let's size the system for a mid-to-large payment processor:
- Daily transactions: 1 million payments/day
- Average transaction value: $50 → $50M daily volume
- Peak throughput: ~100 transactions per second (TPS). The daily average is only ~12 TPS (1M / 86,400 s), so this leaves headroom well beyond a typical 2-3x peak-hour multiplier.
- Storage per transaction: ~1 KB (payment record + ledger entries + audit log) → ~1 GB/day, ~365 GB/year
- Uptime requirement: 99.999% ("five nines"), which allows only ~5 minutes of downtime per year
- Webhook delivery: ~3M webhook events/day (multiple events per payment: created, processing, succeeded)
The throughput is modest compared to social media systems, but the correctness requirement is extreme. A social media post appearing twice is annoying; a payment being charged twice is a legal and financial liability.
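The arithmetic behind these estimates is easy to check in a few lines. The sketch below uses only the figures listed above; the variable names are illustrative, and 1 KB is taken as 1,000 bytes. Note that 1M payments/day averages out to roughly 12 TPS, so the ~100 TPS peak figure includes a lot of headroom.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_payments = 1_000_000
avg_value_usd = 50
bytes_per_txn = 1_000       # ~1 KB, using decimal units
events_per_payment = 3      # created, processing, succeeded
availability = 0.99999      # "five nines"

avg_tps = daily_payments / SECONDS_PER_DAY
daily_volume_usd = daily_payments * avg_value_usd
storage_gb_per_day = daily_payments * bytes_per_txn / 1e9
storage_gb_per_year = storage_gb_per_day * 365
webhooks_per_day = daily_payments * events_per_payment
downtime_min_per_year = (1 - availability) * 365 * 24 * 60

print(f"average TPS:        {avg_tps:.1f}")
print(f"daily volume (USD): {daily_volume_usd:,}")
print(f"storage (GB/day):   {storage_gb_per_day:.1f}")
print(f"storage (GB/year):  {storage_gb_per_year:.0f}")
print(f"webhooks per day:   {webhooks_per_day:,}")
print(f"downtime (min/yr):  {downtime_min_per_year:.2f}")
```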
API Design
Clean, idempotent APIs are the foundation of a reliable payment system.
Create a Payment
POST /api/v1/payments
Headers:
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Authorization: Bearer sk_live_...
Body:
{
"amount": 4999, // in cents; always use integers to avoid floating-point issues
"currency": "usd",
"payment_method": "pm_card_visa_4242",
"merchant_id": "merch_abc123",
"description": "Order #7892",
"metadata": { "order_id": "7892" }
}
Response: 201 Created
{
"id": "pay_1234567890",
"status": "processing",
"amount": 4999,
"currency": "usd",
"created_at": "2026-03-10T14:30:00Z"
}
Get Payment Status
GET /api/v1/payments/pay_1234567890
Response: 200 OK
{
"id": "pay_1234567890",
"status": "succeeded",
"amount": 4999,
"currency": "usd"
}
Refund a Payment
POST /api/v1/payments/pay_1234567890/refund
Headers:
Idempotency-Key: 660e9500-f30c-52e5-b827-557766551111
Body:
{
"amount": 4999, // in cents; 4999 refunds this payment in full, a smaller value makes it a partial refund
"reason": "customer_request"
}
Response: 201 Created
{
"id": "ref_0987654321",
"payment_id": "pay_1234567890",
"status": "processing",
"amount": 4999
}
Key design choices: amounts in cents (integers avoid floating-point bugs), idempotency keys on every mutating endpoint, and status polling plus webhooks for async updates.
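To illustrate why amounts travel as integer cents, here is a small sketch that converts a user-facing decimal string to cents and back. The helper names to_cents and format_cents are hypothetical, and the code assumes a two-decimal currency such as USD; currencies with a different number of minor units would need their own scaling.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount: str) -> int:
    """Convert a decimal string (for example 49.99) into integer cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def format_cents(cents: int) -> str:
    """Render non-negative integer cents as a display string with two decimals."""
    if cents < 0:
        raise ValueError("cents must be non-negative")
    return f"{cents // 100}.{cents % 100:02d}"


# Binary floats cannot represent most decimal fractions exactly,
# so sums of float dollars drift while sums of integer cents do not.
float_sum_is_exact = (0.1 + 0.2 == 0.3)  # False
cent_sum_is_exact = (10 + 20 == 30)      # True

print(to_cents("49.99"), format_cents(4999), float_sum_is_exact, cent_sum_is_exact)
```

Parsing from a string through Decimal, rather than from a float, avoids introducing the rounding error at the boundary where user input enters the system.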
Idempotency and Double-Entry Bookkeeping
Idempotency is the single most important concept in payment system design. Here's how it works:
- The client generates a UUID (the idempotency key) before making the request.
- When the server receives the request, it checks if that key already exists in the idempotency store (a Redis cache or database table).
- If the key exists, return the cached result; don't process the payment again.
- If the key is new, process the payment normally and store the result keyed by that UUID.
This means even if the client retries 10 times (due to network timeouts), the payment is only processed once.
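The check-then-store flow above can be sketched with an in-memory store. This is an illustration only: IdempotencyStore and run_once are hypothetical names, a dict stands in for the Redis cache or database table, and a single lock makes the check and the write atomic. A production system would more likely rely on an atomic primitive in the store itself, such as a unique-key constraint, so that two concurrent retries cannot both see the key as new.

```python
import threading
from typing import Callable, Dict, List


class IdempotencyStore:
    """In-memory stand-in for the idempotency store (Redis or a DB table)."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, operation: Callable[[], dict]) -> dict:
        # Simplification: one global lock serializes all requests so the
        # lookup and the store happen atomically.
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay the cached response
            result = operation()           # first time: do the real work
            self._results[key] = result
            return result


charges: List[int] = []


def charge() -> dict:
    """Pretend to charge the card; records each real execution."""
    charges.append(4999)
    return {"id": "pay_1234567890", "status": "processing"}


store = IdempotencyStore()
key = "550e8400-e29b-41d4-a716-446655440000"

first = store.run_once(key, charge)
retries = [store.run_once(key, charge) for _ in range(9)]  # 10 calls in total

print(len(charges), all(r == first for r in retries))
```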
Double-Entry Bookkeeping: Every transaction creates exactly two ledger entries, a debit and a credit, that sum to zero. For a $49.99 payment:
- Debit: Customer account -$49.99
- Credit: Merchant account +$49.99
For a refund, the entries reverse:
- Debit: Merchant account -$49.99
- Credit: Customer account +$49.99
The ledger is append-only: entries are never updated or deleted; only new entries are added. This creates an immutable audit trail.
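The debit/credit pairs and the append-only rule can be sketched as follows. The class and function names (LedgerEntry, Ledger, payment_entries, refund_entries) are hypothetical, and the sketch assumes amounts are stored as positive cents with a separate type field, so the sign is derived when checking that a batch sums to zero.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class LedgerEntry:
    payment_id: str
    account_id: str
    type: str    # "debit" or "credit"
    amount: int  # positive integer cents

    @property
    def signed_amount(self) -> int:
        return self.amount if self.type == "credit" else -self.amount


def payment_entries(payment_id: str, customer: str, merchant: str, cents: int) -> List[LedgerEntry]:
    """Debit the customer and credit the merchant."""
    return [
        LedgerEntry(payment_id, customer, "debit", cents),
        LedgerEntry(payment_id, merchant, "credit", cents),
    ]


def refund_entries(payment_id: str, customer: str, merchant: str, cents: int) -> List[LedgerEntry]:
    """Reverse direction: debit the merchant and credit the customer."""
    return [
        LedgerEntry(payment_id, merchant, "debit", cents),
        LedgerEntry(payment_id, customer, "credit", cents),
    ]


class Ledger:
    """Append-only: the only write operation is post()."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def post(self, entries: List[LedgerEntry]) -> None:
        if sum(e.signed_amount for e in entries) != 0:
            raise ValueError("unbalanced batch: debits and credits must sum to zero")
        self._entries.extend(entries)

    def entries(self) -> Tuple[LedgerEntry, ...]:
        return tuple(self._entries)


ledger = Ledger()
ledger.post(payment_entries("pay_1234567890", "acct_customer_abc", "acct_merchant_xyz", 4999))
ledger.post(refund_entries("pay_1234567890", "acct_customer_abc", "acct_merchant_xyz", 4999))
print(len(ledger.entries()), sum(e.signed_amount for e in ledger.entries()))
```

Validating the zero-sum rule at write time means an unbalanced batch is rejected before it can reach the ledger, rather than being discovered later.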
Reconciliation: A background worker periodically compares internal ledger totals against the external payment processor's records. Any discrepancies are flagged for investigation. This catches bugs, fraud, and processor errors.
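A reconciliation pass might look like the sketch below. It assumes both sides can be reduced to a mapping from payment ID to amount in cents, which is a simplification; the function name reconcile and the sample IDs are hypothetical.

```python
from typing import Dict, List


def reconcile(internal: Dict[str, int], processor: Dict[str, int]) -> List[str]:
    """Compare per-payment amounts and return a list of discrepancies."""
    issues: List[str] = []
    for payment_id in sorted(internal.keys() | processor.keys()):
        ours = internal.get(payment_id)
        theirs = processor.get(payment_id)
        if ours is None:
            issues.append(f"{payment_id}: missing from internal ledger")
        elif theirs is None:
            issues.append(f"{payment_id}: missing from processor records")
        elif ours != theirs:
            issues.append(
                f"{payment_id}: amount mismatch (internal={ours}, processor={theirs})"
            )
    return issues


internal_totals = {"pay_1": 4999, "pay_2": 1000, "pay_3": 250}
processor_totals = {"pay_1": 4999, "pay_2": 1200, "pay_4": 700}

for issue in reconcile(internal_totals, processor_totals):
    print(issue)
```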
Ledger Design and Failure Handling
Append-Only Ledger: The ledger is the source of truth for all money movement. It uses an event-sourcing pattern: instead of storing current balances (which can drift), you store every individual transaction as an immutable event. The current balance is derived by replaying events.
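Deriving a balance by replaying events can be sketched in a few lines. The function name replay_balances and the tuple layout are assumptions for illustration; each event carries an account ID, an entry type ("credit" or "debit"), and a positive amount in cents.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (account_id, type, amount_in_cents), in the order the entries were posted
Event = Tuple[str, str, int]


def replay_balances(events: Iterable[Event]) -> Dict[str, int]:
    """Fold the full event history into a current balance per account."""
    balances: Dict[str, int] = defaultdict(int)
    for account_id, entry_type, amount in events:
        if entry_type == "credit":
            balances[account_id] += amount
        elif entry_type == "debit":
            balances[account_id] -= amount
        else:
            raise ValueError(f"unknown entry type: {entry_type}")
    return dict(balances)


history = [
    ("acct_customer_abc", "debit", 4999),   # payment
    ("acct_merchant_xyz", "credit", 4999),
    ("acct_merchant_xyz", "debit", 1000),   # partial refund
    ("acct_customer_abc", "credit", 1000),
]

print(replay_balances(history))
```

Replaying the whole history on every read gets slower as the ledger grows, so systems built this way commonly cache a periodic snapshot and replay only the entries after it; that optimization is outside this sketch.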
Ledger entry schema:
{
"entry_id": "led_abc123",
"payment_id": "pay_1234567890",
"account_id": "acct_merchant_xyz",
"type": "credit",
"amount": 4999,
"currency": "usd",
"created_at": "2026-03-10T14:30:00Z",
"description": "Payment for Order #7892"
}
Handling External Processor Failures:
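- Timeout: If the external processor (or the card network behind it, such as Visa or Mastercard) doesn't respond, don't immediately fail. Retry with exponential backoff (1s, 2s, 4s, 8s...) up to a maximum of 5 attempts, reusing the same idempotency key where the processor supports one so that a retry cannot create a second charge.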
- Processor down: If the primary processor is unavailable, route to a backup processor. Major payment systems maintain relationships with multiple processors.
- Partial failures: If the charge succeeds at the processor but the ledger write fails, use a saga pattern: issue a compensating void or refund at the processor to roll back the charge.
- Stuck payments: A reconciliation worker detects payments stuck in "processing" state for too long and either completes or reverses them.
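The timeout and partial-failure cases above can be sketched together. Everything here is hypothetical: ProcessorTimeout, charge_with_retry, and charge_and_record are illustrative names, and the charge, write_ledger, and void_charge callables stand in for real processor and database calls. The sleep function is injectable so the retry schedule can be exercised without waiting.

```python
import time
from typing import Callable, List


class ProcessorTimeout(Exception):
    """Raised by the (hypothetical) processor client when a call times out."""


def backoff_delays(max_attempts: int = 5, base_seconds: float = 1.0) -> List[float]:
    """Delays between attempts: 1s, 2s, 4s, 8s for five attempts."""
    return [base_seconds * 2 ** i for i in range(max_attempts - 1)]


def charge_with_retry(
    charge: Callable[[], dict],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call charge(), retrying on timeout with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return charge()
        except ProcessorTimeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(delays[attempt])
    raise AssertionError("unreachable")


def charge_and_record(
    charge: Callable[[], dict],
    write_ledger: Callable[[dict], None],
    void_charge: Callable[[dict], None],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Saga step: if the ledger write fails, compensate by voiding the charge."""
    result = charge_with_retry(charge, sleep=sleep)
    try:
        write_ledger(result)
    except Exception:
        void_charge(result)  # compensating action at the processor
        raise
    return result
```

If the compensating void itself fails, the payment is left for the reconciliation worker to detect and resolve, which is why the two mechanisms are usually deployed together.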