Content Delivery Networks
What is a CDN?
Imagine you're a book publisher with a warehouse in New York. A customer in Tokyo orders a book. You ship it across the Pacific — it takes a week. Terrible experience.
Now imagine you have local bookstores in every major city. Tokyo has your books on the shelf already. The customer walks in and gets their book in minutes. That's a CDN.
A Content Delivery Network is a geographically distributed network of servers (called edge servers or Points of Presence, PoPs) that cache and serve content from locations near the user. Instead of every request traveling to your origin server in Virginia, it's served from an edge server in the user's city.
CDNs primarily serve static content: images, videos, CSS, JavaScript, fonts, and downloadable files. Some modern CDNs also cache API responses and run edge functions (code that executes at the edge).
How CDNs Work
Here's the typical flow:
- A user in Berlin requests https://cdn.example.com/images/cat.jpg.
- DNS resolution, often combined with load balancing, routes the request to the nearest edge server (say, Frankfurt).
- The Frankfurt edge server checks its cache. If the image is there (cache hit), it returns it immediately — typically in 5-20ms.
- If the image isn't cached (cache miss), the edge server fetches it from the origin server (your actual server, maybe in Virginia), caches it locally, and returns it to the user.
- The next user in Berlin (or Munich, or any nearby city) gets the cached version — no trip to Virginia needed.
The result? Dramatically faster load times, reduced bandwidth on your origin server, and better user experience globally.
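To make the routing step a little more concrete, here is a minimal Python sketch that sends a user to the geographically closest edge location using great-circle (haversine) distance. It is a simplification: the PoP list and coordinates are hypothetical, and real CDNs steer traffic with GeoDNS or anycast while also weighing measured latency and server load, not distance alone.

```python
import math

# Hypothetical PoPs: (city, latitude, longitude). Real CDNs run hundreds of these
# and also consider latency and load; this sketch uses distance only.
POPS = [
    ("Frankfurt", 50.11, 8.68),
    ("London", 51.51, -0.13),
    ("Ashburn, Virginia", 39.04, -77.49),
    ("Tokyo", 35.68, 139.69),
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_pop(user_lat: float, user_lon: float) -> str:
    """Return the city of the PoP with the smallest great-circle distance to the user."""
    city, _, _ = min(POPS, key=lambda pop: haversine_km(user_lat, user_lon, pop[1], pop[2]))
    return city


print(nearest_pop(52.52, 13.405))  # user in Berlin -> Frankfurt
print(nearest_pop(48.14, 11.58))   # user in Munich -> Frankfurt
```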
Push CDN vs Pull CDN
There are two models for how content gets to the edge servers:
Pull CDN (Lazy)
- Content is fetched from the origin on demand when a user requests it.
- First request is slow (cache miss → origin fetch). Subsequent requests are fast (cache hit).
- Content expires based on TTL (time-to-live) headers.
- You don't need to manage what's cached — the CDN handles it.
- Best for: websites with lots of content where you can't predict what'll be popular.
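A pull CDN edge behaves like a read-through cache with expiry. The sketch below is illustrative only: PullEdgeCache and the origin callback are hypothetical, the origin is assumed to return a body plus a max-age in seconds (as a Cache-Control: max-age=N header would carry), and the clock is injectable so expiry can be shown without actually waiting.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical origin fetch: returns (body, max_age_seconds), as if read from Cache-Control: max-age=N.
OriginFetch = Callable[[str], Tuple[bytes, int]]


class PullEdgeCache:
    """Read-through edge cache: fetch from origin on a miss, serve from cache until the TTL expires."""

    def __init__(self, fetch_from_origin: OriginFetch,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            self.hits += 1  # cache hit: served locally, no trip to the origin
            return entry[0]
        self.misses += 1  # cold or expired: fetch from origin and cache the result
        body, max_age = self._fetch(path)
        self._store[path] = (body, now + max_age)
        return body


# Usage with a fake origin and a fake clock.
origin_calls = []


def fake_origin(path: str) -> Tuple[bytes, int]:
    origin_calls.append(path)
    return b"<jpeg bytes>", 60  # hypothetical: origin sends max-age=60


fake_now = [0.0]
edge = PullEdgeCache(fake_origin, clock=lambda: fake_now[0])
edge.get("/images/cat.jpg")  # first request: miss, goes to origin
edge.get("/images/cat.jpg")  # second request: hit
fake_now[0] = 61.0           # TTL has expired
edge.get("/images/cat.jpg")  # miss again, refetched from origin
print(edge.hits, edge.misses, len(origin_calls))  # 1 2 2
```

Only the first request and the first request after expiry pay the origin round trip; everything in between is a local hit.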
Push CDN
- You proactively upload content to the CDN before anyone requests it.
- Every request is a cache hit from the start. No cold start problem.
- You manage the content lifecycle — upload new versions, delete old ones.
- Best for: content you know will be popular (video releases, software downloads), or when you need guaranteed availability.
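A push CDN, by contrast, is closer to a replicated key-value store you write to ahead of time. This minimal sketch assumes a pure push model with no origin fallback; PushEdgeStore and push_to_all are hypothetical names, and a real provider would expose the same idea through its own upload API or storage service.

```python
from typing import Dict, Iterable, Optional


class PushEdgeStore:
    """One edge location in a pure push model (hypothetical): serves only what was uploaded."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, path: str, body: bytes) -> None:
        self._objects[path] = body  # uploading again replaces the previous version

    def delete(self, path: str) -> None:
        self._objects.pop(path, None)  # you retire old content yourself

    def get(self, path: str) -> Optional[bytes]:
        # Pushed content is a hit from the very first request; anything never pushed
        # is simply missing (None here, typically a 404 in practice).
        return self._objects.get(path)


def push_to_all(edges: Iterable[PushEdgeStore], path: str, body: bytes) -> None:
    """Replicate one object to every edge before any user asks for it."""
    for edge in edges:
        edge.upload(path, body)


edges = {city: PushEdgeStore() for city in ("Frankfurt", "Tokyo", "Ashburn")}
push_to_all(edges.values(), "/downloads/app-v2.zip", b"release bytes")
print(edges["Tokyo"].get("/downloads/app-v2.zip"))  # b'release bytes'
print(edges["Tokyo"].get("/downloads/app-v3.zip"))  # None (never pushed)
```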
Most websites use pull CDNs because they're simpler and self-managing. Push CDNs are used by large media companies like Netflix and for software update distribution.
Cache-Control Headers for CDN
Cache Invalidation at the Edge
The hardest part of using a CDN is invalidation — telling edge servers to stop serving old content. You updated your logo, but the CDN still serves the old one because it hasn't expired yet.
Strategies:
- TTL-based expiry: Set a reasonable TTL. After it expires, the edge server fetches a fresh copy. Simple, but there's a staleness window: until the TTL runs out, users can still get the old version.
- Cache busting with fingerprints: Include a content hash in the filename, e.g. styles.a1b2c3.css. When the content changes, the filename changes, so the CDN treats it as a completely new resource. This is the gold standard for static assets.
- Purge API: Most CDNs offer an API to purge specific URLs or patterns on demand, e.g. something like POST /purge {"url": "/images/logo.png"} (the exact endpoint varies by provider). Useful for emergencies, but a purge can take time to propagate across all edge servers.
- Soft purge / stale-while-revalidate: Mark content as stale but keep serving it while fetching a fresh copy in the background. Users don't wait for the origin; they just might see slightly old content for a moment.
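Two of these strategies are easy to express in code. The following sketch uses hypothetical helper names and only the Python standard library: fingerprinted_name inserts a SHA-256 content hash into a filename (6 hex characters, to mirror the styles.a1b2c3.css example; longer hashes lower the collision risk), and edge_decision classifies a cached copy's age against max-age and stale-while-revalidate windows, modeled on the Cache-Control directives of those names. Because a fingerprinted name changes whenever the bytes change, such files can safely be given a very long TTL.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 6) -> str:
    """Hypothetical helper: insert a short content hash before the extension.

    e.g. css/styles.css -> css/styles.<hash>.css
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def edge_decision(age_s: float, max_age_s: float, swr_s: float) -> str:
    """Hypothetical helper: classify a cached copy by age (seconds)."""
    if age_s < max_age_s:
        return "fresh"  # serve from cache, no origin contact
    if age_s < max_age_s + swr_s:
        return "stale-serve-and-revalidate"  # serve the stale copy now, refetch in the background
    return "expired"  # too old: fetch from origin before responding


old = fingerprinted_name("css/styles.css", b"body { color: black; }")
new = fingerprinted_name("css/styles.css", b"body { color: navy; }")
print(old, new)  # two versions of the file get two different URLs
print(edge_decision(30, max_age_s=60, swr_s=30))   # fresh
print(edge_decision(75, max_age_s=60, swr_s=30))   # stale-serve-and-revalidate
print(edge_decision(120, max_age_s=60, swr_s=30))  # expired
```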
Major CDN Providers
CloudFront (AWS) — Amazon's CDN. Integrates tightly with S3, EC2, and Lambda@Edge. Over 400 edge locations worldwide. Great if you're already in the AWS ecosystem.
Cloudflare — Popular for its generous free tier and built-in DDoS protection. Operates one of the largest networks (300+ cities). Also offers Workers (serverless at the edge) and zero-trust security.
Akamai — One of the oldest and largest CDNs. Akamai has claimed to deliver up to roughly 30% of all web traffic. Used by major enterprises, banks, and media companies. Extensive but expensive.
Fastly — Known for extremely fast cache purging (< 150ms globally). Powers real-time content for companies like GitHub, Stripe, and the New York Times. Offers VCL for custom cache logic.
When to Use a CDN (and When Not To)
Use a CDN when:
- You serve static assets (images, videos, JS, CSS) to users worldwide.
- Your origin server is in one region but your users are global.
- You want to reduce bandwidth costs: cache hits never touch your origin, and CDN bandwidth is often cheaper than origin bandwidth.
- You need DDoS protection: CDNs absorb attack traffic at the edge (a rate limiter also helps here).
- You want faster page loads (CDN latency is typically 5-30ms vs 100-300ms from a distant origin).
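The bandwidth and latency benefits come down to simple arithmetic on the cache hit ratio. This back-of-the-envelope sketch assumes that only cache misses reach the origin and that a miss costs the edge latency plus one origin fetch; the traffic volume, hit ratio, and millisecond figures are hypothetical, picked to fall within the ranges quoted above.

```python
def _check_ratio(ratio: float) -> None:
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("cache hit ratio must be between 0 and 1")


def origin_egress_gb(total_gb: float, cache_hit_ratio: float) -> float:
    """Traffic that still has to leave the origin: only cache misses are fetched from it."""
    _check_ratio(cache_hit_ratio)
    return total_gb * (1.0 - cache_hit_ratio)


def expected_latency_ms(cache_hit_ratio: float, edge_ms: float, origin_fetch_ms: float) -> float:
    """Average response time, assuming hits cost edge_ms and misses cost edge_ms + origin_fetch_ms."""
    _check_ratio(cache_hit_ratio)
    miss_ratio = 1.0 - cache_hit_ratio
    return cache_hit_ratio * edge_ms + miss_ratio * (edge_ms + origin_fetch_ms)


# Hypothetical site: 10,000 GB/month served, 95% of bytes from cache.
print(f"{origin_egress_gb(10_000, 0.95):.0f} GB/month from origin")  # 500 GB/month from origin
# Hypothetical latencies: 20 ms from the edge, 150 ms extra for an origin fetch, 90% hit ratio.
print(f"{expected_latency_ms(0.9, 20, 150):.0f} ms average")  # 35 ms average
```

The hit ratio is the lever: each extra percentage point of hits removes origin traffic and pulls the average latency toward the edge figure.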
Skip the CDN when:
- Your content is highly dynamic and personalized (e.g., real-time dashboards).
- All your users are in one geographic region near your server.
- You're serving sensitive data that must never be cached (e.g., financial transactions).
- You're in early development with minimal traffic — keep it simple first.