
Design a Collaborative Editor

Multiple cursors, one document: real-time editing at scale
Scope: Real-World System | Difficulty: Advanced

Understanding the Problem

Think about Google Docs: multiple people editing the same document at the same time, seeing each other's cursors move in real time. How does that work without everyone's changes stepping on each other?

Let's define the requirements:

Functional Requirements:

  • Create & edit documents: rich text editing with formatting (bold, italic, headings, etc.).
  • Real-time collaboration: multiple users editing simultaneously, with changes appearing in under 100 ms.
  • Cursor presence: see where other collaborators are typing, with colored cursors and names.
  • Version history: view and restore previous versions of the document.
  • Commenting: add comments anchored to specific text ranges.

Non-Functional Requirements:

  • Low latency sync: Changes must propagate to all collaborators in under 100ms for a fluid experience.
  • Conflict-free: When two users edit the same paragraph simultaneously, the result must be deterministic and correct, with no lost characters.
  • Offline support: Users should be able to edit offline and sync when reconnected.
  • Eventual consistency: All clients must converge to the same document state, regardless of operation order.

Estimation

Let's size this system:

  • 100M documents total across the platform.
  • 10M concurrent editing sessions at peak, i.e. documents actively being edited.
  • Average 3 editors per document, so ~30M concurrent WebSocket connections.
  • 50M operations per minute: each keystroke, cursor move, or formatting change is an operation.
  • Operation size: ~100-200 bytes per op (type + position + content + metadata).
  • Storage: Average document ~50 KB text + op history. 100M docs × 50 KB = ~5 TB for documents alone. Op logs grow much larger.
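
The arithmetic behind these figures can be checked with a short script. This is only a back-of-envelope sketch: the variable names are made up for illustration, decimal units are used (1 KB = 1,000 bytes), and the daily op-log figure assumes the peak rate is sustained around the clock, which is an upper bound rather than a measurement.

```python
# Back-of-envelope sizing from the figures listed above (decimal units).
TOTAL_DOCS = 100_000_000
PEAK_SESSIONS = 10_000_000
EDITORS_PER_DOC = 3
OPS_PER_MINUTE = 50_000_000
OP_BYTES_LOW, OP_BYTES_HIGH = 100, 200
AVG_DOC_BYTES = 50_000  # ~50 KB of text per document

connections = PEAK_SESSIONS * EDITORS_PER_DOC
ops_per_second = OPS_PER_MINUTE / 60
inbound_mb_per_s = (ops_per_second * OP_BYTES_LOW / 1e6,
                    ops_per_second * OP_BYTES_HIGH / 1e6)
doc_storage_tb = TOTAL_DOCS * AVG_DOC_BYTES / 1e12

# Assumption for illustration only: the peak op rate is sustained all day,
# which gives an upper bound on daily op-log growth.
ops_per_day = OPS_PER_MINUTE * 60 * 24
op_log_tb_per_day = (ops_per_day * OP_BYTES_LOW / 1e12,
                     ops_per_day * OP_BYTES_HIGH / 1e12)

print(f"WebSocket connections: {connections:,}")
print(f"Ops per second:        {ops_per_second:,.0f}")
print(f"Inbound op traffic:    {inbound_mb_per_s[0]:.1f}-{inbound_mb_per_s[1]:.1f} MB/s")
print(f"Document text storage: {doc_storage_tb:.1f} TB")
print(f"Op log growth (bound): {op_log_tb_per_day[0]:.1f}-{op_log_tb_per_day[1]:.1f} TB/day")
```

Under these assumptions the op log could add several terabytes per day, which is why it outgrows the ~5 TB of document text so quickly. The inbound traffic figure counts each op once; fan-out to the other editors of a document multiplies the outbound side.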

The key challenges: (1) transforming concurrent operations correctly, (2) keeping WebSocket connections alive at scale, and (3) storing and replaying operation history efficiently.

Conflict Resolution: OT vs CRDT

The core technical challenge of collaborative editing is: what happens when two users edit the same spot at the same time?

Operational Transformation (OT)

  • The approach used by Google Docs. Each edit is an "operation" (insert, delete, format).
  • When the server receives concurrent operations, it transforms them against each other to adjust positions.
  • Requires a central server to determine the canonical operation order.
  • Proven at massive scale but complex to implement correctly (the transformation functions have subtle edge cases).

Conflict-free Replicated Data Types (CRDTs)

  • A newer approach where the data structure itself guarantees convergence, so no central server is needed.
  • Each character gets a unique, ordered ID. Merging is automatic regardless of operation order.
  • Great for peer-to-peer scenarios and offline editing.
  • Higher memory overhead (each character carries metadata) but simpler conflict logic.
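
To make the "unique, ordered ID per character" idea concrete, here is a deliberately small sketch of a sequence CRDT. It is not how any particular product implements it: the class name `SequenceCRDT`, the use of exact fractions as positions, and the (position, site, counter) ID layout are all assumptions chosen for brevity. Production designs use richer identifiers to control how concurrent inserts at the same spot interleave.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class SequenceCRDT:
    """Toy sequence CRDT: every character has a unique, sortable ID."""
    site: str                                      # replica name, used to break ties
    counter: int = 0                               # per-site counter keeps IDs unique
    chars: dict = field(default_factory=dict)      # id -> character
    tombstones: set = field(default_factory=set)   # ids of deleted characters

    def _ids(self):
        all_ids = sorted(self.chars)
        visible = [i for i in all_ids if i not in self.tombstones]
        return all_ids, visible

    def insert(self, index, ch):
        all_ids, visible = self._ids()
        if index > 0:
            left_id = visible[index - 1]
            nxt = all_ids.index(left_id) + 1       # next ID, tombstoned or not
            left = left_id[0]
        else:
            nxt, left = 0, Fraction(0)
        right = all_ids[nxt][0] if nxt < len(all_ids) else Fraction(1)
        self.counter += 1
        # Simplification: the position is the exact midpoint of its neighbours.
        self.chars[((left + right) / 2, self.site, self.counter)] = ch

    def delete(self, index):
        # Deleted characters are kept as tombstones so merges stay consistent.
        self.tombstones.add(self._ids()[1][index])

    def merge(self, other):
        # Merge is a set union, so it is commutative and idempotent.
        self.chars.update(other.chars)
        self.tombstones |= other.tombstones

    def text(self):
        return "".join(self.chars[i] for i in self._ids()[1])


a = SequenceCRDT("A")
a.insert(0, "H")
a.insert(1, "i")
b = SequenceCRDT("B")
b.merge(a)                 # both replicas now hold "Hi"

a.insert(2, "!")           # A appends while...
b.delete(0)                # ...B concurrently replaces "H" with "h"
b.insert(0, "h")

a.merge(b)
b.merge(a)
print(a.text(), b.text())  # both replicas print "hi!"
```

The per-character IDs and tombstones are exactly the memory overhead mentioned above: the merged document stores four IDs and one tombstone to represent three visible characters.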

In practice: OT is battle-tested for server-centric architectures (Google Docs, Etherpad). CRDTs are gaining ground for offline-first and P2P apps (Figma uses a CRDT-like approach).

[Figure: OT in action. The server transforms concurrent operations so all clients converge to the same document state.]

OT in Detail

The heart of OT is the transform(op1, op2) function. Given two operations that were applied concurrently to the same document state, it produces adjusted versions that can be applied in sequence.

Classic example:

  • Document state: "ABCDEFGH" (positions are 0-indexed)
  • User A: inserts "Hello" at position 5 → "ABCDEHelloFGH"
  • User B: deletes the character at position 3 (the "D") → "ABCEFGH"
  • These happened concurrently: neither user saw the other's edit.

Server transforms:

  • B deleted at position 3, which is before A's insert at position 5.
  • So A's insert position shifts left by 1: position 5 → position 4.
  • Result after applying both: "ABCEHelloFGH". Both users converge.

The transformation rules get complex with overlapping ranges, multiple concurrent users, and rich text formatting, but the principle is always the same: adjust positions based on what other operations did before you.
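
The example above can be reproduced with a minimal transform function. This is a sketch under narrow assumptions, not a production OT implementation: it handles only text inserts and single-character deletes, the names `Insert`, `Delete`, `apply`, and `transform` are hypothetical, and ties between two inserts at the same position are broken by comparing a `site` label.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str


@dataclass(frozen=True)
class Delete:      # deletes exactly one character at pos
    pos: int
    site: str


Op = Union[Insert, Delete]


def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:                       # op was cancelled by a transform
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, against: Op) -> Optional[Op]:
    """Rewrite op so it can be applied after the concurrent op `against`."""
    if isinstance(against, Insert):
        if isinstance(op, Insert):
            # Same position: the lower site label goes first (arbitrary rule).
            before = against.pos < op.pos or (
                against.pos == op.pos and against.site < op.site)
        else:
            before = against.pos <= op.pos
        return replace(op, pos=op.pos + len(against.text)) if before else op
    # `against` is a Delete.
    if against.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    if isinstance(op, Delete) and against.pos == op.pos:
        return None                      # both deleted the same character
    return op


doc = "ABCDEFGH"
a = Insert(pos=5, text="Hello", site="A")
b = Delete(pos=3, site="B")

# Server order 1: B first, then A transformed against B.
via_b = apply(apply(doc, b), transform(a, b))
# Server order 2: A first, then B transformed against A.
via_a = apply(apply(doc, a), transform(b, a))
print(via_b, via_a)  # both are "ABCEHelloFGH"
```

Both application orders produce the same string, which is the convergence property the transform is responsible for. Range deletes and formatting ops would need additional cases.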


Supporting Features

Cursor Presence:

  • Each client sends cursor position updates via WebSocket (throttled to ~10 updates/sec).
  • The server broadcasts cursor positions to all collaborators in the same document.
  • Cursor positions are transformed along with text operations: if someone inserts text before your cursor, your cursor shifts right.
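
A cursor is just a single position, so shifting it against a remote edit is a small function. The sketch below uses hypothetical `TextInsert` and `TextDelete` records; the choice to leave the cursor in place when text is inserted exactly at the cursor is an assumption, and editors differ on it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextInsert:
    pos: int
    length: int


@dataclass(frozen=True)
class TextDelete:
    pos: int
    length: int


def shift_cursor(cursor: int, op: Union[TextInsert, TextDelete]) -> int:
    """Return where a remote collaborator's cursor lands after op is applied."""
    if isinstance(op, TextInsert):
        # Only text inserted strictly before the cursor pushes it right.
        return cursor + op.length if op.pos < cursor else cursor
    end = op.pos + op.length
    if cursor >= end:
        return cursor - op.length        # deletion entirely before the cursor
    return min(cursor, op.pos)           # cursor inside the deleted range snaps to its start


print(shift_cursor(5, TextInsert(pos=2, length=3)))   # 8
print(shift_cursor(5, TextDelete(pos=3, length=4)))   # 3
```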

Commenting:

  • Comments are anchored to text ranges (start position, end position).
  • When text is edited around a comment anchor, the anchor positions are transformed using the same OT logic.
  • Comments are stored separately from document content, linked by range markers.
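
Comment anchors differ from cursors in that they are ranges, so an edit can move, grow, shrink, or wipe out the anchor. The following sketch shows one possible policy; the `Anchor` type, the half-open range convention, and returning `None` when all of the commented text has been deleted are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    start: int   # inclusive
    end: int     # exclusive


def anchor_after_insert(a: Anchor, pos: int, length: int) -> Anchor:
    # Text at or before the start pushes the whole range right;
    # text strictly inside the range grows it; text at or after the end is ignored.
    start = a.start + length if pos <= a.start else a.start
    end = a.end + length if pos < a.end else a.end
    return Anchor(start, end)


def anchor_after_delete(a: Anchor, pos: int, length: int) -> Optional[Anchor]:
    del_end = pos + length

    def move(p: int) -> int:
        if p >= del_end:
            return p - length
        return min(p, pos)

    start, end = move(a.start), move(a.end)
    # Policy choice: a comment whose text is fully deleted becomes orphaned.
    return Anchor(start, end) if start < end else None


comment = Anchor(start=4, end=8)
print(anchor_after_insert(comment, pos=2, length=3))   # Anchor(start=7, end=11)
print(anchor_after_delete(comment, pos=6, length=4))   # Anchor(start=4, end=6)
print(anchor_after_delete(comment, pos=2, length=10))  # None
```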

Version History:

  • Periodically snapshot the full document state (every N operations or every M minutes).
  • Between snapshots, store the operation log (append-only).
  • To restore a version: load the nearest snapshot, then replay operations forward.
  • This is the same pattern databases use with WAL (Write-Ahead Log) + checkpoints.
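
The snapshot-plus-log pattern can be sketched in a few lines. Everything here is illustrative: `VersionStore`, the tuple format for ops, and snapshotting every N operations in memory are assumptions; a real system would persist both the log and the snapshots.

```python
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    snapshot_every: int = 100                        # N operations between snapshots
    log: list = field(default_factory=list)          # append-only: (kind, pos, payload)
    snapshots: dict = field(default_factory=lambda: {0: ""})  # version -> full text
    current: str = ""

    @staticmethod
    def _apply(doc, op):
        kind, pos, payload = op
        if kind == "insert":                         # payload is the inserted text
            return doc[:pos] + payload + doc[pos:]
        return doc[:pos] + doc[pos + payload:]       # delete: payload is a length

    def append(self, op):
        self.log.append(op)
        self.current = self._apply(self.current, op)
        version = len(self.log)
        if version % self.snapshot_every == 0:
            self.snapshots[version] = self.current
        return version

    def restore(self, version):
        # Load the nearest snapshot at or before the version, then replay forward.
        base = max(v for v in self.snapshots if v <= version)
        doc = self.snapshots[base]
        for op in self.log[base:version]:
            doc = self._apply(doc, op)
        return doc


store = VersionStore(snapshot_every=2)
store.append(("insert", 0, "Hello"))      # version 1
store.append(("insert", 5, " world"))     # version 2, snapshot taken
store.append(("delete", 0, 6))            # version 3
store.append(("insert", 5, "!"))          # version 4, snapshot taken
print(store.restore(3))                   # world
```

Restoring version 3 replays a single op on top of the version 2 snapshot rather than all three ops from the start, which is the point of snapshotting.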

Permissions:

  • Document-level access: owner, editor, commenter, viewer.
  • Checked on WebSocket connection and on every operation received.
  • Permission changes made mid-session must take effect on live connections, for example by forcing affected clients to disconnect and reconnect.
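
A per-operation check might look like the sketch below. The four roles come from the list above; the operation kinds, the `REQUIRED_ROLE` table, and the in-memory `acl` dictionary are hypothetical stand-ins for whatever the real service uses.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    COMMENTER = 2
    EDITOR = 3
    OWNER = 4


# Hypothetical mapping from operation kind to the minimum role that may send it.
REQUIRED_ROLE = {
    "cursor": Role.VIEWER,
    "comment": Role.COMMENTER,
    "insert": Role.EDITOR,
    "delete": Role.EDITOR,
    "format": Role.EDITOR,
}


def is_allowed(acl: dict, user_id: str, op_kind: str) -> bool:
    """Check one incoming operation against the document's access list."""
    role = acl.get(user_id)
    required = REQUIRED_ROLE.get(op_kind)
    if role is None or required is None:
        return False                     # unknown user or unknown op: reject
    return role >= required


acl = {"alice": Role.OWNER, "bob": Role.EDITOR}
print(is_allowed(acl, "bob", "insert"))      # True

acl["bob"] = Role.COMMENTER                  # owner downgrades bob mid-session
print(is_allowed(acl, "bob", "insert"))      # False
print(is_allowed(acl, "bob", "comment"))     # True
```

Because the check runs on every operation, the downgrade applies to bob's very next edit even if his WebSocket connection stays open.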
Interview tip: OT is proven at scale (Google Docs has used it for 15+ years). CRDTs are conceptually simpler for P2P and offline scenarios but use more memory. Mention both approaches and explain why you would choose one over the other based on the requirements: if the system is server-centric, go with OT; if offline-first or P2P matters, lean toward CRDTs.

Key Metrics

  • Sync latency (local): <50 ms, \(O(1)\). Local apply is instant; the server round trip adds network delay.
  • Transform cost (OT): ~1-5 ms, \(O(n)\), where n is the number of concurrent ops to transform against.
  • Document storage: ~50 KB average text content per document (~5 TB across 100M documents).
  • Op log storage: append-only log across 100M documents; grows well beyond the ~5 TB of document text.
  • Concurrent editors per document: up to ~100, \(O(k^2)\), where k is the number of concurrent editors; transform pairs grow quadratically.
  • WebSocket connections: ~30M at peak (10M sessions × 3 editors on average).

