The Collaborative Editing Problem
The Fundamental Problem
Two people are editing the same document. Alice types "Hello" at position 0. At the exact same moment, Bob deletes the character at position 3. Neither knows about the other's edit yet — they're working on their local copies.
When these edits arrive at each other's machines, what happens? If you naively apply both edits, the result depends on the order they're applied — and Alice sees a different document than Bob.
This is the collaborative editing problem: how do you allow multiple users to edit shared state simultaneously and guarantee that everyone ends up with the same result?
It sounds simple. It's one of the hardest problems in distributed systems.
Why Simple Locking Doesn't Work
The first instinct: lock the document. Only one person can edit at a time. Problem solved, right?
Wrong. Locking kills the user experience:
- Granularity: Lock the whole document? Unusable for a 100-page doc with 10 editors. Lock per paragraph? Still frustrating. Lock per character? Absurd overhead.
- Latency: If the lock server is 200ms away, every keystroke has a 400ms round trip (request lock, wait for grant). Nobody types at 2.5 characters per second.
- Dead locks: User A locks paragraph 1, wants to edit paragraph 2. User B has paragraph 2 locked, wants paragraph 1. Deadlock.
- Failure: User locks a section, then their browser crashes. The lock is held until timeout. Other users are blocked.
Google Docs doesn't lock. Figma doesn't lock. Notion doesn't lock. The industry solved this differently — with algorithms that let everyone edit freely and reconcile the mess afterward.
Imagine a whiteboard in a room with 5 people, all drawing simultaneously. Locking is like saying "everyone raise your hand and wait for permission before each stroke." OT is like having a coordinator who watches everyone draw and adjusts their strokes in real-time to avoid conflicts. CRDTs are like giving each person a special pen that mathematically guarantees all drawings converge no matter what order the strokes are applied.
The Three Properties You Need
Any collaborative editing solution must guarantee three things:
1. Convergence
When all edits have been propagated to all clients, every client must see the exact same document. Not "similar." Not "close enough." Identical, byte-for-byte.
This is the non-negotiable requirement. If Alice sees "Hello World" and Bob sees "Helo World" after sync, the system is broken.
2. Intention Preservation
Each user's edit should achieve what they intended, even in the presence of concurrent edits. If Alice types "X" between "He" and "llo," the result should have "X" between "He" and "llo" — even if Bob simultaneously deleted a character elsewhere that shifted positions.
This is harder than it sounds. Position-based edits (insert at index 3) don't preserve intention when other edits shift indices.
3. Causality Preservation
If edit B was made with knowledge of edit A (B "happened after" A from the user's perspective), then B should be applied after A everywhere. Edits should never appear to travel backwards in time.
The Classic Example: Concurrent Insert and Delete
Let's trace through the problem concretely. Document starts as "ABCD".
- Alice: Insert "X" at position 1 (between A and B). Intended result:
"AXBCD" - Bob: Delete position 2 (the "C"). Intended result:
"ABD"
Both edits happen concurrently — neither knows about the other.
The problem: Bob's "delete position 2" meant "delete C." But after Alice's insert, position 2 is no longer C — it's B. Naively applying Bob's operation on Alice's document deletes the wrong character.
The correct result is "AXBD" — Alice's X is between A and B (her intention), and C is deleted (Bob's intention).
The Three Approaches
Approach 1: Locking (Pessimistic Concurrency)
Prevent conflicts by preventing concurrent access.
- How: Acquire a lock before editing, release when done
- Used by: Traditional databases, file systems, some simple wiki editors
- Pros: Simple to implement, no conflict resolution needed
- Cons: Terrible UX (waiting for locks), deadlock risk, failure handling complexity
Approach 2: Operational Transformation (OT)
Transform operations against each other to account for concurrent edits.
- How: When you receive a remote operation, transform it against all operations that happened concurrently, adjusting positions and content
- Used by: Google Docs, Google Sheets, Apache Wave
- Pros: Well-understood theory, works with a central server
- Cons: Extremely complex to implement correctly, N-client transform is hard, requires central server for ordering
Approach 3: Conflict-free Replicated Data Types (CRDTs)
Design data structures that mathematically guarantee convergence regardless of operation order.
- How: Each element has a unique ID. Operations reference IDs, not positions. Merge is commutative, associative, and idempotent.
- Used by: Figma, Notion (partially), Apple Notes, Linear, Yjs, Automerge
- Pros: No central server needed, works offline, mathematically proven convergence
- Cons: Higher memory overhead (IDs per element), some operations are complex to express
| Property | Locking | OT | CRDTs |
|---|---|---|---|
| Concurrent editing | No | Yes | Yes |
| Central server required | Yes (lock server) | Yes (transform server) | No (peer-to-peer possible) |
| Offline support | No | Limited | Yes (merge on reconnect) |
| Implementation complexity | Low | Very high | Medium-high |
| Memory overhead | Low | Low-medium | Higher (per-element IDs) |
| Convergence guarantee | N/A (no concurrent edits) | Proven (with correct transforms) | Mathematically proven |
| Industry adoption | Legacy systems | Google Docs | Figma, Notion, Linear, Apple Notes |
Real-World Implementations
Google Docs: OT with a Central Server
Google Docs uses a variant of the Jupiter protocol. The server acts as a single source of truth, sequencing all operations. Each client transforms incoming operations against its local pending operations. It works brilliantly but requires every edit to pass through Google's servers.
Figma: Custom CRDT
Figma built a custom CRDT for their design tool. Their data model (objects with properties on a canvas) maps well to CRDTs because conflicts are at the property level, not the character level. Two users changing different properties of the same shape merges cleanly. Two users changing the same property uses "last writer wins," which is acceptable for design properties.
Notion: Hybrid Approach
Notion uses a hybrid: block-level CRDTs for structure (adding, removing, reordering blocks) and OT-like techniques for text within blocks. This is pragmatic — text CRDT is the hardest part, and confining it to individual blocks limits the blast radius of complexity.
The Real Difficulty: Text
Most data types have natural conflict resolution. A key-value store? Last writer wins. A set? Union the sets. A counter? Sum the increments.
Text is the hard case. Characters have a linear order that must be preserved. Insertions at the same position must be deterministically ordered. Deletions must not leave "tombstones" that bloat memory forever. Undo must undo your edits, not other people's edits.
This is why collaborative text editing is the canonical hard problem. It's why Google spent years on OT, and why Yjs and Automerge represent years of PhD research.
- 1Convergence is non-negotiable — all clients must reach identical state after sync
- 2Intention preservation requires operations to be position-independent or transformed
- 3Locking trades UX for simplicity — acceptable only for infrequent, non-concurrent edits
- 4OT requires a central server for operation ordering — limits architecture choices
- 5CRDTs trade memory overhead for mathematical convergence guarantees and offline support
- 6Text is the hardest CRDT — every other data type has simpler natural merge strategies
System Design: Collaborative Spreadsheet
Design a collaborative spreadsheet editor for up to 50 concurrent editors. Cells contain text, numbers, or formulas that reference other cells. Discuss: how do you handle concurrent edits to the same cell? What about formula dependencies when referenced cells change? Should you use OT or CRDTs, and why? How do you handle a user editing a formula while another user deletes the referenced cell?