The Collaborative Editing Problem

advanced16 min read

The Fundamental Problem

Two people are editing the same document. Alice types "Hello" at position 0. At the exact same moment, Bob deletes the character at position 3. Neither knows about the other's edit yet — they're working on their local copies.

When these edits arrive at each other's machines, what happens? If you naively apply both edits, the result depends on the order they're applied — and Alice sees a different document than Bob.

This is the collaborative editing problem: how do you allow multiple users to edit shared state simultaneously and guarantee that everyone ends up with the same result?

It sounds simple. It's one of the hardest problems in distributed systems.

Why Simple Locking Doesn't Work

The first instinct: lock the document. Only one person can edit at a time. Problem solved, right?

Wrong. Locking kills the user experience:

Granularity: Lock the whole document? Unusable for a 100-page doc with 10 editors. Lock per paragraph? Still frustrating. Lock per character? Absurd overhead.
Latency: If the lock server is 200ms away, every keystroke has a 400ms round trip (request lock, wait for grant). Nobody types at 2.5 characters per second.
Dead locks: User A locks paragraph 1, wants to edit paragraph 2. User B has paragraph 2 locked, wants paragraph 1. Deadlock.
Failure: User locks a section, then their browser crashes. The lock is held until timeout. Other users are blocked.

Google Docs doesn't lock. Figma doesn't lock. Notion doesn't lock. The industry solved this differently — with algorithms that let everyone edit freely and reconcile the mess afterward.

Mental Model

Imagine a whiteboard in a room with 5 people, all drawing simultaneously. Locking is like saying "everyone raise your hand and wait for permission before each stroke." OT is like having a coordinator who watches everyone draw and adjusts their strokes in real-time to avoid conflicts. CRDTs are like giving each person a special pen that mathematically guarantees all drawings converge no matter what order the strokes are applied.

The Three Properties You Need

Any collaborative editing solution must guarantee three things:

1. Convergence

When all edits have been propagated to all clients, every client must see the exact same document. Not "similar." Not "close enough." Identical, byte-for-byte.

This is the non-negotiable requirement. If Alice sees "Hello World" and Bob sees "Helo World" after sync, the system is broken.

2. Intention Preservation

Each user's edit should achieve what they intended, even in the presence of concurrent edits. If Alice types "X" between "He" and "llo," the result should have "X" between "He" and "llo" — even if Bob simultaneously deleted a character elsewhere that shifted positions.

This is harder than it sounds. Position-based edits (insert at index 3) don't preserve intention when other edits shift indices.

3. Causality Preservation

If edit B was made with knowledge of edit A (B "happened after" A from the user's perspective), then B should be applied after A everywhere. Edits should never appear to travel backwards in time.

Quiz

Why is convergence alone not sufficient for a good collaborative editing experience?

ABCD

The Classic Example: Concurrent Insert and Delete

Let's trace through the problem concretely. Document starts as "ABCD".

Alice: Insert "X" at position 1 (between A and B). Intended result: "AXBCD"
Bob: Delete position 2 (the "C"). Intended result: "ABD"

Both edits happen concurrently — neither knows about the other.

Execution Trace

Initial

Document: ABCD (both Alice and Bob see this)

A=0, B=1, C=2, D=3

Alice's edit

Insert X at position 1

Alice sees: AXBCD

Bob's edit

Delete position 2 (C)

Bob sees: ABD

Alice receives Bob's edit

Delete position 2 on AXBCD would delete B, not C!

Alice's insert shifted everything right

Naive result at Alice

AXCD (wrong — Bob wanted to delete C, not B)

Intention NOT preserved

Correct result

AXBD (X inserted between A and B, C deleted)

Both intentions preserved

The problem: Bob's "delete position 2" meant "delete C." But after Alice's insert, position 2 is no longer C — it's B. Naively applying Bob's operation on Alice's document deletes the wrong character.

The correct result is "AXBD" — Alice's X is between A and B (her intention), and C is deleted (Bob's intention).

The Three Approaches

Approach 1: Locking (Pessimistic Concurrency)

Prevent conflicts by preventing concurrent access.

How: Acquire a lock before editing, release when done
Used by: Traditional databases, file systems, some simple wiki editors
Pros: Simple to implement, no conflict resolution needed
Cons: Terrible UX (waiting for locks), deadlock risk, failure handling complexity

Approach 2: Operational Transformation (OT)

Transform operations against each other to account for concurrent edits.

How: When you receive a remote operation, transform it against all operations that happened concurrently, adjusting positions and content
Used by: Google Docs, Google Sheets, Apache Wave
Pros: Well-understood theory, works with a central server
Cons: Extremely complex to implement correctly, N-client transform is hard, requires central server for ordering

Approach 3: Conflict-free Replicated Data Types (CRDTs)

Design data structures that mathematically guarantee convergence regardless of operation order.

How: Each element has a unique ID. Operations reference IDs, not positions. Merge is commutative, associative, and idempotent.
Used by: Figma, Notion (partially), Apple Notes, Linear, Yjs, Automerge
Pros: No central server needed, works offline, mathematically proven convergence
Cons: Higher memory overhead (IDs per element), some operations are complex to express

Property	Locking	OT	CRDTs
Concurrent editing	No	Yes	Yes
Central server required	Yes (lock server)	Yes (transform server)	No (peer-to-peer possible)
Offline support	No	Limited	Yes (merge on reconnect)
Implementation complexity	Low	Very high	Medium-high
Memory overhead	Low	Low-medium	Higher (per-element IDs)
Convergence guarantee	N/A (no concurrent edits)	Proven (with correct transforms)	Mathematically proven
Industry adoption	Legacy systems	Google Docs	Figma, Notion, Linear, Apple Notes

Quiz

Why has the industry been moving from OT toward CRDTs for new collaborative applications?

ABCD

Real-World Implementations

Google Docs: OT with a Central Server

Google Docs uses a variant of the Jupiter protocol. The server acts as a single source of truth, sequencing all operations. Each client transforms incoming operations against its local pending operations. It works brilliantly but requires every edit to pass through Google's servers.

Figma: Custom CRDT

Figma built a custom CRDT for their design tool. Their data model (objects with properties on a canvas) maps well to CRDTs because conflicts are at the property level, not the character level. Two users changing different properties of the same shape merges cleanly. Two users changing the same property uses "last writer wins," which is acceptable for design properties.

Notion: Hybrid Approach

Notion uses a hybrid: block-level CRDTs for structure (adding, removing, reordering blocks) and OT-like techniques for text within blocks. This is pragmatic — text CRDT is the hardest part, and confining it to individual blocks limits the blast radius of complexity.

Collaborative Editing PipelinePhase 1 / 5

Phase 1 / 5Local Edit

User types, local state updates immediately

1/5

The Real Difficulty: Text

Most data types have natural conflict resolution. A key-value store? Last writer wins. A set? Union the sets. A counter? Sum the increments.

Text is the hard case. Characters have a linear order that must be preserved. Insertions at the same position must be deterministically ordered. Deletions must not leave "tombstones" that bloat memory forever. Undo must undo your edits, not other people's edits.

This is why collaborative text editing is the canonical hard problem. It's why Google spent years on OT, and why Yjs and Automerge represent years of PhD research.

Quiz

Which aspect of collaborative editing makes text specifically harder than other data types?

ABCD

Key Rules

1Convergence is non-negotiable — all clients must reach identical state after sync
2Intention preservation requires operations to be position-independent or transformed
3Locking trades UX for simplicity — acceptable only for infrequent, non-concurrent edits
4OT requires a central server for operation ordering — limits architecture choices
5CRDTs trade memory overhead for mathematical convergence guarantees and offline support
6Text is the hardest CRDT — every other data type has simpler natural merge strategies

Interview Question

System Design: Collaborative Spreadsheet

Design a collaborative spreadsheet editor for up to 50 concurrent editors. Cells contain text, numbers, or formulas that reference other cells. Discuss: how do you handle concurrent edits to the same cell? What about formula dependencies when referenced cells change? Should you use OT or CRDTs, and why? How do you handle a user editing a formula while another user deletes the referenced cell?