Yjs and Practical CRDTs

Advanced · 18 min read

From Theory to Production

The previous topics gave you the theory. Now let's build something real. Yjs is the most widely used CRDT library in the JavaScript ecosystem — it powers collaborative features in Notion-like editors, Figma clones, and production apps serving millions of users.

What makes Yjs special isn't just correctness (all CRDTs give you that). It's the engineering: a highly optimized encoding format, a provider system that separates sync from data structure, and an ecosystem of editor integrations that let you add collaboration to an existing editor in under 100 lines of code.

Yjs Architecture

The core abstraction in Yjs is the Y.Doc — a document that contains shared types. Shared types are the CRDT data structures: Y.Text, Y.Array, Y.Map, Y.XmlFragment.

import * as Y from 'yjs';

const doc = new Y.Doc();

const text = doc.getText('editor');
const todos = doc.getArray<{ title: string; done: boolean }>('todos');
const metadata = doc.getMap<string>('metadata');

text.insert(0, 'Hello, world!');
todos.push([{ title: 'Learn CRDTs', done: false }]);
metadata.set('author', 'Alice');

Every shared type in a Y.Doc is identified by a string name. Two clients that connect to the same document through a provider resolve the same names to the same shared types, and their contents sync automatically.
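
To make the name-based lookup concrete, here is a minimal sketch that syncs two in-memory replicas by hand. The names docA and docB are illustrative, and the direct Y.applyUpdate calls stand in for whatever transport a provider would normally supply.

```typescript
import * as Y from 'yjs';

// Two independent replicas of the same logical document.
const docA = new Y.Doc();
const docB = new Y.Doc();

// Each replica edits its own copy locally.
docA.getText('editor').insert(0, 'Hello, world!');
docB.getMap<string>('metadata').set('author', 'Alice');

// Stand-in for the network: ship each replica's state to the other.
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA));
Y.applyUpdate(docA, Y.encodeStateAsUpdate(docB));

// Both replicas now resolve the same names to the same content.
const textOnB = docB.getText('editor').toString();
const authorOnA = docA.getMap<string>('metadata').get('author');
```

Because shared types are looked up by name, neither replica needs to know in advance which types the other one created.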

Mental Model

Think of Y.Doc as a shared database that syncs automatically. Y.Text is a text column, Y.Array is a list table, Y.Map is a key-value store. You read and write locally (instant), and Yjs handles replication in the background. When a remote change arrives, your local state updates automatically — like a database subscription that's always active.

Shared Types in Detail

Y.Text

The workhorse for collaborative text editing. Under the hood, it uses a YATA (Yet Another Transformation Approach) CRDT — a variant optimized for text that handles concurrent insertions at the same position deterministically.

const text = doc.getText('content');

text.insert(0, 'Hello ');
text.insert(6, 'World');

text.format(0, 5, { bold: true });

text.delete(5, 1);

text.observe((event) => {
  for (const delta of event.delta) {
    if (delta.insert) {
      // Characters were inserted
    }
    if (delta.delete) {
      // Characters were deleted
    }
    if (delta.retain) {
      // Characters were kept; delta.attributes is set if their formatting changed
    }
  }
});

Y.Text supports rich text via formatting attributes. These are stored as metadata on character ranges, and they merge correctly when concurrent edits affect overlapping ranges.
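
The convergence guarantee for concurrent insertions can be checked with a small sketch. The replicas alice and bob are hypothetical, and updates are exchanged by hand instead of through a provider. Which of the two characters ends up first depends on Yjs's internal ordering, so the sketch only checks that both replicas agree.

```typescript
import * as Y from 'yjs';

const alice = new Y.Doc();
const bob = new Y.Doc();

// Start both replicas from the same base text.
alice.getText('content').insert(0, 'Hello World');
Y.applyUpdate(bob, Y.encodeStateAsUpdate(alice));

// Concurrent edits at the same index: neither replica has seen the other's change.
alice.getText('content').insert(5, ',');
bob.getText('content').insert(5, '!');

// Exchange only the missing changes in both directions.
const fromAlice = Y.encodeStateAsUpdate(alice, Y.encodeStateVector(bob));
const fromBob = Y.encodeStateAsUpdate(bob, Y.encodeStateVector(alice));
Y.applyUpdate(bob, fromAlice);
Y.applyUpdate(alice, fromBob);

// Both replicas hold the same 13-character string containing both edits.
const aliceText = alice.getText('content').toString();
const bobText = bob.getText('content').toString();
```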

Y.Array

A collaborative list. Elements can be any JSON-serializable value or nested shared types.

const list = doc.getArray<string | Y.Map<string>>('items');

list.push(['item1', 'item2']);
list.insert(1, ['inserted']);
list.delete(0, 1);

const nested = new Y.Map();
nested.set('key', 'value');
list.push([nested]);
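
As a rough sketch of how nested shared types behave once they are part of an array, the example below reads the structure back with toJSON() and listens for nested changes with observeDeep(). The counter nestedChanges is an illustrative name, not part of the Yjs API.

```typescript
import * as Y from 'yjs';

const doc = new Y.Doc();
const list = doc.getArray<string | Y.Map<string>>('items');

list.push(['item1', 'item2']);
list.insert(1, ['inserted']);
list.delete(0, 1);

// A shared type can be filled before it is added to the document.
const nested = new Y.Map<string>();
nested.set('key', 'value');
list.push([nested]);

// observeDeep reports changes on the array and on any type nested inside it.
let nestedChanges = 0;
list.observeDeep((events) => {
  for (const event of events) {
    if (event.target === nested) nestedChanges += 1;
  }
});

nested.set('key', 'changed');

// toJSON converts nested shared types into plain objects.
const snapshot = JSON.stringify(list.toJSON());
```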

Y.Map

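A collaborative key-value store. Concurrent writes to the same key resolve with last-writer-wins semantics (like an LWW-Register per key): every replica deterministically picks the same winner, based on Yjs's internal ordering of operations rather than wall-clock timestamps.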

const map = doc.getMap('settings');

map.set('theme', 'dark');
map.set('fontSize', 14);

map.observe((event) => {
  for (const [key, change] of event.changes.keys) {
    if (change.action === 'add') { /* new key */ }
    if (change.action === 'update') { /* value changed */ }
    if (change.action === 'delete') { /* key removed */ }
  }
});
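
A small sketch of the conflict behavior, using two hypothetical replicas that set the same key before hearing from each other. Which value wins depends on Yjs's internal ordering, so the sketch only checks that both replicas settle on the same one.

```typescript
import * as Y from 'yjs';

const alice = new Y.Doc();
const bob = new Y.Doc();

// Concurrent writes to the same key on two disconnected replicas.
alice.getMap<string>('settings').set('theme', 'dark');
bob.getMap<string>('settings').set('theme', 'light');

// Exchange full states in both directions.
const fromAlice = Y.encodeStateAsUpdate(alice);
const fromBob = Y.encodeStateAsUpdate(bob);
Y.applyUpdate(bob, fromAlice);
Y.applyUpdate(alice, fromBob);

// One write wins everywhere; the other is discarded on both replicas.
const aliceTheme = alice.getMap<string>('settings').get('theme');
const bobTheme = bob.getMap<string>('settings').get('theme');
```

If losing one of the writes is not acceptable for a field, model it as a nested Y.Text or Y.Array instead of a plain map value.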

Quiz
What conflict resolution strategy does Y.Map use when two clients set the same key concurrently?

The Provider System

Yjs separates data structure from transport. Providers handle sync — you can use multiple providers simultaneously:

import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';
import { IndexeddbPersistence } from 'y-indexeddb';

const doc = new Y.Doc();

const wsProvider = new WebsocketProvider(
  'wss://your-server.com',
  'document-room-id',
  doc
);

wsProvider.on('status', (event: { status: string }) => {
  // 'connecting', 'connected', 'disconnected'
});

const indexeddbProvider = new IndexeddbPersistence(
  'document-room-id',
  doc
);

indexeddbProvider.whenSynced.then(() => {
  // Local data loaded from IndexedDB
});

With both providers active, your document:

  1. Loads instantly from IndexedDB (offline-first)
  2. Syncs with other clients via WebSocket
  3. Persists every change to IndexedDB automatically
  4. Works offline — changes queue and sync when reconnected
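
To show what a provider does at its core, here is a minimal sketch of a hypothetical in-process "provider". The connect function and the link origin tag are assumptions made for illustration; real providers such as y-websocket do the same relaying over a network and add reconnection, awareness, and error handling.

```typescript
import * as Y from 'yjs';

// Hypothetical in-process provider: relays updates between two docs.
function connect(a: Y.Doc, b: Y.Doc): () => void {
  const link = {}; // origin tag marking updates applied by this relay
  const aToB = (update: Uint8Array, origin: unknown) => {
    if (origin !== link) Y.applyUpdate(b, update, link); // skip echoes
  };
  const bToA = (update: Uint8Array, origin: unknown) => {
    if (origin !== link) Y.applyUpdate(a, update, link);
  };
  // Initial handshake: each side receives only what it is missing.
  Y.applyUpdate(b, Y.encodeStateAsUpdate(a, Y.encodeStateVector(b)), link);
  Y.applyUpdate(a, Y.encodeStateAsUpdate(b, Y.encodeStateVector(a)), link);
  a.on('update', aToB);
  b.on('update', bToA);
  return () => {
    a.off('update', aToB);
    b.off('update', bToA);
  };
}

const laptop = new Y.Doc();
const phone = new Y.Doc();

// Edits made while "offline", before the two replicas are connected.
laptop.getArray<string>('todos').push(['Learn CRDTs']);
phone.getArray<string>('todos').push(['Ship it']);

const disconnect = connect(laptop, phone);
laptop.getArray<string>('todos').push(['Celebrate']); // relayed live

disconnect();
phone.getArray<string>('todos').push(['Offline again']); // stays local

const laptopTodos = laptop.getArray<string>('todos').toArray();
const phoneTodos = phone.getArray<string>('todos').toArray();
```

Tagging relayed updates with an origin is what prevents the two handlers from bouncing the same update back and forth.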

The WebSocket Server

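The y-websocket project provides a minimal server. In y-websocket 1.x it ships inside the package; newer releases publish it separately as @y/websocket-server:

import { WebSocketServer } from 'ws';
// Import path for y-websocket 1.x; adjust it if you use @y/websocket-server
import { setupWSConnection } from 'y-websocket/bin/utils';

const wss = new WebSocketServer({ port: 1234 });

// setupWSConnection handles:
// - Document state sync on connection
// - Broadcasting updates to all clients in a room
// - Awareness (cursor positions, user info)
wss.on('connection', (conn, req) => setupWSConnection(conn, req));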

For production, you'll want a more robust server. The Yjs community offers Hocuspocus — a full-featured Yjs server with authentication, persistence hooks, webhook support, and horizontal scaling.

Building a Collaborative Editor with Tiptap

Tiptap is a headless rich text editor built on ProseMirror. The @tiptap/extension-collaboration package integrates Yjs directly:

import { Editor } from '@tiptap/core';
import StarterKit from '@tiptap/starter-kit';
import Collaboration from '@tiptap/extension-collaboration';
import CollaborationCursor from '@tiptap/extension-collaboration-cursor';
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

const doc = new Y.Doc();
const provider = new WebsocketProvider('wss://collab.example.com', 'doc-123', doc);

const editor = new Editor({
  extensions: [
    StarterKit.configure({ history: false }),
    Collaboration.configure({
      document: doc,
    }),
    CollaborationCursor.configure({
      provider,
      user: {
        name: 'Alice',
        color: '#f59e0b',
      },
    }),
  ],
});

That's it. You now have a collaborative rich text editor with real-time cursors. The history: false setting is important: Tiptap's built-in undo/redo doesn't understand collaborative edits. The Collaboration extension (@tiptap/extension-collaboration) supplies its own Yjs-aware undo/redo, which only undoes your changes, not other people's.

Info

Always disable the editor's built-in history when using collaboration. Built-in undo/redo tracks all operations linearly, so it will undo other users' edits. Yjs-aware undo uses Y.UndoManager, which tracks only transactions from the origins you tell it to track (by default, local edits made without an explicit origin) and reverts them without touching concurrent remote changes.
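
A minimal sketch of that behavior outside any editor, assuming remote updates are applied with an explicit origin (here the illustrative string 'remote'), which is what providers typically do. The doc names local and remote are hypothetical.

```typescript
import * as Y from 'yjs';

const local = new Y.Doc();
const remote = new Y.Doc();
const localText = local.getText('content');

// By default the manager tracks transactions whose origin is null,
// which is what plain local edits produce.
const undoManager = new Y.UndoManager(localText);

localText.insert(0, 'Alice');

// The remote user receives our edit and appends their own.
Y.applyUpdate(remote, Y.encodeStateAsUpdate(local));
remote.getText('content').insert(5, ' and Bob');

// Apply the remote change with an explicit origin so the undo manager
// does not treat it as a local edit.
const diff = Y.encodeStateAsUpdate(remote, Y.encodeStateVector(local));
Y.applyUpdate(local, diff, 'remote');

const beforeUndo = localText.toString(); // 'Alice and Bob'
undoManager.undo();
const afterUndo = localText.toString(); // ' and Bob'
```

Only the local insertion is reverted; the remote user's text survives the undo.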

Quiz
Why must you disable the editor's built-in undo/redo when adding Yjs collaboration?

Yjs Encoding and Performance

One of Yjs's key innovations is its encoding format. Updates are serialized as compact binary using a custom encoding:

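// Full state of doc as a single binary update
const update = Y.encodeStateAsUpdate(doc);

// Apply it to another replica
Y.applyUpdate(remoteDoc, update);

// Incremental sync: the remote reports what it has, we send only the rest
const remoteStateVector = Y.encodeStateVector(remoteDoc);
const diff = Y.encodeStateAsUpdate(doc, remoteStateVector);

// Combine several binary updates (update1..3 are previously received Uint8Arrays)
const mergedUpdate = Y.mergeUpdates([update1, update2, update3]);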

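Approximate performance characteristics (exact figures vary with hardware and edit history):

  • Document encoding: Yjs stores text content uncompressed, so a 100,000-character document encodes to roughly 100KB of text plus metadata that grows with edit history
  • Incremental updates: Individual keystrokes typically encode to a few tens of bytes
  • Merge: Merging 10,000 updates into a single update takes on the order of 10ms
  • Apply: Applying a 100KB update to an empty document takes on the order of 5ms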

Yjs achieves this through several optimizations:

  1. Struct compression: Consecutive operations from the same client are stored in runs
  2. State vectors: Only missing updates are sent during sync (not the full document)
  3. Lazy encoding: Internal structures are only serialized when needed
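
The state-vector optimization can be sketched as a three-step exchange. The names server and client are illustrative; in practice a provider performs these steps over the wire.

```typescript
import * as Y from 'yjs';

const server = new Y.Doc();
const client = new Y.Doc();

// The server holds a large document and the client already has all of it.
server.getText('content').insert(0, 'x'.repeat(10000));
Y.applyUpdate(client, Y.encodeStateAsUpdate(server));

// The server receives one more small edit while the client is away.
server.getText('content').insert(0, 'new: ');

// Step 1: the client reports what it has already seen.
const clientStateVector = Y.encodeStateVector(client);

// Step 2: the server encodes only the structs the client is missing.
const diff = Y.encodeStateAsUpdate(server, clientStateVector);
const fullState = Y.encodeStateAsUpdate(server);

// Step 3: the client applies the diff and is up to date.
Y.applyUpdate(client, diff);

const converged =
  client.getText('content').toString() === server.getText('content').toString();
const diffBytes = diff.byteLength;
const fullBytes = fullState.byteLength;
```

The diff carries only the five new characters and their metadata, while the full state carries the whole document.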

Yjs's implementation of YATA differs from textbook list CRDTs in a subtle but important way. Traditional list CRDTs like RGA are usually described as a linked list of individually identified elements. Yjs instead groups consecutive insertions from the same client into "structs", with a single metadata entry per run of characters. A 100-character run typed by one client is one struct, not 100 individual CRDT elements, which dramatically reduces memory overhead. This is why Yjs can handle documents with millions of characters, while naive per-character implementations can struggle with documents a small fraction of that size.
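
The idea of run merging can be illustrated with a deliberately simplified model. The Run type and appendChar function below are hypothetical and only handle typing at the end of a document; they are not Yjs's real internals, which also split and re-merge structs on insertions and deletions in the middle of a run.

```typescript
// Simplified, hypothetical model of run merging (not Yjs's actual data structures).
interface Run {
  client: number;  // ID of the client that typed the run
  clock: number;   // logical clock of the first character in the run
  content: string; // characters covered by this single metadata entry
}

// Append one typed character, extending the last run when the character
// comes from the same client and continues that run's clock sequence.
function appendChar(runs: Run[], client: number, clock: number, ch: string): void {
  const last = runs[runs.length - 1];
  if (last && last.client === client && last.clock + last.content.length === clock) {
    last.content += ch;
  } else {
    runs.push({ client, clock, content: ch });
  }
}

const runs: Run[] = [];

// Client 1 types 'Hello' (clocks 0 to 4), then client 2 types '!!' (clocks 0 to 1).
'Hello'.split('').forEach((ch, i) => appendChar(runs, 1, i, ch));
'!!'.split('').forEach((ch, i) => appendChar(runs, 2, i, ch));

// Seven characters are described by two metadata entries instead of seven.
const characterCount = runs.reduce((sum, run) => sum + run.content.length, 0);
const metadataEntries = runs.length;
```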

Yjs vs Automerge

Both are production-quality CRDT libraries. The choice depends on your use case.

| Feature | Yjs | Automerge |
|---|---|---|
| Language | JavaScript (primary), Rust port (Yrs) | Rust core with JS/Swift/Go bindings |
| Document model | Shared types (Text, Array, Map) | JSON-like document tree |
| Text editing | Y.Text with formatting support | Automerge.Text (simpler API) |
| Encoding size | Very compact (custom binary) | Compact (uses columnar encoding) |
| Performance | Fastest for text-heavy workloads | Faster for document-level operations |
| Editor integrations | ProseMirror, Tiptap, Monaco, CodeMirror | ProseMirror, CodeMirror |
| Persistence | y-indexeddb, manual encoding | Built-in save/load, automerge-repo |
| Sync protocol | y-websocket, y-webrtc | automerge-repo with pluggable network |
| Maturity | 8+ years, widely adopted | Automerge 3 (2024) with ~10x memory reduction, rapidly maturing |
| Bundle size | ~15KB gzipped (core) | ~100KB gzipped (WASM) |

Choose Yjs when: you're building a text editor, need the smallest bundle, want the widest ecosystem of editor integrations, or need maximum performance for text operations.

Choose Automerge when: your data model is complex JSON (nested objects, not primarily text), you want a Rust core for non-JS platforms, or you prefer the automerge-repo sync abstraction.

Quiz
For a collaborative code editor (similar to VS Code Live Share), which CRDT library and editor combination would be most appropriate?

Practical Limits

Yjs is fast, but it's not magic. Know the limits:

  • Document size: Documents up to ~10MB of encoded state work well. Beyond that, initial sync time becomes noticeable.
  • Update frequency: Yjs handles hundreds of updates per second easily. Thousands per second (e.g., high-frequency sensor data) may need throttling.
  • Number of collaborators: 50-100 simultaneous editors on one document works. More than that, and presence updates (cursor positions) become the bottleneck, not the CRDT operations.
  • History/tombstones: Deleted content stays as tombstones, which accumulate over time. Yjs garbage collection (doc.gc, enabled by default) compacts tombstones by discarding the deleted content, at the cost of history features such as restoring old snapshots, which need doc.gc = false.
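
The effect of garbage collection on tombstones can be measured with a short sketch. The helper encodedSizeAfterDelete is an illustrative name; it inserts and then deletes the same text with gc on and off and compares the encoded sizes.

```typescript
import * as Y from 'yjs';

// Insert 10,000 characters, delete them all, and return the encoded size.
function encodedSizeAfterDelete(gc: boolean): number {
  const doc = new Y.Doc({ gc });
  const text = doc.getText('content');
  text.insert(0, 'x'.repeat(10000));
  text.delete(0, 10000); // the characters become tombstones
  return Y.encodeStateAsUpdate(doc).byteLength;
}

// With gc on, tombstones keep only their length, not the deleted content.
const withGc = encodedSizeAfterDelete(true);
// With gc off, the deleted characters stay in the encoded document.
const withoutGc = encodedSizeAfterDelete(false);
```

Both documents are visibly empty, but only the one with gc disabled still carries the deleted text.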

What developers do vs. what they should do:

  • What developers do: Creating a new Y.Doc for every component that needs shared state
    What they should do: Use a single Y.Doc per collaborative session and get shared types by name
    Why: Y.Doc is the unit of sync. All shared types within one doc sync together atomically. Multiple docs mean multiple sync sessions, multiple connections, and no cross-type transactions.
  • What developers do: Using Y.Map for ordered lists
    What they should do: Use Y.Array for ordered collections, Y.Map for key-value data
    Why: Y.Map has no ordering guarantee for keys. Y.Array preserves insertion order and handles concurrent insertions correctly via the CRDT algorithm.
  • What developers do: Sending full document state on every change
    What they should do: Use state vectors and incremental updates via Y.encodeStateAsUpdate(doc, remoteStateVector)
    Why: Full state sync on every change wastes bandwidth. Yjs state vectors let you compute a minimal diff containing only the updates the remote hasn't seen.
  • What developers do: Turning garbage collection off (doc.gc = false) for long-lived documents that don't need version history
    What they should do: Leave garbage collection enabled (the default) for documents that accumulate large edit histories
    Why: Without GC, tombstones from deleted content accumulate forever, bloating document size. GC compacts them but discards the deleted content, which rules out restoring old snapshots. That is an acceptable tradeoff for most applications.
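
The cross-type transaction point can be sketched with doc.transact, which groups edits to several shared types of one Y.Doc into a single update. The counter updateEvents is an illustrative name.

```typescript
import * as Y from 'yjs';

const doc = new Y.Doc();
const todos = doc.getArray<string>('todos');
const metadata = doc.getMap<string>('metadata');

// Count how many binary updates the document emits.
let updateEvents = 0;
doc.on('update', () => {
  updateEvents += 1;
});

// One transaction spanning two shared types yields a single update.
doc.transact(() => {
  todos.push(['Learn CRDTs']);
  metadata.set('lastEditedBy', 'Alice');
});
const afterTransaction = updateEvents; // 1

// The same kind of edits outside a transaction produce one update each.
todos.push(['Ship it']);
metadata.set('lastEditedBy', 'Bob');
const afterSeparateEdits = updateEvents; // 3
```

Shared types that live in separate Y.Doc instances cannot be grouped this way, which is one reason to keep related state in a single doc.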
Key Rules
  1. Y.Doc is the unit of sync: one doc per collaborative session, shared types accessed by name
  2. Providers separate transport from data: use WebSocket for real-time, IndexedDB for persistence, both for offline-first
  3. Always disable built-in editor undo/redo and use Y.UndoManager for collaborative-aware undo
  4. Use state vectors for efficient sync: never send full state when incremental updates suffice
  5. Keep doc.gc enabled (the default) for long-lived documents to prevent tombstone accumulation
Interview Question

Architecture: Collaborative Document Platform

Design a Notion-like platform supporting 10,000 concurrent documents with up to 20 editors each. Cover: Yjs document lifecycle (creation, loading, persistence, cleanup), server architecture (one WebSocket connection per document? multiplexed?), handling large documents (100+ pages), offline editing on mobile, and the sync protocol when a user opens a document for the first time vs reconnecting after 5 minutes offline.