Skip to content

Message Deduplication and Ordering

advanced17 min read

Networks Are Liars

You send a message. The server processes it. The acknowledgment gets lost. Your client doesn't know the server received it, so it retries. Now the server has processed the same message twice.

Or: you send messages A, B, C in order. They take different network paths. The server receives A, C, B. Or A, A, C (B got lost and A was retried).

These aren't edge cases. These are normal conditions in distributed systems. TCP handles ordering within a single connection, but the moment you have reconnections, multiple connections, or a load balancer routing requests to different servers, all ordering bets are off.

Idempotency: The Foundation of Deduplication

An operation is idempotent if performing it multiple times produces the same result as performing it once. "Set balance to $100" is idempotent. "Add $100 to balance" is not.

For non-idempotent operations, you need idempotency keys — unique identifiers that let the server recognize and discard duplicates.

function generateIdempotencyKey(): string {
  return `${Date.now()}-${crypto.randomUUID()}`;
}

class IdempotentSender {
  private pendingKeys = new Map<string, { message: string; timestamp: number }>();

  send(ws: WebSocket, type: string, payload: unknown): string {
    const key = generateIdempotencyKey();
    const message = JSON.stringify({ key, type, payload });

    this.pendingKeys.set(key, { message, timestamp: Date.now() });
    ws.send(message);

    return key;
  }

  acknowledge(key: string): void {
    this.pendingKeys.delete(key);
  }

  retryPending(ws: WebSocket): void {
    for (const [key, { message }] of this.pendingKeys) {
      ws.send(message);
    }
  }
}

On the server side, you need a deduplication layer:

class DeduplicationLayer {
  private seen = new Map<string, { result: unknown; expiresAt: number }>();
  private cleanupInterval: ReturnType<typeof setInterval>;

  constructor(private ttlMs = 300_000) {
    this.cleanupInterval = setInterval(() => this.cleanup(), 60_000);
  }

  async process<T>(
    key: string,
    handler: () => Promise<T>
  ): Promise<{ result: T; duplicate: boolean }> {
    const existing = this.seen.get(key);
    if (existing) {
      return { result: existing.result as T, duplicate: true };
    }

    const result = await handler();
    this.seen.set(key, { result, expiresAt: Date.now() + this.ttlMs });
    return { result, duplicate: false };
  }

  private cleanup(): void {
    const now = Date.now();
    for (const [key, entry] of this.seen) {
      if (entry.expiresAt <= now) {
        this.seen.delete(key);
      }
    }
  }
}
Mental Model

Think of idempotency keys like receipt numbers. When you buy coffee and the card machine times out, you don't know if you were charged. With a receipt number, the barista can check: "Did this exact transaction already go through?" If yes, they don't charge again. If no, they process it. The receipt number is your idempotency key.

Quiz
Why can't you rely on message content alone for deduplication?

Sequence Numbers: Ordering Within a Stream

When you need messages processed in order, assign monotonically increasing sequence numbers:

class SequencedStream {
  private nextSeq = 0;
  private buffer = new Map<number, unknown>();
  private expectedSeq = 0;

  send(ws: WebSocket, payload: unknown): void {
    const seq = this.nextSeq++;
    ws.send(JSON.stringify({ seq, payload }));
  }

  receive(
    seq: number,
    payload: unknown,
    onOrdered: (payload: unknown) => void
  ): void {
    if (seq < this.expectedSeq) {
      return; // duplicate, already processed
    }

    this.buffer.set(seq, payload);

    while (this.buffer.has(this.expectedSeq)) {
      onOrdered(this.buffer.get(this.expectedSeq));
      this.buffer.delete(this.expectedSeq);
      this.expectedSeq++;
    }
  }
}

This gives you total ordering within a single stream. Messages are buffered until all predecessors arrive, then delivered in sequence. Simple and effective for a single client-server connection.

But it breaks the moment you have multiple senders. Client A sends message 5, Client B sends message 5 — whose "5" comes first? Sequence numbers are local; they don't capture causality across multiple participants.

Lamport Timestamps: Causal Ordering Across Nodes

Leslie Lamport's 1978 paper gave us a beautifully simple idea: a logical clock that captures the "happened before" relationship.

Each node maintains a counter. The rules:

  1. Before sending a message, increment your counter.
  2. When receiving a message, set your counter to max(your counter, received counter) + 1.
  3. If event A's timestamp is less than event B's, A might have happened before B.
class LamportClock {
  private time = 0;

  tick(): number {
    return ++this.time;
  }

  send(): number {
    return this.tick();
  }

  receive(remoteTime: number): number {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }

  now(): number {
    return this.time;
  }
}
Info

Lamport timestamps give you a partial ordering. If timestamp(A) < timestamp(B), you know A didn't happen after B. But two events with different timestamps might be concurrent (neither caused the other). To detect true concurrency, you need vector clocks.

Quiz
Node X has Lamport time 5 and receives a message with Lamport time 8. What is Node X's new time?

Vector Clocks: True Causality Detection

A vector clock is a Lamport clock per node. Each node maintains a vector of counters — one for every node in the system. This lets you distinguish between "A happened before B" and "A and B are concurrent."

class VectorClock {
  private clock: Map<string, number>;

  constructor(private nodeId: string, nodeIds: string[]) {
    this.clock = new Map(nodeIds.map((id) => [id, 0]));
  }

  tick(): Map<string, number> {
    this.clock.set(this.nodeId, (this.clock.get(this.nodeId) ?? 0) + 1);
    return new Map(this.clock);
  }

  send(): Map<string, number> {
    return this.tick();
  }

  receive(remote: Map<string, number>): Map<string, number> {
    for (const [nodeId, time] of remote) {
      this.clock.set(
        nodeId,
        Math.max(this.clock.get(nodeId) ?? 0, time)
      );
    }
    this.clock.set(this.nodeId, (this.clock.get(this.nodeId) ?? 0) + 1);
    return new Map(this.clock);
  }

  static compare(
    a: Map<string, number>,
    b: Map<string, number>
  ): 'before' | 'after' | 'concurrent' {
    let aBeforeB = false;
    let bBeforeA = false;

    const allKeys = new Set([...a.keys(), ...b.keys()]);
    for (const key of allKeys) {
      const aVal = a.get(key) ?? 0;
      const bVal = b.get(key) ?? 0;
      if (aVal < bVal) aBeforeB = true;
      if (aVal > bVal) bBeforeA = true;
    }

    if (aBeforeB && !bBeforeA) return 'before';
    if (bBeforeA && !aBeforeB) return 'after';
    return 'concurrent';
  }
}

Comparing two vector clocks:

  • A "happened before" B if every entry in A is less than or equal to the corresponding entry in B, and at least one is strictly less.
  • A and B are "concurrent" if neither dominates the other (A has some entries greater than B, and B has some entries greater than A).
Execution Trace
Init
Alice: {A:0, B:0}, Bob: {A:0, B:0}
Both clocks start at zero
Alice sends msg1
Alice ticks: {A:1, B:0}. Sends {A:1, B:0} to Bob.
Alice's clock advances her own entry
Bob receives msg1
Bob merges: max({A:0,B:0}, {A:1,B:0}) then ticks B. Result: {A:1, B:1}
Bob now knows about Alice's event
Bob sends msg2
Bob ticks: {A:1, B:2}. Sends {A:1, B:2} to Alice.
Bob's entry advances
Alice sends msg3 (before receiving msg2)
Alice ticks: {A:2, B:0}. Sends {A:2, B:0}.
Alice hasn't seen Bob's update yet
Comparison
msg2 {A:1,B:2} vs msg3 {A:2,B:0}: CONCURRENT
Neither vector dominates — these events are truly concurrent
Quiz
Given vector clocks V1 = {A:2, B:3, C:1} and V2 = {A:2, B:4, C:1}, what is the relationship?

Causal Ordering vs Total Ordering

These are fundamentally different guarantees:

PropertyCausal OrderingTotal Ordering
GuaranteeIf A caused B, A is delivered before BAll nodes see all events in the same order
Concurrent eventsCan be delivered in any orderAssigned a deterministic order
ComplexityVector clocks (O(n) per node)Consensus protocol (Raft, Paxos)
LatencyLow (no coordination needed)Higher (requires agreement)
Use caseChat messages, collaborative editingFinancial transactions, leader election

For most real-time frontend applications, causal ordering is sufficient. You don't need all clients to see messages in the exact same order — you just need that if message B is a reply to message A, every client sees A before B.

Total ordering requires a consensus protocol (Raft, Paxos, or a centralized sequencer), which adds latency and complexity. It's necessary for financial transactions or any scenario where "who went first" has legal or monetary consequences.

The "Exactly Once" Myth

Here's the uncomfortable truth that every distributed systems textbook will tell you: exactly-once delivery is impossible in a system with unreliable networks.

You can have:

  • At-most-once: Fire and forget. If the ACK is lost, don't retry. Message might be lost.
  • At-least-once: Retry until acknowledged. Message might be delivered multiple times.
  • Effectively-once: At-least-once delivery + idempotent processing. The message arrives at least once, but duplicate processing is detected and suppressed.

"Exactly once" is marketing language. What systems actually implement is "effectively once" — the combination of reliable delivery (retries) and idempotent processing (deduplication).

Key Rules
  1. 1Every non-idempotent operation needs an idempotency key generated by the sender
  2. 2Server-side deduplication must have a TTL — you cannot store every key forever
  3. 3Sequence numbers only provide ordering within a single stream, not across streams
  4. 4Lamport timestamps capture partial causality; vector clocks capture full causality
  5. 5Exactly-once delivery is a myth — design for at-least-once with idempotent processing
What developers doWhat they should do
Assuming TCP guarantees message ordering across reconnections
TCP guarantees ordering within a single connection. When the connection drops and reconnects, it's a new TCP stream — ordering between the old and new stream is not guaranteed.
Implement application-level sequence numbers for ordering guarantees
Using wall-clock timestamps for event ordering
Wall clocks drift, can jump backwards (NTP adjustments), and are not synchronized across machines. Two events 1ms apart on different machines have no reliable ordering based on wall time alone.
Use logical clocks (Lamport or vector clocks) for causal ordering
Storing idempotency keys forever
Idempotency keys grow without bound if not cleaned up. A 5-minute TTL covers retries during normal operations. If a client retries after 15 minutes, it's reasonable to treat it as a new operation.
Set a TTL (typically 5-15 minutes) and expire old keys
Interview Question

Design: Chat Message Ordering

Design the message ordering system for a group chat with 100 participants across 5 geographic regions. Messages should appear in causal order (replies after the message they reply to) but you don't need total ordering. Handle: network partitions, reconnection after 30 seconds offline, and a participant on a slow connection (500ms+ latency). What data structures do you use for ordering? How do you handle the case where two people reply to the same message simultaneously?