
Streaming AI Responses with SSE

Advanced · 18 min read

The 3-Second Rule That Changed AI UIs

Try this experiment: open ChatGPT and ask it something complex. Now imagine if instead of seeing tokens appear word-by-word, you stared at a spinner for 8 seconds and then the entire response popped in at once. Same content, same wait time, same answer. But the experience is completely different.

Streaming isn't about speed. The model takes the same time to generate the full response either way. Streaming is about perceived latency. When the first token appears in 200ms instead of 8 seconds, the user's brain switches from "is this broken?" to "it's thinking, and I can already start reading." That shift is the difference between an AI product people love and one they abandon.

Every major AI product — ChatGPT, Claude, Gemini, Copilot — streams responses. Not because it's trendy, but because the psychology of waiting demands it. And the protocol powering all of them? Server-Sent Events.

Mental Model

Think of SSE like a news ticker on a TV screen. You don't wait for the entire day's news to be compiled before the ticker starts scrolling. As soon as the newsroom has one headline ready, it pushes it to the screen. Viewers start reading immediately while new headlines keep arriving. The connection stays open, the data flows one direction (server to client), and each item is a self-contained event. That's SSE — a persistent one-way channel where the server pushes events as they become available.

What SSE Actually Is

Server-Sent Events is a dead-simple protocol built on top of HTTP. The server responds with Content-Type: text/event-stream and keeps the connection open, sending structured text events over time.

Here's the raw wire format:

event: message_start
data: {"type":"message_start","message":{"id":"msg_01X","role":"assistant"}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: message_stop
data: {"type":"message_stop"}

Each event has a clear structure:

  • event: — the event type (optional, defaults to "message")
  • data: — the payload (can span multiple lines with data: prefix on each)
  • id: — a unique event ID for reconnection (optional)
  • retry: — reconnection interval in milliseconds (optional)
  • Events are separated by a blank line (two newlines: \n\n)

That's the entire protocol. No binary framing, no handshake negotiation, no magic bytes. Just structured text over a long-lived HTTP response.
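To make the wire format concrete, here's a tiny serializer (a sketch; `formatSSE` is a name invented for illustration) that produces exactly the event shape shown above:

```typescript
// Hypothetical helper: serializes a single SSE event in the wire format above.
// Each field is one line, and a blank line (\n\n) terminates the event.
function formatSSE(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

console.log(formatSSE('message_stop', { type: 'message_stop' }));
// event: message_stop
// data: {"type":"message_stop"}
```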

Quiz
What Content-Type header does a server send to establish an SSE connection?

SSE vs WebSocket vs Long Polling

Before we go deeper, let's settle when to use what:

Feature        | SSE                        | WebSocket                       | Long Polling
Direction      | Server → Client only       | Bidirectional                   | Server → Client only
Protocol       | HTTP/1.1 or HTTP/2         | WS (upgrade from HTTP)          | HTTP (repeated requests)
Auto reconnect | Built-in                   | Manual                          | Manual
Binary data    | Text only                  | Text + binary                   | Either
Auth headers   | None via EventSource       | Via handshake only              | Per request
Best for       | LLM streaming, live feeds  | Chat, gaming, real-time collab  | Legacy fallback

For LLM streaming, SSE wins. The data flows one direction (server to client), it's always text (JSON events), and you don't need bidirectional communication for token streaming. WebSocket is overkill — you'd be establishing a persistent bidirectional channel just to read from it.

The EventSource API (and Why You Won't Use It)

The browser ships with a built-in EventSource API for SSE:

const source = new EventSource('/api/stream');

source.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  console.log(data);
});

source.addEventListener('error', (event) => {
  console.error('Connection lost, reconnecting...');
});

Clean, simple, automatic reconnection. Sounds perfect. So why does every AI product ignore it?

Three deal-breaking limitations:

  1. GET only — EventSource can only make GET requests. LLM APIs require POST with a JSON body containing the messages, model, temperature, etc.
  2. No custom headers — you can't set Authorization: Bearer sk-... or any custom headers. LLM APIs always require authentication headers.
  3. No request body — even if you could POST, there's no way to send a request body.

These aren't edge cases — they're fundamental requirements for any AI API. The EventSource API was designed for simple server-push scenarios like stock tickers or notification feeds. LLM streaming needs something more flexible.

Quiz
Why is the built-in EventSource API unsuitable for streaming LLM API responses?

fetch + ReadableStream: The Real Pattern

Here's what production AI apps actually use — fetch with ReadableStream:

async function streamChat(messages) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': API_KEY,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      stream: true,
      messages,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const event = JSON.parse(line.slice(6));
      if (event.type === 'content_block_delta') {
        process.stdout.write(event.delta.text);
      }
      // Anthropic signals the end of the stream with a message_stop event
      // (OpenAI-style streams use a literal "data: [DONE]" sentinel instead)
      if (event.type === 'message_stop') return;
    }
  }
}

Let's break down why every piece matters:

  • response.body.getReader() — gives you a ReadableStreamDefaultReader that reads chunks as they arrive, not after the full response downloads
  • TextDecoder with { stream: true } — handles multi-byte UTF-8 characters that might be split across chunks
  • Line buffer — SSE events are line-delimited, but network chunks don't respect line boundaries. A chunk might end mid-line, so you keep the incomplete last line in a buffer
  • lines.pop() — the last element after splitting might be an incomplete line, so you save it for the next chunk
Common Trap

Never use decoder.decode(value) without { stream: true } when processing a stream. Without it, the decoder treats each chunk as a complete message, which corrupts multi-byte characters (like emoji or non-ASCII text) that get split across chunk boundaries. You'll see garbled output intermittently — the kind of bug that passes every test but breaks in production with real user input.
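You can watch the corruption happen. This sketch splits the two UTF-8 bytes of "é" across simulated network chunks:

```typescript
const bytes = new TextEncoder().encode('café'); // 'é' is 2 bytes in UTF-8

// Simulate a network boundary landing between the two bytes of 'é'
const chunk1 = bytes.slice(0, 4); // 'caf' plus the first byte of 'é'
const chunk2 = bytes.slice(4);    // the second byte of 'é'

const naive = new TextDecoder();
// Each chunk decoded as if complete: the split 'é' becomes U+FFFD garbage
const corrupted = naive.decode(chunk1) + naive.decode(chunk2);

const streaming = new TextDecoder();
// With { stream: true } the decoder holds the partial byte until the next chunk
const intact =
  streaming.decode(chunk1, { stream: true }) +
  streaming.decode(chunk2, { stream: true });

console.log(corrupted); // caf�� (replacement characters)
console.log(intact);    // café
```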

SSE Event Formats From Major Providers

Each AI provider structures their SSE events differently. Understanding these formats is essential for building provider-agnostic streaming UIs.

Anthropic's Event Protocol

Anthropic uses typed events with a clear lifecycle:

event: message_start
data: {"type":"message_start","message":{"id":"msg_01X","model":"claude-sonnet-4-20250514","role":"assistant","usage":{"input_tokens":25}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" there!"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

The lifecycle is explicit: message_start → content_block_start → deltas → content_block_stop → message_delta → message_stop. Each content block has an index, which matters when the model returns multiple blocks (text + tool use).

Anthropic also sends ping events as keep-alives and typed delta variants: text_delta for text, input_json_delta for tool call arguments, and thinking_delta for extended thinking content.
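A consumer typically switches on the delta type and accumulates each variant into its own buffer. A minimal sketch (`applyDelta` and `BlockState` are illustrative names; the field shapes follow the delta variants described above):

```typescript
type AnthropicDelta =
  | { type: 'text_delta'; text: string }
  | { type: 'input_json_delta'; partial_json: string }
  | { type: 'thinking_delta'; thinking: string };

type BlockState = { text: string; toolJson: string; thinking: string };

// Folds one delta into the running state for a content block
function applyDelta(state: BlockState, delta: AnthropicDelta): BlockState {
  switch (delta.type) {
    case 'text_delta':
      return { ...state, text: state.text + delta.text };
    case 'input_json_delta':
      return { ...state, toolJson: state.toolJson + delta.partial_json };
    case 'thinking_delta':
      return { ...state, thinking: state.thinking + delta.thinking };
  }
}

let state: BlockState = { text: '', toolJson: '', thinking: '' };
state = applyDelta(state, { type: 'text_delta', text: 'Hello' });
state = applyDelta(state, { type: 'text_delta', text: ' there!' });
console.log(state.text); // Hello there!
```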

OpenAI's Event Format

OpenAI uses a simpler format with a single event type:

data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" there!"}}]}

data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

No event: field — every event is an unnamed data: line. The stream terminates with the literal string data: [DONE]. The delta object progressively adds content, and the final chunk includes a finish_reason.
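Folding these deltas into the full message is a simple accumulation. A sketch (`accumulate` is an illustrative name), fed the exact sample lines from above:

```typescript
type OpenAIChunk = {
  choices: { index: number; delta: { content?: string }; finish_reason?: string | null }[];
};

// Accumulates OpenAI-style delta lines into the complete message text
function accumulate(lines: string[]): string {
  let text = '';
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break; // literal sentinel ends the stream

    const chunk: OpenAIChunk = JSON.parse(payload);
    text += chunk.choices[0]?.delta?.content ?? ''; // final chunk has an empty delta
  }
  return text;
}

const sampleStream = [
  'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}',
  'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" there!"}}]}',
  'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
  'data: [DONE]',
];
console.log(accumulate(sampleStream)); // Hello there!
```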

Vercel AI SDK Data Stream Protocol

The Vercel AI SDK uses a prefix-based protocol where each line starts with a type identifier:

f:{"messageId":"msg-1"}
0:"Hello"
0:" there"
0:"!"
d:{"finishReason":"stop","usage":{"promptTokens":10,"completionTokens":5}}

Prefixes map to types: 0 for text deltas, f for start, d for done/finish, 9 for tool calls, g for reasoning, and more. This protocol is optimized for the AI SDK's React hooks.
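Parsing a line of this protocol only requires splitting on the first colon and JSON-parsing the rest. A rough sketch (`parseDataStreamLine` and the prefix map are illustrative; consult the AI SDK docs for the full prefix set):

```typescript
// Hypothetical parser for the prefix-based lines shown above
function parseDataStreamLine(line: string): { type: string; value: unknown } {
  const sep = line.indexOf(':');
  if (sep === -1) throw new Error(`Malformed data stream line: ${line}`);

  // Partial prefix map, per the types listed above
  const prefixes: Record<string, string> = {
    '0': 'text',
    f: 'start',
    d: 'finish',
    '9': 'tool_call',
    g: 'reasoning',
  };

  const prefix = line.slice(0, sep);
  return {
    type: prefixes[prefix] ?? prefix,
    value: JSON.parse(line.slice(sep + 1)), // the payload is always JSON
  };
}

console.log(parseDataStreamLine('0:"Hello"')); // { type: 'text', value: 'Hello' }
```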

Quiz
In Anthropic's streaming protocol, which event carries the actual generated text tokens?

Building a Production SSE Consumer

Let's build a proper SSE parser that handles the real-world edge cases:

type SSEEvent = {
  event: string;
  data: string;
  id?: string;
  retry?: number;
};

function parseSSEEvents(chunk: string): {
  events: SSEEvent[];
  remaining: string;
} {
  const events: SSEEvent[] = [];
  const blocks = chunk.split('\n\n');
  const remaining = blocks.pop() ?? '';

  for (const block of blocks) {
    if (!block.trim()) continue;

    let event = 'message';
    let data = '';
    let id: string | undefined;
    let retry: number | undefined;

    for (const line of block.split('\n')) {
      if (line.startsWith('event: ')) {
        event = line.slice(7);
      } else if (line.startsWith('data: ')) {
        data += (data ? '\n' : '') + line.slice(6);
      } else if (line.startsWith('id: ')) {
        id = line.slice(4);
      } else if (line.startsWith('retry: ')) {
        retry = parseInt(line.slice(7), 10);
      }
    }

    if (data) {
      events.push({ event, data, id, retry });
    }
  }

  return { events, remaining };
}

Notice how events are split by double newlines (\n\n), but the last block might be incomplete — so we save it as remaining for the next chunk.
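To see why remaining matters, trace two chunks that split a single event across the network boundary:

```typescript
// Two network chunks that split one SSE event mid-line
const chunk1 = 'event: content_block_delta\ndata: {"text":"Hel';
const chunk2 = 'lo"}\n\n';

let buffer = '';
const complete: string[] = [];

for (const chunk of [chunk1, chunk2]) {
  buffer += chunk;
  const blocks = buffer.split('\n\n');
  buffer = blocks.pop() ?? ''; // the incomplete tail stays buffered
  complete.push(...blocks.filter((b) => b.trim()));
}

// Nothing parses after chunk1 alone; the event appears once chunk2 arrives
console.log(complete.length); // 1
console.log(complete[0].includes('"Hello"')); // true
```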

Now wire it up with fetch:

async function* streamSSE(
  url: string,
  options: RequestInit
): AsyncGenerator<SSEEvent> {
  const response = await fetch(url, options);

  if (!response.ok) {
    const body = await response.text();
    throw new Error(`SSE request failed (${response.status}): ${body}`);
  }

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const { events, remaining } = parseSSEEvents(buffer);
      buffer = remaining;

      for (const event of events) {
        yield event;
      }
    }

    if (buffer.trim()) {
      const { events } = parseSSEEvents(buffer + '\n\n');
      for (const event of events) {
        yield event;
      }
    }
  } finally {
    reader.releaseLock();
  }
}

Using an async generator here is the elegant move. The caller gets a clean for await...of loop:

for await (const event of streamSSE('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages, model: 'claude-sonnet-4-20250514' }),
})) {
  if (event.event === 'content_block_delta') {
    const delta = JSON.parse(event.data);
    if (delta.delta.type === 'text_delta') {
      appendToUI(delta.delta.text);
    }
  }
}
Why async generators are perfect for SSE

An async generator (async function*) lets you yield values asynchronously. The consumer pulls events one at a time with for await...of, which naturally applies backpressure — if the consumer is slow to process events, the generator pauses at the yield until the consumer is ready for the next one. This is fundamentally different from a callback-based approach where events fire whether the consumer is ready or not. For SSE consumers, this means you never buffer unbounded events in memory — each event is processed before the next one is pulled.
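The pull semantics are easy to verify: a generator's body only advances when the consumer asks for a value. A small sketch (uses top-level await, so it assumes an ES module context):

```typescript
let produced = 0;

async function* ticker(): AsyncGenerator<number> {
  for (let i = 0; i < 3; i++) {
    produced++;
    yield i; // execution suspends here until the next pull
  }
}

const gen = ticker();
console.log(produced); // 0: calling the generator runs nothing yet

await gen.next(); // the consumer pulls the first value
console.log(produced); // 1: exactly one value produced, on demand

await gen.next();
await gen.next();
console.log(produced); // 3
```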

Error Handling and Reconnection

SSE connections drop. Networks are unreliable. Here's how to handle it:

async function streamWithRetry(
  url: string,
  options: RequestInit,
  onEvent: (event: SSEEvent) => void,
  maxRetries = 3
) {
  let retries = 0;
  let lastEventId: string | undefined;

  while (retries <= maxRetries) {
    try {
      const headers = new Headers(options.headers);
      if (lastEventId) {
        headers.set('Last-Event-ID', lastEventId);
      }

      for await (const event of streamSSE(url, {
        ...options,
        headers,
      })) {
        retries = 0;
        if (event.id) lastEventId = event.id;
        onEvent(event);
      }

      return;
    } catch (error) {
      // An abort is an intentional cancellation, not a failure: surface it, don't retry
      if (error instanceof DOMException && error.name === 'AbortError') {
        throw error;
      }

      retries++;

      if (retries > maxRetries) {
        throw new Error(
          `Stream failed after ${maxRetries} retries: ${error}`
        );
      }

      const delay = Math.min(1000 * 2 ** (retries - 1), 30000);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

Key patterns here:

  • Exponential backoff — wait 1s, 2s, 4s, 8s... up to 30s between retries
  • Last-Event-ID — the SSE spec defines this header for reconnection. If the server assigns IDs to events, you can resume from where you left off (though most LLM APIs don't support this)
  • Reset retry count on success — if we receive events, the connection is healthy
Execution Trace

  1. Initial request: POST to /api/chat with the messages payload. First attempt, so no Last-Event-ID header.
  2. Connection established: content_block_delta events arrive, and the retry counter resets to 0.
  3. Network drops: reader.read() throws a TypeError, caught by the try/catch.
  4. Retry 1: reconnect with the Last-Event-ID header after a 1000ms exponential-backoff delay.
  5. Connection restored: events resume, and the server may replay missed events.
  6. Stream completes: reader.read() returns done: true and the loop exits cleanly.
Quiz
When retrying a dropped SSE connection, what header should you send to help the server resume from where you left off?

Cancelling a Stream

Users change their mind. They click "Stop generating." Your code needs to handle this gracefully:

const controller = new AbortController();

const stream = streamSSE('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages }),
  signal: controller.signal,
});

stopButton.addEventListener('click', () => {
  controller.abort();
});

When abort() is called, the fetch promise rejects with an AbortError. The ReadableStream is cancelled, and the underlying TCP connection is closed. The server stops generating tokens (most AI APIs detect client disconnection).

Here's the thing most people miss: you need to handle the AbortError differently from real errors. An abort isn't a failure — it's an intentional user action:

try {
  for await (const event of streamSSE(url, { signal })) {
    handleEvent(event);
  }
} catch (error) {
  if (error instanceof DOMException && error.name === 'AbortError') {
    return;
  }
  throw error;
}

Common Mistakes

  • Using EventSource for LLM API calls. EventSource only supports GET requests with no custom headers, but every LLM API requires POST with auth headers and a JSON body. Instead, use fetch + ReadableStream for full control over method, headers, and body.
  • Calling TextDecoder.decode() without { stream: true }. Without the stream flag, multi-byte UTF-8 characters split across chunks get corrupted, causing intermittent garbled text with emoji and non-ASCII content. Always pass { stream: true } when decoding streaming chunks.
  • Splitting chunks on newlines without buffering incomplete lines. Network chunks don't align with SSE event boundaries; a chunk can end in the middle of a data: line, and parsing it as complete produces errors. Keep a buffer and carry the last incomplete line into the next chunk.
  • Not handling AbortError separately from real errors. When users click "Stop generating", the stream throws an AbortError; showing an error message for an intentional action is broken UX. Check for AbortError and treat it as a clean cancellation, not a failure.

Key Rules

  1. SSE is text/event-stream over HTTP — events separated by blank lines, fields prefixed with event:, data:, id:, retry:
  2. Use fetch + ReadableStream for LLM streaming — EventSource is GET-only with no custom headers
  3. Always decode with TextDecoder and { stream: true } to handle split multi-byte characters
  4. Buffer incomplete lines between chunks — network boundaries don't respect event boundaries
  5. Use AbortController for cancellation and handle AbortError as a clean exit, not a failure
  6. Implement exponential backoff for reconnection — never hammer a failing endpoint
Quiz
Why do production AI apps use async generators for SSE consumption instead of callbacks?

What's Next

You now understand the protocol layer — how SSE works, why EventSource falls short, and how to build a robust fetch-based consumer. But we're reading raw chunks and splitting strings. In the next topic, we'll dive into the ReadableStream API itself: TransformStream for parsing pipelines, TextDecoderStream for zero-copy decoding, and how to compose stream transforms that turn raw bytes into structured events with clean separation of concerns.
