Readable Streams and Text Decoding

Advanced · 18 min read

The Character That Arrived in Two Pieces

A developer builds a chat UI that streams responses from an LLM. It works beautifully with English text. Then a user asks a question in Japanese, and the output starts glitching -- random characters appear mid-stream, replacing perfectly valid kanji.

The fix isn't in the LLM. It's not in React. It's in one line of code that misunderstands how bytes become text.

const decoder = new TextDecoder();
const reader = response.body.getReader();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // This is the bug:
  // const text = String.fromCharCode(...value);

  // Should be:
  const text = decoder.decode(value, { stream: true });
}

The problem? A single UTF-8 character like 漢 is encoded as three bytes (0xE6 0xBC 0xA2). When the network splits those bytes across two chunks, String.fromCharCode treats each byte as a separate character and produces garbage. TextDecoder knows how to hold onto incomplete bytes and wait for the rest.

This is the kind of bug that only shows up in production, with real users, in real languages. Let's make sure you never ship it.

The Mental Model

Think of a stream as a conveyor belt in a factory. Raw materials (bytes) arrive on the belt at whatever pace the supplier (the network) delivers them. You don't wait for the entire shipment to arrive before you start working -- you process each box as it comes. A ReadableStream is the belt. A reader is a worker standing at the belt, picking up boxes one at a time. A TextDecoder is a translator who converts the raw materials into something meaningful (text). And a TransformStream is a workstation on the belt that transforms items as they pass through.

Why Streams Matter for LLM Responses

When you call an LLM API, the response might be 2,000 tokens long. Without streaming, the user stares at a blank screen for 3-5 seconds until the entire response arrives. With streaming, the first token appears in ~200ms and text flows in progressively, exactly like ChatGPT.

But there's more to it than UX:

Memory efficiency. A non-streaming response loads the entire body into memory as a single string. For a 100KB response, that's fine. For a million-token context dump or a large file download, you'd rather process chunks and discard them.

Backpressure. Streams give you flow control. If your UI can't render fast enough (say you're doing expensive Markdown parsing per chunk), the stream automatically slows down. The consumer controls the pace, not the producer.

Composability. Streams can be piped through transforms -- decode bytes to text, split text on newline boundaries, parse JSON objects, extract specific fields -- all without buffering the entire response.

Cancellation. If the user navigates away or clicks "Stop generating," you can abort the stream and the underlying HTTP connection. No wasted bandwidth, no orphaned processing.

The Streams API: Three Primitives

The Streams API has three core classes. You'll use all three when building LLM integrations:

ReadableStream -- a source of data you can read from. fetch() gives you one automatically via response.body. You can also create your own from any async data source.

WritableStream -- a destination you can write data to. Less common in frontend code, but useful for piping data somewhere (a file, a WebSocket, etc.).

TransformStream -- sits between a readable and a writable. It has a writable side (input) and a readable side (output). Data flows in, gets transformed, and flows out. TextDecoderStream is a built-in transform.

// The relationship:
// ReadableStream --> TransformStream --> WritableStream
//                    data enters its writable side (in)
//                    and exits its readable side (out)
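The relationship is easiest to see with a toy pipeline that wires all three together. A minimal sketch -- the chunk contents and the uppercase transform are just illustrations:

```javascript
// A readable source that emits two chunks, then closes.
const source = new ReadableStream({
  start(controller) {
    controller.enqueue('hello');
    controller.enqueue('world');
    controller.close();
  },
});

// A transform: data flows into its writable side, out its readable side.
const upper = new TransformStream({
  transform(chunk, controller) {
    controller.enqueue(chunk.toUpperCase());
  },
});

// A writable sink that collects whatever arrives.
const results = [];
const sink = new WritableStream({
  write(chunk) {
    results.push(chunk);
  },
});

// readable -> transform -> writable
await source.pipeThrough(upper).pipeTo(sink);
console.log(results); // ['HELLO', 'WORLD']
```

pipeThrough() returns the transform's readable side, so the chain reads left to right like the diagram above.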

For LLM streaming, you mostly care about ReadableStream (consuming the response) and TransformStream (processing the data). Let's start with reading.

response.body: Your First ReadableStream

When you call fetch(), the response body is a ReadableStream of Uint8Array chunks:

const response = await fetch('https://api.example.com/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Explain closures' }),
});

// response.body is a ReadableStream<Uint8Array>
console.log(response.body instanceof ReadableStream); // true

To consume it, you need a reader.

The Reader Protocol

A reader gives you a pull-based interface: you ask for the next chunk when you're ready for it. This is backpressure in action -- the stream only delivers data as fast as you consume it.

const reader = response.body.getReader();

while (true) {
  const { value, done } = await reader.read();

  if (done) {
    console.log('Stream complete');
    break;
  }

  // value is a Uint8Array (raw bytes)
  console.log('Received', value.byteLength, 'bytes');
}

The read() method returns a promise that resolves with { value, done }:

  • done: false -- value is a Uint8Array containing the next chunk of bytes
  • done: true -- the stream is finished. value is undefined

This is the same protocol as iterators ({ value, done }), which is why streams and for await...of play so nicely together.
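The parallel is easy to see with a plain generator -- the result object has exactly the same shape:

```javascript
// A plain iterator produces the same { value, done } result shape
// that reader.read() resolves with.
function* letters() {
  yield 'a';
  yield 'b';
}

const it = letters();
console.log(it.next()); // { value: 'a', done: false }
console.log(it.next()); // { value: 'b', done: false }
console.log(it.next()); // { value: undefined, done: true }
```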

Quiz
What type is the value returned by response.body.getReader().read() when the stream is not done?

Important: Lock and Release

When you call getReader(), the stream becomes locked to that reader. You can't get another reader or pipe the stream while it's locked. To release the lock:

reader.releaseLock();

In simple cases -- a single reader, read all the way to completion -- forgetting this is usually harmless. But if you might exit early or hit an error mid-stream, release the lock in a finally block, a pattern you'll see later in this article.

Common Trap

You can only call getReader() once per stream. If you need to read the same data twice, you must use response.clone() before reading, or tee() the stream. Each clone creates a separate buffer, so be mindful of memory.
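Here's a small runnable sketch of tee() -- the source and the readAll helper are illustrative, not part of any API:

```javascript
// tee() splits one stream into two independent branches that each
// receive every chunk -- e.g. one consumer logs while another renders.
const source = new ReadableStream({
  start(controller) {
    controller.enqueue('hello ');
    controller.enqueue('stream');
    controller.close();
  },
});

const [branchA, branchB] = source.tee();

// Drain a stream into a single string using the reader protocol.
async function readAll(stream) {
  const reader = stream.getReader();
  let out = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) return out;
    out += value;
  }
}

const [a, b] = await Promise.all([readAll(branchA), readAll(branchB)]);
console.log(a);       // 'hello stream'
console.log(a === b); // true -- both branches saw every chunk
```

Note the memory caveat from above applies here too: if one branch reads much faster than the other, the slower branch's unread chunks are buffered.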

TextDecoder and the UTF-8 Boundary Problem

Here's where things get interesting -- and where most streaming bugs live.

UTF-8 is a variable-width encoding. ASCII characters (English letters, numbers, basic symbols) are 1 byte. European accented characters are 2 bytes. CJK characters (Chinese, Japanese, Korean) are 3 bytes. Emoji are 4 bytes.

Character    UTF-8 Bytes              Byte Count
A            0x41                     1
ñ            0xC3 0xB1                2
漢           0xE6 0xBC 0xA2           3
🔥           0xF0 0x9F 0x94 0xA5      4
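You can verify these widths yourself with TextEncoder, which encodes a string to UTF-8 bytes:

```javascript
// TextEncoder always encodes to UTF-8; the .length of the result
// is the byte count, not the character count.
const encoder = new TextEncoder();

console.log(encoder.encode('A').length);  // 1
console.log(encoder.encode('ñ').length);  // 2
console.log(encoder.encode('漢').length); // 3
console.log(encoder.encode('🔥').length); // 4
```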

The network doesn't care about character boundaries. A TCP segment might split right in the middle of a multi-byte character. Your first chunk might end with 0xE6 0xBC -- the first two bytes of 漢 -- and the next chunk starts with 0xA2 -- the third byte.

If you naively convert each chunk to text independently:

// WRONG: each chunk decoded independently
const chunk1Text = new TextDecoder().decode(chunk1); // "...partial�" -- the truncated 漢 becomes U+FFFD
const chunk2Text = new TextDecoder().decode(chunk2); // "�rest..." -- the orphaned byte

You get replacement characters (U+FFFD, the � symbol) because the decoder sees incomplete byte sequences and has no way to know the rest is coming.

The Fix: Streaming Mode

TextDecoder has a stream option that tells it to hold onto incomplete bytes:

const decoder = new TextDecoder();

// Chunk 1 ends mid-character
const text1 = decoder.decode(chunk1, { stream: true });
// decoder holds the incomplete bytes internally

// Chunk 2 starts with the remaining bytes
const text2 = decoder.decode(chunk2, { stream: true });
// decoder combines held bytes with new bytes, outputs complete characters

// Final call without { stream: true } flushes any remaining bytes
const textFinal = decoder.decode();

The { stream: true } option tells the decoder: "More data is coming. If the chunk ends with an incomplete character, hold those bytes and prepend them to the next chunk."

The final decoder.decode() call (with no arguments, and therefore no stream: true) flushes any remaining buffered bytes. If incomplete bytes are left over at the end of the stream, they get replaced with �.
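Here is the whole failure and fix in a self-contained demo -- the chunks are split by hand to simulate an unlucky network boundary:

```javascript
const bytes = new TextEncoder().encode('漢'); // Uint8Array [0xE6, 0xBC, 0xA2]
const chunk1 = bytes.slice(0, 2); // first two bytes
const chunk2 = bytes.slice(2);    // final byte

// A fresh decoder per chunk: incomplete sequences become U+FFFD.
const broken =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);
console.log(broken); // replacement characters, not '漢'

// One decoder in streaming mode: incomplete bytes are held across calls,
// and the final non-streaming decode() flushes them.
const decoder = new TextDecoder();
const fixed =
  decoder.decode(chunk1, { stream: true }) + decoder.decode(chunk2);
console.log(fixed); // '漢'
```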

Quiz
What happens if you create a new TextDecoder for each chunk instead of reusing one?

TextDecoderStream: The Proper Way

Manually managing TextDecoder with { stream: true } works, but there's a cleaner approach. TextDecoderStream is a TransformStream that handles all of this automatically:

const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt: 'Explain closures' }),
});

const textStream = response.body.pipeThrough(new TextDecoderStream());
const reader = textStream.getReader();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // value is now a string, properly decoded
  console.log(value);
}

pipeThrough() connects the byte stream to the decoder transform. The output is a ReadableStream<string> instead of ReadableStream<Uint8Array>. The decoder handles multi-byte boundaries internally.

This is the recommended pattern. It's shorter, harder to misuse, and composes with other transforms.

Under the hood: TextDecoderStream

TextDecoderStream is essentially this:

// Simplified internal implementation
class TextDecoderStream extends TransformStream {
  constructor(encoding = 'utf-8', options) {
    const decoder = new TextDecoder(encoding, options);
    super({
      transform(chunk, controller) {
        const text = decoder.decode(chunk, { stream: true });
        if (text.length > 0) {
          controller.enqueue(text);
        }
      },
      flush(controller) {
        const text = decoder.decode();
        if (text.length > 0) {
          controller.enqueue(text);
        }
      },
    });
  }
}

The transform method is called for each chunk and uses streaming mode. The flush method is called when the input stream closes, draining any buffered bytes. You get correct multi-byte handling with zero effort.

TextDecoderStream vs Manual TextDecoder

Aspect                     Manual TextDecoder                    TextDecoderStream
UTF-8 boundary handling    You must remember { stream: true }    Automatic
Composability              Manual loop only                      Pipes with other transforms
Error surface              Easy to forget streaming mode         Hard to misuse
Browser support            All browsers                          All modern browsers (Chrome 71+, Firefox 105+, Safari 14.1+)

For any new code, prefer TextDecoderStream.

Async Generators for Clean Stream Consumption

The while-loop-with-reader pattern works, but it's imperative and hard to compose. Async generators give you a much cleaner abstraction:

async function* streamText(response) {
  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();

  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    reader.releaseLock();
  }
}

Now consuming a stream looks like this:

const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt: 'Explain closures' }),
});

for await (const chunk of streamText(response)) {
  appendToUI(chunk);
}

Clean, readable, and the finally block guarantees the reader lock is released even if you break early or an error occurs.

Building Higher-Level Generators

The real power is composition. You can chain generators to build processing pipelines:

async function* parseSSEEvents(response) {
  let buffer = '';

  for await (const chunk of streamText(response)) {
    buffer += chunk;
    const parts = buffer.split('\n\n');
    buffer = parts.pop();

    for (const part of parts) {
      if (part.startsWith('data: ')) {
        const data = part.slice(6);
        if (data === '[DONE]') return;
        yield JSON.parse(data);
      }
    }
  }
}

async function* extractTokens(response) {
  for await (const event of parseSSEEvents(response)) {
    const token = event.choices?.[0]?.delta?.content;
    if (token) yield token;
  }
}

Now your component code is trivial:

for await (const token of extractTokens(response)) {
  setText(prev => prev + token);
}

Each generator handles one concern: byte decoding, SSE parsing, or token extraction. They compose like pipes in a Unix shell.

Quiz
What does the finally block in an async generator ensure when consuming a ReadableStream?

TransformStream for Composable Pipelines

Async generators are great, but the Streams API has its own composition primitive: TransformStream. You can chain transforms with pipeThrough() to build declarative pipelines.

Here's a transform that splits a text stream on newline boundaries (useful for SSE or NDJSON):

function createLineSplitter() {
  let buffer = '';

  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop();

      for (const line of lines) {
        if (line.trim()) {
          controller.enqueue(line);
        }
      }
    },
    flush(controller) {
      if (buffer.trim()) {
        controller.enqueue(buffer);
      }
    },
  });
}

And a transform that parses SSE "data:" lines:

function createSSEParser() {
  return new TransformStream({
    transform(line, controller) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data !== '[DONE]') {
          controller.enqueue(JSON.parse(data));
        }
      }
    },
  });
}

Chain them together:

const tokenStream = response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(createLineSplitter())
  .pipeThrough(createSSEParser());

const reader = tokenStream.getReader();

Each pipeThrough() produces a new ReadableStream with the transformed data. The pipeline reads like a description of what happens: decode bytes, split into lines, parse SSE events.

Execution Trace

  1. Network delivers bytes -- Uint8Array: [0x64, 0x61, 0x74, 0x61, 0x3A, 0x20, ...] (raw HTTP response body chunk)
  2. TextDecoderStream -- String: 'data: {"choices":[{"delta":{"content":"Hello"}}]}\ndata: {"ch...' (bytes become text, multi-byte boundaries handled)
  3. Line splitter -- Line: 'data: {"choices":[{"delta":{"content":"Hello"}}]}' (each complete line emitted individually, partial lines buffered)
  4. SSE parser -- Object: { choices: [{ delta: { content: 'Hello' } }] } (JSON parsed, 'data: ' prefix stripped)
  5. Token extraction -- String: 'Hello' (delta.content extracted, ready for UI)

When to Use TransformStream vs Async Generators

Both solve the same problem. Here's when each shines:

Use TransformStream when:

  • You want to compose with pipeThrough() / pipeTo()
  • You're building reusable stream utilities
  • You need backpressure propagation through the entire chain
  • The transform is stateless or has simple state

Use async generators when:

  • You want simpler, more readable code
  • You need complex control flow (conditionals, try/catch per item)
  • You're consuming the stream in application code (not building a library)
  • You need to merge or fan out multiple streams

In practice, most application code uses async generators for consumption and TransformStream for reusable processing stages.

AbortController: Canceling Streams

When a user clicks "Stop generating," you need to cancel the stream, the HTTP connection, and any in-flight processing. AbortController handles all of this:

const controller = new AbortController();

const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt: 'Write a novel' }),
  signal: controller.signal,
});

// Start reading the stream
const reader = response.body.getReader();

// When the user clicks "Stop":
stopButton.addEventListener('click', () => {
  controller.abort();
});

const decoder = new TextDecoder();

try {
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // Reuse one decoder so multi-byte characters split across chunks survive
    appendToUI(decoder.decode(value, { stream: true }));
  }
} catch (err) {
  if (err.name === 'AbortError') {
    console.log('Stream canceled by user');
  } else {
    throw err;
  }
}

When controller.abort() is called:

  1. The fetch signal triggers an abort
  2. The underlying TCP connection is torn down
  3. Any pending reader.read() rejects with an AbortError
  4. No more data is buffered or processed

Abort with a Reason

You can pass a reason to abort() for better debugging:

controller.abort(new Error('User clicked stop'));

The value you pass is exposed as controller.signal.reason, and pending operations tied to the signal reject with it directly -- so your catch block receives your Error rather than a generic AbortError. Note that an err.name === 'AbortError' check won't match a custom reason; test controller.signal.aborted instead when you abort with custom reasons.
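A quick demonstration of where the reason surfaces, runnable anywhere AbortController exists:

```javascript
const controller = new AbortController();
controller.abort(new Error('User clicked stop'));

// The value passed to abort() is exposed as signal.reason, and
// pending operations tied to this signal reject with it.
console.log(controller.signal.aborted);        // true
console.log(controller.signal.reason.message); // 'User clicked stop'
```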

Timeout Pattern

Combine AbortController with setTimeout for request timeouts:

function fetchWithTimeout(url, options, timeoutMs = 30000) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);

  return fetch(url, { ...options, signal: controller.signal })
    .finally(() => clearTimeout(timeoutId));
}
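In recent browsers and Node versions, the platform provides helpers that make this pattern shorter: AbortSignal.timeout() and AbortSignal.any(). Support is newer than AbortController itself (AbortSignal.any in particular), so check your targets before relying on them:

```javascript
// AbortSignal.timeout(ms) creates a signal that aborts itself after
// the given delay -- no manual setTimeout/clearTimeout bookkeeping.
const timeoutSignal = AbortSignal.timeout(10);

// AbortSignal.any() aborts when ANY of its inputs abort -- ideal for
// combining a user stop button with a request timeout.
const userController = new AbortController();
const combined = AbortSignal.any([
  userController.signal,
  AbortSignal.timeout(10),
]);

await new Promise((resolve) => setTimeout(resolve, 50));
console.log(timeoutSignal.aborted); // true
console.log(combined.aborted);      // true

// With fetch, you'd pass it the same way:
// fetch(url, { signal: combined });
```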
Quiz
What happens to a pending reader.read() call when the associated AbortController.abort() is called?

Putting It All Together

Here's a production-grade streaming function that handles decoding, parsing, cancellation, and error recovery:

async function* streamChat(prompt, signal) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    signal,
  });

  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }

  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();

  try {
    let buffer = '';
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;

      buffer += value;
      const events = buffer.split('\n\n');
      buffer = events.pop();

      for (const event of events) {
        if (!event.startsWith('data: ')) continue;
        const data = event.slice(6);
        if (data === '[DONE]') return;

        const parsed = JSON.parse(data);
        const token = parsed.choices?.[0]?.delta?.content;
        if (token) yield token;
      }
    }
  } finally {
    reader.releaseLock();
  }
}

Usage in a React component:

function ChatMessage({ prompt }) {
  const [text, setText] = useState('');
  const controllerRef = useRef(null);

  const startStreaming = useCallback(async () => {
    const controller = new AbortController();
    controllerRef.current = controller;

    try {
      for await (const token of streamChat(prompt, controller.signal)) {
        setText(prev => prev + token);
      }
    } catch (err) {
      if (err.name !== 'AbortError') {
        console.error('Stream error:', err);
      }
    }
  }, [prompt]);

  const stopStreaming = useCallback(() => {
    controllerRef.current?.abort();
  }, []);

  // ...
}

Custom ReadableStream: Creating Your Own

Sometimes you need to create a ReadableStream from scratch -- wrapping a WebSocket, an EventSource, or a custom data source:

function createTimerStream(intervalMs, count) {
  let i = 0;
  let timerId;

  return new ReadableStream({
    start(controller) {
      timerId = setInterval(() => {
        if (i >= count) {
          clearInterval(timerId);
          controller.close();
          return;
        }
        controller.enqueue(`Tick ${i++}`);
      }, intervalMs);
    },
    cancel() {
      clearInterval(timerId);
    },
  });
}

The ReadableStream constructor takes a source object with these hooks:

  • start(controller) -- called once when the stream is created. Set up your data source here.
  • pull(controller) -- called when the consumer wants more data. Use for demand-driven sources.
  • cancel(reason) -- called when the consumer cancels the stream. Clean up resources here.

The controller has three methods:

  • controller.enqueue(chunk) -- push data into the stream
  • controller.close() -- signal that no more data will come
  • controller.error(err) -- signal an error
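Here's a sketch of a demand-driven source using pull() -- the runtime only calls it when the consumer's internal queue has room, so production naturally matches consumption:

```javascript
// pull() runs each time the stream wants another chunk queued;
// nothing beyond the queue's high-water mark is produced until
// someone reads.
function createCounterStream(limit) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i >= limit) {
        controller.close();
        return;
      }
      controller.enqueue(i++);
    },
  });
}

const reader = createCounterStream(3).getReader();
console.log(await reader.read()); // { value: 0, done: false }
console.log(await reader.read()); // { value: 1, done: false }
```

Contrast with the timer example above: start() pushes data at its own pace, while pull() lets the consumer set the pace.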
Wrapping a WebSocket as a ReadableStream

This pattern is useful for unifying WebSocket and fetch-based streaming under one interface:

function websocketToStream(url) {
  let ws;

  return new ReadableStream({
    start(controller) {
      ws = new WebSocket(url);
      ws.onmessage = (event) => controller.enqueue(event.data);
      ws.onerror = (err) => controller.error(err);
      ws.onclose = () => controller.close();
    },
    cancel() {
      // Closure variable, so cancel() sees the same socket as start()
      ws?.close();
    },
  });
}

// Now you can use it exactly like a fetch stream:
const reader = websocketToStream('wss://api.example.com/chat').getReader();

for await...of with ReadableStream

In modern browsers, ReadableStream implements the async iterable protocol. You can use for await...of directly:

const textStream = response.body.pipeThrough(new TextDecoderStream());

for await (const chunk of textStream) {
  console.log(chunk);
}

No getReader() needed. The for await...of loop handles acquiring and releasing the reader automatically. This is the cleanest way to consume a stream when you don't need manual reader control.

Common Trap

ReadableStream async iteration support was added in Chrome 124, Firefox 110, and Safari 17.4. If you need to support older browsers, stick with the getReader() pattern or use a polyfill. Check your target browser versions.

Common Pitfalls and Fixes

  • Creating a new TextDecoder for each chunk (new TextDecoder().decode(chunk)). A fresh decoder per chunk loses multi-byte character state, so bytes of a single character split across chunks produce replacement characters. Fix: reuse one decoder with decoder.decode(chunk, { stream: true }).

  • Using response.text() or response.json() and then trying to stream. Both consume the entire body into memory, and once called, the stream is locked and exhausted. You must choose: stream via response.body, or buffer via text()/json() -- you can't do both. Fix: use response.body.getReader() for streaming, response.text() for buffered reads.

  • Forgetting to handle AbortError separately from real errors. User-initiated cancellation is normal flow, not an error; treat it like one and you'll show error UI when the user simply clicked Stop. Fix: check err.name === 'AbortError' in your catch block and filter it out.

  • Not calling reader.releaseLock() in a finally block. If an error occurs mid-stream, the reader stays locked, so the stream can't be garbage collected or read again. Fix: wrap reader.read() loops in try/finally with reader.releaseLock().

The Rules

Key Rules
  1. Always reuse a single TextDecoder instance with { stream: true } -- or use TextDecoderStream
  2. response.body is a ReadableStream of Uint8Array, not text -- you must decode it
  3. Use AbortController to cancel streams -- pass signal to fetch and catch AbortError
  4. Release reader locks in finally blocks to prevent resource leaks
  5. Prefer TextDecoderStream with pipeThrough() over manual TextDecoder management
  6. Buffer partial SSE events -- network chunks don't align with event boundaries
  7. Async generators wrap the reader loop in a composable, for-await-of-friendly interface
Quiz
You're streaming an LLM response and want to split it into SSE events. Your line splitter buffers text and splits on newlines. Where should you flush the remaining buffer?

Where This Connects