
Streaming SSR with Suspense

Advanced · 15 min read

The Problem with Traditional SSR

Traditional SSR has a brutal constraint: the server must wait for ALL data before sending ANY HTML. If your page needs data from three APIs and one takes 3 seconds, the user stares at a blank screen for 3 seconds — even though the other two APIs responded in 50ms.

Traditional SSR Timeline:

Server receives request
  ├── Fetch user profile    (50ms)  ✓ done
  ├── Fetch product data    (80ms)  ✓ done
  └── Fetch recommendations (3000ms) ... waiting ...
                                     ... still waiting ...
                                     ✓ done (3000ms)
Server renders full HTML           → sends to browser
Browser receives HTML              → user sees content
Total wait: 3000ms (bottlenecked by slowest fetch)

The fast APIs are done in under 100ms, but their content is held hostage by the slow one. The user gets nothing until everything is ready.
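The bottleneck can be made concrete with a toy simulation. `fetchProfile`, `fetchProducts`, and `fetchRecommendations` below are hypothetical stand-ins with artificial delays (scaled down for the example):

```javascript
// Toy simulation of the traditional-SSR bottleneck: even with the fetches
// running in parallel, Promise.all resolves only when the slowest one does,
// so nothing can be sent before that.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const fetchProfile = () => delay(50, '<div>profile</div>');
const fetchProducts = () => delay(80, '<div>products</div>');
const fetchRecommendations = () => delay(300, '<div>recs</div>'); // the slow one

async function renderTraditional() {
  const start = Date.now();
  const parts = await Promise.all([
    fetchProfile(), fetchProducts(), fetchRecommendations(),
  ]);
  return {
    html: `<html><body>${parts.join('')}</body></html>`,
    elapsed: Date.now() - start,
  };
}

renderTraditional().then(({ elapsed }) =>
  console.log(`first byte possible after ~${elapsed}ms`)); // ≈300ms, not 50ms
```

The profile and product markup sits fully rendered in memory for 250ms, waiting on the recommendations promise.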

Mental Model

Traditional SSR is like a restaurant that won't bring any food until every dish for the table is ready. Your salad has been sitting under a heat lamp for 20 minutes while the kitchen finishes the steak. Streaming SSR is a restaurant that brings each dish as it's ready — salad first, then sides, then the steak. You start eating immediately.

How Streaming Works

Streaming SSR flips the model: send HTML chunks as they become ready, not all at once.

The server immediately sends the HTML shell — navigation, layout, loading skeletons — then streams in content chunks as each data source resolves. The browser progressively renders these chunks without waiting for the full response.

Streaming SSR Timeline:

Server receives request
  ├── Send HTML shell immediately (nav, layout, skeletons) → browser renders!
  ├── Fetch user profile (50ms) ✓ → stream chunk 1        → browser updates!
  ├── Fetch product data (80ms) ✓ → stream chunk 2         → browser updates!
  └── Fetch recommendations (3000ms) ✓ → stream chunk 3    → browser updates!

User sees shell at:           ~0ms
User sees profile + product:  ~80ms
User sees recommendations:    ~3000ms

The user sees useful content within milliseconds, not seconds. The slow API only delays its own section.
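The same page can be hand-rolled with plain Node streams, no React yet. This is a sketch of the idea only; `write` stands in for `res.write` on a chunked HTTP response:

```javascript
// Streaming sketch: emit the shell immediately, then emit each section's
// HTML as soon as its data resolves.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function streamPage(write) {
  // 1. The shell goes out immediately; the browser starts rendering it.
  write('<html><body><nav>NavBar</nav>');
  // 2. Each section is written the moment its data resolves.
  const sections = [
    delay(50, '<section>profile</section>'),
    delay(80, '<section>products</section>'),
    delay(300, '<section>recommendations</section>'), // only delays itself
  ];
  for (const section of sections) write(await section);
  write('</body></html>');
}

// In a real server you would pass res.write.bind(res); here we collect chunks.
const chunks = [];
streamPage((c) => chunks.push(c)).then(() =>
  console.log(`${chunks.length} chunks, shell first: ${chunks[0].includes('NavBar')}`));
```

This version still writes the sections in source order; React's Suspense-based streaming, covered next, goes further and handles out-of-order arrival.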

Suspense as Streaming Boundaries

In React, Suspense boundaries define where the stream can break. Each Suspense boundary is a potential streaming boundary — the server can send the fallback immediately and stream in the resolved content later.

import { Suspense } from 'react'

export default async function ProductPage({ params }) {
  return (
    <main>
      <NavBar />

      <Suspense fallback={<ProductSkeleton />}>
        <ProductDetails slug={params.slug} />
      </Suspense>

      <Suspense fallback={<ReviewsSkeleton />}>
        <Reviews slug={params.slug} />
      </Suspense>

      <Suspense fallback={<RecommendationsSkeleton />}>
        <Recommendations slug={params.slug} />
      </Suspense>
    </main>
  )
}

async function ProductDetails({ slug }) {
  const product = await fetchProduct(slug)
  return <div>{product.name} — ${product.price}</div>
}

async function Reviews({ slug }) {
  const reviews = await fetchReviews(slug)
  return <ul>{reviews.map(r => <li key={r.id}>{r.text}</li>)}</ul>
}

async function Recommendations({ slug }) {
  const recs = await fetchRecommendations(slug)
  return <div>{recs.map(r => <RecCard key={r.id} {...r} />)}</div>
}

The server sends NavBar and all three skeleton fallbacks immediately. As each async component resolves, React streams a chunk that replaces the skeleton with real content.


The Wire Format: How Chunks Replace Fallbacks

Here is the part that most tutorials skip. When React streams in a resolved Suspense boundary, it doesn't send a separate HTTP request or use WebSockets. It's all one HTTP response — a single, long-lived connection using chunked transfer encoding.

The initial HTML contains placeholder elements:

<!--$?-->
<template id="B:0"></template>
<div>Loading reviews...</div>
<!--/$-->

When the reviews data resolves, React streams an additional chunk at the bottom of the response:

<div hidden id="S:0">
  <ul><li>Great product!</li><li>Highly recommend</li></ul>
</div>
<script>
  // Swap the fallback with the resolved content
  $RC("B:0", "S:0")
</script>

The inline $RC function (React's own tiny runtime) finds the template marker B:0, removes the fallback, and inserts the hidden content S:0 in its place. This is a pure DOM operation — no React hydration needed for the swap itself.
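The effect of that swap can be modeled as a toy, string-based transform. React performs it with DOM operations, not string surgery; this just makes the before/after states concrete:

```javascript
// Toy model of the $RC swap: replace the boundary (marker comments, template,
// fallback) with the hidden streamed content, then drop the hidden container.
function swapBoundary(html, boundaryId, contentId) {
  // Pull out the streamed content: <div hidden id="S:0">…</div>
  const contentRe = new RegExp(
    `<div hidden id="${contentId}">([\\s\\S]*?)</div>`);
  const streamed = html.match(contentRe)[1];
  // Replace the whole boundary span with that content.
  const boundaryRe = new RegExp(
    `<!--\\$\\?--><template id="${boundaryId}"></template>[\\s\\S]*?<!--/\\$-->`);
  return html.replace(boundaryRe, streamed).replace(contentRe, '');
}

const page =
  '<!--$?--><template id="B:0"></template>' +
  '<div>Loading reviews...</div><!--/$-->' +
  '<div hidden id="S:0"><ul><li>Great product!</li></ul></div>';

console.log(swapBoundary(page, 'B:0', 'S:0'));
// → <ul><li>Great product!</li></ul>
```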

Out-of-order streaming

Chunks can arrive in any order. If recommendations resolve before reviews, React streams the recommendations chunk first. The $RC function uses IDs to find the correct placeholder, so it doesn't matter what order chunks arrive — each one knows exactly where it belongs. This is how React achieves out-of-order streaming without any coordination between Suspense boundaries.
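A sketch of out-of-order emission: each boundary writes its chunk the moment its own data resolves, tagged with the IDs the swap script needs, so arrival order is irrelevant. (`$RC` in the output refers to React's inline helper; the chunk shapes mirror the wire format shown above.)

```javascript
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function streamBoundaries(write) {
  const boundaries = [
    { id: 0, data: delay(300, 'reviews') },         // slow
    { id: 1, data: delay(50, 'recommendations') },  // fast: streams first
  ];
  // No coordination between boundaries: each one fires independently.
  await Promise.all(boundaries.map(async ({ id, data }) => {
    const html = await data;
    write(`<div hidden id="S:${id}">${html}</div>` +
          `<script>$RC("B:${id}", "S:${id}")</script>`);
  }));
}

const arrived = [];
streamBoundaries((chunk) => arrived.push(chunk)).then(() =>
  console.log(arrived[0].includes('recommendations'))); // true: fast one first
```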

The React APIs

React provides two streaming APIs depending on your runtime:

Node.js: renderToPipeableStream

import { renderToPipeableStream } from 'react-dom/server'

function handleRequest(req, res) {
  const { pipe, abort } = renderToPipeableStream(<App />, {
    bootstrapScripts: ['/client.js'],
    onShellReady() {
      res.statusCode = 200
      res.setHeader('Content-Type', 'text/html')
      pipe(res)
    },
    onShellError(error) {
      // The shell itself failed; nothing useful has been sent yet.
      res.statusCode = 500
      res.setHeader('Content-Type', 'text/html')
      res.end('<!doctype html><p>Server error</p>')
    },
    onError(error) {
      console.error(error)
    }
  })

  // Abandon server rendering after 10s; unresolved boundaries
  // are left as fallbacks for the client to finish.
  setTimeout(() => abort(), 10000)
}

Edge/Web: renderToReadableStream

import { renderToReadableStream } from 'react-dom/server'

async function handleRequest(req) {
  const stream = await renderToReadableStream(<App />, {
    bootstrapScripts: ['/client.js'],
    signal: AbortSignal.timeout(10000)
  })

  return new Response(stream, {
    headers: { 'Content-Type': 'text/html' }
  })
}

The key callbacks:

  • onShellReady — The shell (everything outside Suspense boundaries) is rendered. Start piping.
  • onShellError — The shell itself failed to render. Send a fallback error page.
  • onAllReady — Everything, including all Suspense boundaries, is resolved. Used for static generation (you want the complete HTML).
  • onError — An error occurred in a Suspense boundary. The fallback stays; the boundary doesn't resolve.

Streaming and SEO

A common concern: "If the initial HTML has skeletons, will search engines see the real content?"

Modern crawlers (Googlebot) handle streaming well. They wait for the full response and process the final DOM state, including all streamed chunks. The $RC script swaps happen before the crawler evaluates the page.

However, if you're concerned about crawlers that don't execute JavaScript, you can use onAllReady instead of onShellReady for bot user agents:

const { pipe } = renderToPipeableStream(<App />, {
  onShellReady() {
    if (isBot(req)) return
    res.statusCode = 200
    pipe(res)
  },
  onAllReady() {
    if (!isBot(req)) return
    res.statusCode = 200
    pipe(res)
  }
})

Bots get the full HTML in one shot. Real users get streaming.
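The `isBot` helper in the snippet above is assumed, not provided by React. A naive version might sniff the `User-Agent` header; production apps often rely on a maintained list (e.g. the `isbot` npm package) instead:

```javascript
// Hypothetical isBot helper: a naive User-Agent sniff.
// Real crawler lists are longer and change over time.
const BOT_UA = /bot|crawler|spider|slurp|bingpreview/i;

function isBot(req) {
  return BOT_UA.test(req.headers['user-agent'] || '');
}

console.log(isBot({ headers: { 'user-agent': 'Googlebot/2.1' } })); // true
console.log(isBot({ headers: {} }));                                // false
```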

Common Trap

Don't nest Suspense boundaries too deeply thinking it helps streaming. Each boundary adds overhead — a template marker, a hidden div, and a swap script. For most pages, 2-4 Suspense boundaries at the top level is ideal: navigation, hero content, main content, sidebar/comments. Over-granular boundaries create unnecessary DOM complexity and more inline scripts.

Streaming in Next.js App Router

In Next.js, streaming is the default for async Server Components wrapped in Suspense. You don't call renderToPipeableStream directly — Next.js handles it.

import { Suspense } from 'react'
import { LoadingSkeleton } from '@/components/ui'

export default async function Dashboard() {
  return (
    <div>
      <h1>Dashboard</h1>
      <Suspense fallback={<LoadingSkeleton />}>
        <Analytics />
      </Suspense>
      <Suspense fallback={<LoadingSkeleton />}>
        <RecentActivity />
      </Suspense>
    </div>
  )
}

You can also use loading.tsx files, which create an implicit Suspense boundary at the route segment level:

app/
  dashboard/
    loading.tsx    ← Suspense fallback for this route
    page.tsx       ← async Server Component

The loading.tsx content displays instantly while page.tsx resolves.

What developers do vs. what they should do

  • Waiting for onAllReady before piping for real users. onAllReady waits for every Suspense boundary, eliminating the entire benefit of streaming; users get a slow, traditional SSR experience. Instead: use onShellReady for streaming to users, and onAllReady only for bots or static generation.
  • Putting the entire page inside a single Suspense boundary. One big boundary means nothing streams until everything resolves — you're back to the traditional SSR bottleneck. Instead: wrap individual sections that have their own data dependencies in separate Suspense boundaries.
  • Assuming streaming requires WebSockets or Server-Sent Events. It's a regular HTTP response that stays open; the server sends HTML chunks over the same connection. Instead: remember that streaming SSR uses standard HTTP chunked transfer encoding in a single response.
  • Skipping fallback design because skeletons are temporary. Skeletons are the first thing users see; if they're sized differently than the real content, you'll get layout shift when content streams in. Instead: design polished skeletons that match the content layout to prevent layout shift (CLS).
Key Rules
  1. Traditional SSR waits for the slowest data source. Streaming SSR sends the shell immediately and streams chunks as data resolves.
  2. Each Suspense boundary is a potential streaming boundary — the fallback is sent first, resolved content is streamed later.
  3. Chunks are sent as HTML with inline scripts that swap fallbacks — it's one HTTP response, not multiple requests.
  4. Out-of-order streaming lets fast data sources resolve independently of slow ones.
  5. Use onShellReady for streaming to users. Use onAllReady for bots and static generation.
  6. In Next.js, async Server Components inside Suspense boundaries stream automatically.