
Streaming SSR and Progressive Rendering

Advanced · 15 min read

The Waterfall Problem With Traditional SSR

Traditional SSR has a dirty secret: the server waits for all data before sending any HTML. If your page has a fast header (5ms) and a slow recommendation engine (800ms), the user stares at a blank screen for 800ms — even though the header was ready in 5ms.

Traditional SSR timeline:

Server: |---fetch header (5ms)---fetch recommendations (800ms)---|---render HTML---|
Browser:                                                                           |---show page---|
                                                                                   ^
                                                                            805ms before first byte

That's 800ms of wasted time. The header, nav, and sidebar were all ready almost instantly, but the browser got nothing until the slowest query finished. Streaming SSR eliminates exactly this wait.
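The waterfall arithmetic can be made explicit with a small sketch (a hypothetical helper, not framework code; the numbers mirror the 5ms/800ms example above):

```typescript
// Toy model of the traditional SSR timeline: all fetches run in parallel,
// but the response waits for every one, so time-to-first-byte is gated
// by the slowest fetch plus render time.
function traditionalTTFB(fetchTimesMs: number[], renderMs: number): number {
  return Math.max(...fetchTimesMs) + renderMs;
}

// traditionalTTFB([5, 800], 5) → 805: the header was ready at 5ms,
// but the browser sees nothing until ~805ms.
```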

The Mental Model

Think of streaming SSR like a newspaper being printed page by page and delivered as each page comes off the press.

Traditional SSR waits until the entire newspaper is printed (all 40 pages), then delivers the whole stack at once. You wait 40 minutes and get everything.

Streaming SSR delivers each page as it's printed. You get the front page in 1 minute. Sports section in 3 minutes. The financial section (which requires late-breaking market data) arrives in 15 minutes. But you've been reading for 14 minutes already.

Each Suspense boundary is like a section of the newspaper. It tells the press: "This section might take a while — print and deliver everything else first, and send this section when it's ready."

How Streaming SSR Works at the Wire Level

When you wrap a component in Suspense, React can flush HTML to the browser before that component's data is ready.

export default async function ProductPage({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params // In Next.js 15, params is a Promise
  const product = await getProduct(id) // Fast: 20ms

  return (
    <div>
      <ProductHeader product={product} />

      <Suspense fallback={<ReviewsSkeleton />}>
        <ProductReviews productId={id} />  {/* Slow: 600ms */}
      </Suspense>

      <Suspense fallback={<RecommendationsSkeleton />}>
        <Recommendations category={product.category} />  {/* Slow: 800ms */}
      </Suspense>
    </div>
  )
}

What the browser receives over time:

Chunk 1 (at ~25ms):
┌──────────────────────────────────┐
│ <html><body>                     │
│   <div>                          │
│     <header>Product Name...</>   │  ← Fully rendered
│     <div id="S:0">               │
│       <ReviewsSkeleton />        │  ← Placeholder
│     </div>                       │
│     <div id="S:1">               │
│       <RecommendationsSkeleton/> │  ← Placeholder
│     </div>                       │
│   </div>                         │
│   <script>/* streaming runtime */│
└──────────────────────────────────┘

Chunk 2 (at ~620ms):
┌──────────────────────────────────┐
│ <template id="S:0">              │
│   <div class="reviews">...</div> │  ← Actual reviews HTML
│ </template>                      │
│ <script>                         │
│   $RC("S:0", "S:0")              │  ← Replace skeleton with real content
│ </script>                        │
└──────────────────────────────────┘

Chunk 3 (at ~825ms):
┌──────────────────────────────────┐
│ <template id="S:1">              │
│   <div class="recs">...</div>    │  ← Actual recommendations HTML
│ </template>                      │
│ <script>                         │
│   $RC("S:1", "S:1")              │
│ </script>                        │
└──────────────────────────────────┘

The $RC function is part of React's inline streaming runtime. It finds the placeholder by ID and swaps in the real content from the template tag; React then hydrates that subtree once the client bundle loads. (The markup shown here is simplified. React's actual output uses comment markers and separate IDs for the boundary and its streamed content, but the mechanism is the same.)
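To make the swap concrete, here is a toy, string-level sketch of the idea. This is NOT React's actual runtime, which operates on live DOM nodes, but it shows the find-by-ID-and-replace mechanism:

```typescript
// Toy sketch: replace a skeleton placeholder's children with streamed
// content, matched by the placeholder's ID. Assumes content contains no
// literal "$" (which String.replace would treat specially).
function swapSkeleton(html: string, id: string, content: string): string {
  const placeholder = new RegExp(`(<div id="${id}">)[\\s\\S]*?(</div>)`);
  return html.replace(placeholder, `$1${content}$2`);
}
```

For example, `swapSkeleton('<div id="S:0"><skeleton/></div>', "S:0", '<div class="reviews">...</div>')` returns the same placeholder div with the reviews markup inside it.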

Quiz
In streaming SSR, what does the browser show at the 100ms mark if the page header takes 20ms but the reviews section takes 600ms?

Suspense Boundaries as Streaming Boundaries

Each Suspense boundary is a streaming decision point. React renders everything it can, hits a Suspense boundary where data isn't ready, renders the fallback, flushes to the browser, and continues when the data resolves.

// Multiple Suspense boundaries = multiple streaming chunks
export default async function DashboardPage() {
  return (
    <div className="grid grid-cols-3 gap-4">
      {/* These all stream independently */}
      <Suspense fallback={<StatsSkeleton />}>
        <StatsCard />          {/* Resolves in 50ms */}
      </Suspense>

      <Suspense fallback={<ChartSkeleton />}>
        <RevenueChart />       {/* Resolves in 200ms */}
      </Suspense>

      <Suspense fallback={<TableSkeleton />}>
        <RecentOrders />       {/* Resolves in 500ms */}
      </Suspense>
    </div>
  )
}

The browser progressively fills in each section as data arrives:

  • At 50ms: StatsCard appears, chart and table are skeletons
  • At 200ms: RevenueChart fills in, table is still a skeleton
  • At 500ms: RecentOrders fills in, page is complete
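A toy model of that timeline (a hypothetical helper, not React code): each boundary streams in when its own data resolves, so reveal order follows latency, not source order.

```typescript
// Boundaries from the dashboard example above, with their latencies.
interface Boundary {
  name: string;
  latencyMs: number;
}

// Each boundary is revealed when its own data arrives, so the reveal
// order is simply the boundaries sorted by latency.
function revealOrder(boundaries: Boundary[]): string[] {
  return [...boundaries]
    .sort((a, b) => a.latencyMs - b.latencyMs)
    .map((b) => b.name);
}

// revealOrder([
//   { name: "RecentOrders", latencyMs: 500 },
//   { name: "StatsCard", latencyMs: 50 },
//   { name: "RevenueChart", latencyMs: 200 },
// ]) → ["StatsCard", "RevenueChart", "RecentOrders"]
```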

Nested Suspense

Suspense boundaries can be nested, and they resolve from the outside in:

<Suspense fallback={<PageSkeleton />}>
  <UserProfile />  {/* Must resolve first */}

  <Suspense fallback={<FeedSkeleton />}>
    <ActivityFeed />  {/* Streams after UserProfile resolves */}

    <Suspense fallback={<CommentsSkeleton />}>
      <Comments />  {/* Streams after ActivityFeed resolves */}
    </Suspense>
  </Suspense>
</Suspense>

The outer boundary must resolve before inner boundaries can stream. React doesn't reveal child content before its parent — that would cause layout shifts as parent containers resize.
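The outside-in rule can be captured in one line (a hypothetical helper for illustration): a nested boundary is never shown before its parent, so its effective reveal time is the later of the two.

```typescript
// Effective reveal time of a nested Suspense boundary: it cannot be
// revealed before its parent, even if its own data is ready earlier.
function effectiveRevealMs(parentRevealMs: number, ownDataMs: number): number {
  return Math.max(parentRevealMs, ownDataMs);
}

// Even if Comments' data resolves at 100ms, an ActivityFeed parent that
// reveals at 400ms holds the Comments boundary back to 400ms:
// effectiveRevealMs(400, 100) → 400
```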

Chunked Transfer Encoding Under the Hood

Over HTTP/1.1, streaming SSR uses chunked transfer encoding. Instead of a Content-Length header, the response uses Transfer-Encoding: chunked. Each chunk is prefixed with its byte length in hexadecimal, followed by the chunk data, and a final zero-length chunk marks the end of the response.

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html

1a4
<!DOCTYPE html><html>...<div id="S:0"><skeleton/></div>...

f8
<template id="S:0"><div class="reviews">...</div></template><script>$RC("S:0", "S:0")</script>

0
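The framing itself is simple enough to sketch (a hypothetical helper, not what React or Next.js emits directly; the server runtime handles this for you):

```typescript
// Frame one HTML fragment as an HTTP/1.1 chunk:
// hex byte length, CRLF, payload, CRLF.
function encodeChunk(payload: string): string {
  const byteLength = new TextEncoder().encode(payload).length;
  return `${byteLength.toString(16)}\r\n${payload}\r\n`;
}

// A zero-length chunk (plus the trailing CRLF) terminates the body.
const terminator = "0\r\n\r\n";

// encodeChunk("<p>hi</p>") → "9\r\n<p>hi</p>\r\n"
```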

With HTTP/2 and HTTP/3 the framing differs (chunked transfer encoding is specific to HTTP/1.1; newer versions stream the response body in their own data frames), but the effect is the same: the HTML arrives progressively, and multiple streams can be multiplexed over a single connection.

Next.js 15 uses this automatically in the App Router. You don't configure chunked encoding — any component wrapped in Suspense with an async data dependency becomes a streaming boundary.

Next.js 15 Streaming in Practice

Automatic Streaming with loading.js

Next.js wraps your page in a Suspense boundary automatically when you add a loading.js file:

app/
  dashboard/
    loading.tsx    ← Automatic Suspense fallback for the page
    page.tsx       ← The actual page component

// app/dashboard/loading.tsx
export default function DashboardLoading() {
  return <DashboardSkeleton />
}

// app/dashboard/page.tsx
export default async function Dashboard() {
  const data = await fetchDashboardData() // Slow query
  return <DashboardContent data={data} />
}

Next.js effectively transforms this into:

<Suspense fallback={<DashboardLoading />}>
  <Dashboard />
</Suspense>

Manual Suspense for Granular Control

For finer control, use Suspense directly within your page:

export default async function Dashboard() {
  return (
    <div>
      <h1>Dashboard</h1>

      {/* This data loads fast — don't Suspense it */}
      <QuickStats />

      {/* These load slow — stream them independently */}
      <div className="grid grid-cols-2 gap-4">
        <Suspense fallback={<ChartSkeleton />}>
          <RevenueChart />
        </Suspense>
        <Suspense fallback={<TableSkeleton />}>
          <UserTable />
        </Suspense>
      </div>
    </div>
  )
}

Quiz
What is the key difference between using loading.tsx and using manual Suspense boundaries in Next.js 15?

Production Scenario: The Analytics Dashboard

A team has an analytics dashboard with five widgets. Without streaming, the page takes 2.3 seconds to load because it waits for the slowest query:

Without streaming:
Server fetches: stats(50ms) + chart(400ms) + table(800ms) + heatmap(1200ms) + alerts(300ms)
                All run in parallel, but page waits for slowest: 1200ms
                Then render: 200ms
                Then transfer: 100ms
Total TTFB: ~1500ms
User sees content: ~1700ms

With streaming SSR, each widget is wrapped in Suspense:

export default function AnalyticsDashboard() {
  return (
    <div className="grid grid-cols-2 gap-6">
      <Suspense fallback={<StatsSkeleton />}>
        <StatsWidget />
      </Suspense>
      <Suspense fallback={<AlertsSkeleton />}>
        <AlertsWidget />
      </Suspense>
      <Suspense fallback={<ChartSkeleton />}>
        <ChartWidget />
      </Suspense>
      <Suspense fallback={<TableSkeleton />}>
        <TableWidget />
      </Suspense>
      <Suspense fallback={<HeatmapSkeleton />}>
        <HeatmapWidget />
      </Suspense>
    </div>
  )
}

With streaming:
Browser receives first chunk at ~80ms (page shell + all skeletons)
Stats appears at ~130ms
Alerts appears at ~380ms
Chart appears at ~480ms
Table appears at ~880ms
Heatmap appears at ~1280ms

User sees meaningful content: ~130ms (vs 1700ms before)
Page fully loaded: ~1280ms (vs 1700ms before)

The perceived performance improvement is massive. The user sees stats in 130ms instead of waiting 1700ms for everything.

Common Trap

A common mistake is wrapping every single component in Suspense. This creates dozens of streaming chunks, each with its own skeleton flash. The result is a page that "pops" in piece by piece, which feels chaotic. Group related components under a single Suspense boundary — stream by section, not by component.

Parallel vs Sequential Data Fetching

Streaming works best when data fetches happen in parallel. Watch out for accidental waterfalls:

// Waterfall: each fetch waits for the previous one
async function SlowPage() {
  const user = await getUser()           // 100ms
  const posts = await getPosts(user.id)  // 200ms (waits for user)
  const comments = await getComments(posts[0].id) // 150ms (waits for posts)
  // Total: 450ms sequential

  return <PageContent user={user} posts={posts} comments={comments} />
}

// Parallel: all fetches start simultaneously
async function FastPage() {
  const user = await getUser()  // 100ms — needed first for other queries

  const [posts, notifications] = await Promise.all([
    getPosts(user.id),        // 200ms
    getNotifications(user.id) // 150ms
  ])
  // Total: 100ms + 200ms = 300ms (parallel after user)

  return <PageContent user={user} posts={posts} notifications={notifications} />
}
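The critical-path arithmetic behind those inline comments, made explicit with the same hypothetical latencies:

```typescript
// Hypothetical latencies from the examples above.
const userMs = 100;
const postsMs = 200;
const otherMs = 150; // comments in the waterfall version, notifications in the parallel one

// Sequential awaits: each fetch starts only after the previous one finishes.
const sequentialTotalMs = userMs + postsMs + otherMs; // 450

// Promise.all after user: the two later fetches overlap, so only the
// slower of them contributes to the total.
const parallelTotalMs = userMs + Math.max(postsMs, otherMs); // 300
```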

Even better — use Suspense to avoid the waterfall entirely:

async function BestPage() {
  const user = await getUser() // 100ms — render immediately

  return (
    <div>
      <Header user={user} />

      <Suspense fallback={<PostsSkeleton />}>
        <Posts userId={user.id} />  {/* Fetches independently, streams when ready */}
      </Suspense>

      <Suspense fallback={<NotificationsSkeleton />}>
        <Notifications userId={user.id} />  {/* Fetches independently, streams when ready */}
      </Suspense>
    </div>
  )
}

Now the user sees the header at 100ms, and posts and notifications stream in independently as their data resolves.

Quiz
You have three async Server Components wrapped in separate Suspense boundaries. Component A takes 100ms, B takes 300ms, C takes 200ms. When does the user see all three components?

Common Mistakes

What developers do vs. what they should do:

  • Wrapping every single component in its own Suspense boundary.
    Why it hurts: dozens of independently streaming chunks create a chaotic popcorn effect where different parts pop in at random times.
    Instead: group related components under shared Suspense boundaries and stream by section.

  • Not providing properly sized skeleton fallbacks, causing layout shifts.
    Why it hurts: when the real content replaces the skeleton, if the sizes differ, everything below shifts. This directly hurts Core Web Vitals CLS scores.
    Instead: match skeleton dimensions to the final content to prevent CLS.

  • Using await at the page level for data needed by child Suspense boundaries.
    Why it hurts: awaiting data in the parent blocks the entire page render. Moving data fetching into child components lets React start streaming immediately and resolve each section independently.
    Instead: let each Suspense boundary fetch its own data to enable parallel streaming.

  • Forgetting that Suspense fallbacks must be synchronous components.
    Why it hurts: if a fallback itself needs async data, it defeats the purpose of streaming. Fallbacks should be instant — simple HTML/CSS skeleton shapes.
    Instead: use lightweight skeleton components that render instantly with no data dependencies.

Key Rules

  1. Streaming SSR sends HTML in chunks using HTTP chunked transfer encoding. Each Suspense boundary is a potential streaming point.
  2. React flushes everything it can render immediately, shows Suspense fallbacks for pending data, and streams replacements as data resolves.
  3. Each streamed chunk includes a template tag with the real HTML and a script tag that swaps it in place of the skeleton.
  4. In Next.js 15, streaming is automatic — any async Server Component inside a Suspense boundary becomes a streaming chunk.
  5. Design skeletons to match final content dimensions to prevent layout shifts (CLS).
  6. Group related UI under shared Suspense boundaries. Stream by section, not by individual component.