
Performance Testing in CI

Intermediate · 18 min read

Why Performance Testing Belongs in CI

Here is the thing most teams get wrong about performance: they treat it as an audit, not a test. Someone runs Lighthouse once a quarter, shares a slide deck with scores, the team nods, and nothing changes. Meanwhile, every sprint quietly adds 50KB of JavaScript, a new third-party script, and a layout shift from that "quick" banner component.

Performance is not a score you check — it is a constraint you enforce. And the only way to enforce constraints reliably is to make them part of your CI pipeline, right next to your unit tests and linting.

Mental Model

Think of performance budgets like a financial budget. You would not check your bank account once a year and hope for the best. You set limits, track spending continuously, and get alerts when something unusual happens. Performance budgets work the same way — set limits for bundle size, LCP, and TTI, then let CI enforce them on every pull request.

The argument is simple: if a developer adds a 200KB charting library and your CI pipeline catches it before merge, fixing it takes 10 minutes. If it lands in production and you discover it three months later buried under 40 other commits, fixing it takes days — if it gets fixed at all.

Lighthouse CI: Your First Line of Defense

Lighthouse CI (LHCI) runs Google Lighthouse as part of your CI pipeline. Instead of manually opening DevTools, you get automated audits on every commit with historical tracking.

Installation and Setup

npm install -D @lhci/cli

Create a .lighthouserc.js at your project root:

module.exports = {
  ci: {
    collect: {
      startServerCommand: 'npm run start',
      startServerReadyPattern: 'ready on',
      url: ['http://localhost:3000', 'http://localhost:3000/courses'],
      numberOfRuns: 3,
      settings: {
        preset: 'desktop',
      },
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'categories:accessibility': ['error', { minScore: 0.95 }],
        'first-contentful-paint': ['warn', { maxNumericValue: 2000 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'total-blocking-time': ['error', { maxNumericValue: 300 }],
      },
    },
    upload: {
      target: 'temporary-public-storage',
    },
  },
};

A few things to notice here:

  • numberOfRuns: 3 — Lighthouse results vary between runs. Running 3 times and taking the median reduces noise significantly.
  • error vs warn — Errors fail the CI build. Warnings show up in the report but do not block merges. Use error for hard limits and warn for aspirational targets.
  • upload — The temporary-public-storage target gives you a free public URL to view the full report. For production, you would use your own LHCI server.
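Why the median rather than a single run? One noisy sample cannot move it, which a toy example (with invented LCP values) makes concrete:

```javascript
// Median of an odd-length sample: sort, take the middle element.
// One outlier run (e.g. CI machine contention) cannot shift it,
// while the same outlier would drag a mean roughly 400ms higher here.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

// Three LCP samples (ms), one of them hit by a noisy neighbor:
console.log(median([2100, 3900, 2400])); // 2400
```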
Quiz
Why does Lighthouse CI run multiple audits per URL instead of just one?

Performance Budgets: Drawing the Line

A performance budget is a set of hard limits your application must stay within. Without them, every feature slowly bloats the app because no individual change seems like a big deal.

The three most impactful budgets to set:

  1. Bundle size — Total JavaScript shipped to the client
  2. LCP (Largest Contentful Paint) — How fast the main content appears
  3. TTI (Time to Interactive) — How fast the page becomes responsive to input

budget.json

Most bundlers and CI tools support a budget.json file. Here is a practical starting point:

[
  {
    "path": "/*",
    "timings": [
      { "metric": "largest-contentful-paint", "budget": 2500 },
      { "metric": "first-contentful-paint", "budget": 1800 },
      { "metric": "total-blocking-time", "budget": 300 },
      { "metric": "cumulative-layout-shift", "budget": 0.1 }
    ],
    "resourceSizes": [
      { "resourceType": "script", "budget": 300 },
      { "resourceType": "stylesheet", "budget": 100 },
      { "resourceType": "image", "budget": 500 },
      { "resourceType": "total", "budget": 1000 }
    ],
    "resourceCounts": [
      { "resourceType": "script", "budget": 15 },
      { "resourceType": "third-party", "budget": 5 }
    ]
  }
]

The resourceSizes values are in KB. The timings values are in milliseconds (except CLS, which is unitless).
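To have Lighthouse actually evaluate this file in CI, pass it through to the Lighthouse settings. One way, sketched below, is to load it into the settings.budgets field that drives Lighthouse's budget audits; the require path assumes budget.json sits next to the config:

```javascript
// .lighthouserc.js (excerpt): feed budget.json into the Lighthouse
// settings so the budget audits evaluate it on every collected run.
const budgets = require('./budget.json');

module.exports = {
  ci: {
    collect: {
      settings: { budgets },
    },
  },
};
```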

Right-sizing your budgets

Do not just pick arbitrary numbers. Measure your current production metrics first, then set budgets 10-20% below your current values. This prevents regressions while giving you a realistic target. Setting an LCP budget of 1000ms when your current LCP is 3500ms just means the budget fails on every build and everyone ignores it.
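That guideline is simple arithmetic, sketched below in plain JavaScript; the 3500ms sample and the 15% default are illustrative, not real measurements:

```javascript
// Tighten a measured production value by 10-20% to get a budget.
// The 15% default is just the midpoint of that guideline.
function deriveBudget(currentValue, tighten = 0.15) {
  return Math.round(currentValue * (1 - tighten));
}

// A measured production LCP of 3500ms yields a 2975ms budget target:
console.log(deriveBudget(3500)); // 2975
```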

Bundle Size Budgets in Next.js

Next.js already reports per-route bundle sizes on every production build (next build), which is useful for a quick manual check but never fails the build. For enforceable, per-chunk budgets, pair @next/bundle-analyzer (to inspect what each chunk contains) with a size-limit configuration in your package.json:

{
  "size-limit": [
    { "path": ".next/static/chunks/main-*.js", "limit": "80 KB" },
    { "path": ".next/static/chunks/pages/_app-*.js", "limit": "50 KB" },
    { "path": ".next/static/css/*.css", "limit": "30 KB" }
  ]
}
npx size-limit

This gives you per-chunk visibility. When someone imports moment instead of dayjs, you will know immediately.

Quiz
Your performance budget sets LCP at 2500ms. A developer opens a PR that adds a hero image carousel, and CI reports LCP at 2700ms. What is the right course of action?

GitHub Actions Integration

Here is a complete GitHub Actions workflow that runs Lighthouse CI on every pull request:

name: Performance CI

on:
  pull_request:
    branches: [main]

jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'

      - run: npm ci

      - run: npm run build

      - name: Run Lighthouse CI
        run: |
          npm install -g @lhci/cli
          lhci autorun
        env:
          LHCI_GITHUB_APP_TOKEN: ${{ secrets.LHCI_GITHUB_APP_TOKEN }}

  bundle-size:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'

      - run: npm ci

      - name: Check bundle size
        run: npx size-limit

Two jobs run in parallel: Lighthouse audits for runtime metrics and size-limit for static bundle analysis. The LHCI_GITHUB_APP_TOKEN enables Lighthouse CI to post results as PR comments — your reviewers see the performance impact without leaving GitHub.

Setting up the LHCI GitHub App

Install the Lighthouse CI GitHub App from the GitHub Marketplace. It will provide a token you store as a repository secret. Once configured, every PR gets an inline comment showing performance scores, deltas from the base branch, and links to the full HTML report. This turns performance reviews from "someone should check Lighthouse" into "the data is right here in the PR."

Custom Performance Assertions with Playwright

Lighthouse gives you synthetic scores, but sometimes you need custom assertions. Maybe you want to measure how long your specific search component takes to return results, or verify that a route transition completes within 200ms.

Playwright can evaluate code inside the page, which gives you direct access to the browser's Performance APIs. That means you can write assertions against real browser performance metrics.

Measuring Core Web Vitals

import { test, expect } from '@playwright/test';

test('homepage LCP is under 2.5s', async ({ page }) => {
  await page.goto('/');

  const lcp = await page.evaluate(() => {
    return new Promise<number>((resolve) => {
      new PerformanceObserver((list) => {
        const entries = list.getEntries();
        const last = entries[entries.length - 1];
        resolve(last.startTime);
      }).observe({ type: 'largest-contentful-paint', buffered: true });
    });
  });

  expect(lcp).toBeLessThan(2500);
});

Measuring Custom Interactions

test('search results appear within 300ms', async ({ page }) => {
  await page.goto('/courses');

  const searchInput = page.getByRole('searchbox');
  await searchInput.fill('javascript');

  const start = Date.now();

  await page.getByTestId('search-results').waitFor({ state: 'visible' });

  const duration = Date.now() - start;
  expect(duration).toBeLessThan(300);
});

Measuring CLS During Interactions

test('course page has no layout shift after images load', async ({ page }) => {
  await page.goto('/courses/frontend-engineering');

  const cls = await page.evaluate(() => {
    return new Promise<number>((resolve) => {
      let clsValue = 0;
      const observer = new PerformanceObserver((list) => {
        for (const entry of list.getEntries()) {
          if (!(entry as any).hadRecentInput) {
            clsValue += (entry as any).value;
          }
        }
      });
      observer.observe({ type: 'layout-shift', buffered: true });

      setTimeout(() => {
        observer.disconnect();
        resolve(clsValue);
      }, 5000);
    });
  });

  expect(cls).toBeLessThan(0.1);
});

The power of Playwright-based performance tests is specificity. Lighthouse tells you "your CLS is 0.15." Playwright lets you write "the CLS caused by image loading on the course page must be below 0.1." When it fails, you know exactly which page and which interaction caused the regression.

Quiz
What is the main advantage of custom Playwright performance tests over Lighthouse CI?

Tracking Performance Over Time

Catching regressions is valuable, but tracking trends is where the real insight lives. A single PR might pass your budgets while still contributing to a slow, steady decline.

LHCI Server for Historical Data

For serious performance tracking, self-host the LHCI server:

docker run --publish 9001:9001 \
  patrickhulce/lhci-server

Update your .lighthouserc.js:

module.exports = {
  ci: {
    upload: {
      target: 'lhci',
      serverBaseUrl: 'https://your-lhci-server.example.com',
      token: process.env.LHCI_BUILD_TOKEN,
    },
  },
};

The LHCI server stores every run and provides a dashboard with trend lines for every metric. You can see exactly when LCP started creeping up and correlate it with specific commits.

web-vitals in Synthetic Tests

The web-vitals library is the gold standard for measuring Core Web Vitals. You can integrate it into your CI tests for more accurate measurements:

import { test, expect } from '@playwright/test';

test('collect web-vitals metrics', async ({ page }) => {
  await page.addInitScript(() => {
    (window as any).__webVitals = {};

    const script = document.createElement('script');
    script.src =
      'https://unpkg.com/web-vitals@4/dist/web-vitals.iife.js';
    script.onload = () => {
      const wv = (window as any).webVitals;
      wv.onLCP((m: any) => { (window as any).__webVitals.lcp = m.value; });
      wv.onFCP((m: any) => { (window as any).__webVitals.fcp = m.value; });
      wv.onCLS((m: any) => { (window as any).__webVitals.cls = m.value; });
      wv.onINP((m: any) => { (window as any).__webVitals.inp = m.value; });
    };
    document.head.appendChild(script);
  });

  await page.goto('/');
  await page.waitForLoadState('networkidle');

  await page.click('body');
  await page.waitForTimeout(1000);

  const vitals = await page.evaluate(() => (window as any).__webVitals);

  if (vitals.lcp) expect(vitals.lcp).toBeLessThan(2500);
  if (vitals.cls) expect(vitals.cls).toBeLessThan(0.1);
  if (vitals.fcp) expect(vitals.fcp).toBeLessThan(1800);
});
Common Trap

Synthetic tests (CI environments) produce different numbers than real-user monitoring (RUM). CI machines have different CPU power, no network latency to CDNs, and no contention from other browser tabs. Use synthetic tests to catch regressions (relative changes), not to validate absolute performance targets. Your CI LCP of 800ms does not mean users see 800ms — it means that if it jumps to 1200ms, something regressed.
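That regression-first mindset can be written down directly: compare each CI value against a stored baseline plus a noise tolerance, rather than an absolute target. A minimal sketch in plain JavaScript; the baseline object, metric name, and 15% tolerance are all assumptions:

```javascript
// Flag a metric only when it exceeds the baseline by more than the
// tolerance; the absolute CI number itself is never treated as meaningful.
function checkRegression(baseline, metric, currentValue, tolerance = 0.15) {
  const allowed = baseline[metric] * (1 + tolerance);
  return { regressed: currentValue > allowed, allowed };
}

// With a CI baseline LCP of 800ms, a jump to 1200ms trips the check,
// while ordinary run-to-run noise does not:
console.log(checkRegression({ lcp: 800 }, 'lcp', 1200).regressed); // true
console.log(checkRegression({ lcp: 800 }, 'lcp', 850).regressed);  // false
```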

Alerting on Regressions

Detection without notification is useless. Set up alerts so the right people know immediately when performance degrades.

GitHub Actions Status Checks

The simplest approach: Lighthouse CI assertions already fail the build. Configure branch protection rules to require the performance check to pass before merging:

# In your repo settings, add "Performance CI / lighthouse" as a required check

Slack Notifications on Failure

Add a notification step to your workflow:

- name: Notify on performance regression
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": "Performance regression detected in ${{ github.event.pull_request.html_url }}\nLighthouse report: Check the CI artifacts for details."
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_PERF_WEBHOOK }}

Trend-Based Alerts

For catching gradual degradation, compare against a rolling baseline instead of fixed thresholds:

// In .lighthouserc.js
module.exports = {
  ci: {
    assert: {
      assertions: {
        'largest-contentful-paint': [
          'error',
          {
            maxNumericValue: 2500,
            aggregationMethod: 'median-run',
          },
        ],
      },
    },
  },
};

When using the LHCI server, you can configure assertions relative to the previous build rather than absolute values. This catches the "death by a thousand cuts" scenario where each PR adds 20ms of LCP and no single change triggers the absolute budget, but over two months your LCP has doubled.

Putting It All Together

Here is the complete performance CI strategy, layered from fast feedback to deep analysis:

Layer 1: Bundle size check (size-limit)
├── Runs in ~10 seconds
├── Catches: large dependency additions, forgotten tree-shaking
└── Feedback: instant, in PR comment

Layer 2: Lighthouse CI (LHCI)
├── Runs in ~2-3 minutes
├── Catches: render performance, accessibility, best practices
└── Feedback: PR comment with scores and deltas

Layer 3: Custom Playwright assertions
├── Runs in ~5-10 minutes
├── Catches: interaction-specific regressions, component performance
└── Feedback: test failure with specific metric and threshold

Layer 4: LHCI Server trend tracking
├── Runs continuously, data aggregated over time
├── Catches: gradual degradation across many PRs
└── Feedback: dashboard alerts, weekly reports

Each layer catches different classes of regressions. Layer 1 is cheap and fast — run it on every commit. Layer 4 is expensive but catches what the others miss.

Quiz
Your team has a CI pipeline with Lighthouse CI running on every PR. Over 3 months, LCP slowly increases from 1.8s to 2.4s, but no single PR triggers the 2.5s budget. What additional strategy catches this gradual degradation?
What developers do vs. what they should do

  • What developers do: Run Lighthouse once in a local browser and call it tested.
    Why it fails: Local runs vary by machine state, have no historical tracking, and depend on someone remembering to do it.
    What they should do: Run Lighthouse in CI on every PR with automated assertions.

  • What developers do: Set performance budgets based on ideal targets without measuring current values first.
    Why it fails: Unrealistic budgets get ignored. Start from where you are and tighten over time.
    What they should do: Measure current production metrics, then set budgets 10-20% below.

  • What developers do: Treat synthetic CI metrics as real-user performance numbers.
    Why it fails: CI machines have different CPU, memory, and network than real user devices. A CI LCP of 800ms does not mean users see 800ms.
    What they should do: Use CI metrics for regression detection (relative changes), not absolute performance validation.

  • What developers do: Only check the overall Lighthouse score without specific metric budgets.
    Why it fails: The overall score is a weighted average that can hide individual regressions. LCP could worsen by 500ms while a CLS improvement masks it in the total score.
    What they should do: Set budgets for individual metrics: LCP, CLS, TBT, bundle size, and resource counts.

  • What developers do: Skip bundle size checks because Lighthouse already measures performance.
    Why it fails: Bundle size checks are instant (about 10 seconds) and catch dependency bloat before you even build. Lighthouse only runs after a full build and start-up.
    What they should do: Run bundle size checks alongside Lighthouse — they catch different things.
Key Rules
  1. Performance is a constraint you enforce in CI, not a score you check manually.
  2. Run Lighthouse CI with at least 3 runs per URL to reduce measurement variance.
  3. Set budgets based on current production metrics minus 10-20%, then tighten over time.
  4. Use error-level assertions for hard limits (LCP, CLS) and warn-level for aspirational targets.
  5. Layer your strategy: bundle size checks (fast) → Lighthouse CI (medium) → Playwright assertions (specific) → trend tracking (gradual).
  6. Synthetic CI metrics detect regressions — they do not represent real-user performance.
  7. Track trends over time with an LHCI server. Absolute budgets miss gradual degradation.