Lab Data vs Field Data
Your Lighthouse Score Is Not Your Users' Experience
You run Lighthouse, see a green 95 performance score, ship with confidence. Two weeks later, your CrUX data shows half your users have LCP over 4 seconds. What happened?
Nothing went wrong with the tool. The problem is that Lighthouse and your users are measuring completely different realities. Lighthouse runs on your fast MacBook, on a stable Wi-Fi connection, with a clean browser profile. Your actual users are on a 2019 Android phone in Jakarta, riding the subway with spotty 4G, with 47 browser tabs open.
This gap between lab data and field data is one of the most misunderstood concepts in web performance. Understanding it is the difference between optimizing for benchmarks and optimizing for real humans.
Lab data is like testing a car on a perfectly smooth track at a controlled temperature. Field data is like tracking how that car performs across millions of drivers on real roads — potholes, rain, traffic, altitude changes. Both are useful. Neither alone tells the full story. The track gives you repeatable, debuggable results. The road gives you the truth.
Lab Data: The Controlled Experiment
Lab data comes from running performance tools in a controlled, synthetic environment. You choose the device profile, network speed, and conditions. The same test run twice produces nearly identical results.
Common Lab Tools
- Lighthouse — Built into Chrome DevTools. Simulates a mid-tier mobile device with throttled CPU (4x slowdown) and network (simulated slow 4G). Runs locally in your browser
- WebPageTest — Remote testing from real locations worldwide. Supports real device emulation, multi-step scripting, video comparison, and waterfall analysis. The gold standard for deep lab analysis
- Chrome DevTools Performance panel — Records runtime performance traces. Shows flame charts, main thread blocking, layout shifts, and long tasks. No throttling by default (you must enable it manually)
- PageSpeed Insights — Runs Lighthouse remotely on Google's servers AND shows real CrUX field data. The bridge between lab and field
What Lab Data Does Well
Lab data shines in three areas:
- Reproducibility — Same test, same conditions, same results. You can A/B test code changes with confidence that differences come from your code, not from network variability
- Debugging — Waterfall charts, flame charts, and frame-by-frame rendering timelines let you pinpoint exactly what is slow and why. Field data tells you that something is slow. Lab data tells you why
- Pre-production testing — You can test a staging deploy before real users ever see it. Field data requires real traffic
Where Lab Data Fails
Here is the uncomfortable truth: lab data is a fiction. A useful fiction, but a fiction nonetheless.
- Throttling is not real slowness. Lighthouse simulates a slow network by adding delays. A real slow 3G connection has packet loss, jitter, variable latency, and TCP retransmissions that throttling cannot replicate. Simulated 4G behaves nothing like actual 4G on a crowded cell tower
- CPU throttling is crude. Lighthouse applies a multiplier to slow down JavaScript execution. But real low-end devices have smaller caches, weaker GPUs, thermal throttling, and background processes competing for resources. A 4x CPU slowdown on an M3 MacBook does not equal a Snapdragon 665
- No real user diversity. Lab tests use one device profile, one viewport, one network. Your real audience has thousands of combinations. The person on a Jio network in rural India has a fundamentally different experience than someone on fiber in Seoul
- No interaction patterns. Lighthouse measures load performance. It does not measure what happens when the user scrolls, clicks, types, or navigates between pages. INP (Interaction to Next Paint) can only be measured meaningfully with real interactions
Field Data: The Ground Truth
Field data (also called Real User Monitoring, or RUM) captures performance metrics from actual users visiting your site. Every page load, every interaction, every layout shift — measured on their real device, their real network, in their real context.
Sources of Field Data
Chrome User Experience Report (CrUX) is the largest public dataset of real-user performance data. Chrome collects anonymized metrics from users who have opted into usage statistics syncing. Key facts:
- Covers millions of origins; the BigQuery tables are updated monthly, while the CrUX API reflects a 28-day rolling window updated daily
- Reports Core Web Vitals: LCP, INP, CLS
- Provides origin-level and URL-level data
- Accessible via PageSpeed Insights, BigQuery, and the CrUX API
- Used by Google for Search ranking signals
The web-vitals JavaScript library lets you collect field data from your own users. It hooks into browser Performance APIs to measure LCP, INP, CLS, FCP, and TTFB, then gives you a callback to send that data wherever you want.
Commercial RUM tools (Datadog RUM, New Relic Browser, SpeedCurve, Sentry Performance) provide dashboards, alerting, and deep segmentation on top of raw field data.
What Field Data Does Well
- Truth — This is what your users actually experience. No simulation, no throttling, no guessing
- Distribution visibility — You see the full range: the fast users, the slow users, the median, the long tail. Lab data gives you one data point. Field data gives you millions
- Impact on business — You can correlate real performance with real conversion rates, bounce rates, and engagement. "Users with LCP under 2.5s have 23% higher conversion" is a field-data insight
- INP measurement — Interaction to Next Paint requires real user interactions. You cannot meaningfully measure INP in a lab test (Lighthouse reports TBT as a proxy, but it is not the same thing)
Where Field Data Falls Short
- No debugging capability. Field data tells you LCP is 4.2 seconds at the 75th percentile. It does not tell you why. Was it a slow server response? A render-blocking stylesheet? A massive hero image? You need lab tools to diagnose
- Requires real traffic. New pages, pre-launch features, and staging environments have no field data. You are flying blind until real users visit
- Delayed feedback. CrUX aggregates over a trailing 28 days, so a change takes weeks to fully show up. Even your own RUM data requires enough samples to be statistically meaningful. Lab data gives you feedback in seconds
- Privacy constraints. You cannot collect identifying information about individual user sessions (nor should you). This limits how deep you can segment
Why Lab and Field Disagree
The disagreement is not random. There are specific, predictable reasons why lab and field numbers diverge.
| Dimension | Lab Data | Field Data |
|---|---|---|
| Device | Simulated mid-tier phone (CPU throttling) | Real devices: flagship phones, budget Androids, old iPhones, tablets, desktops |
| Network | Simulated throttled connection (fixed latency + bandwidth) | Real networks: 3G, 4G, 5G, Wi-Fi, satellite, with jitter and packet loss |
| User behavior | Cold load only, no interaction | Scroll, click, type, navigate back, switch tabs, multi-step flows |
| Geography | Single test location | Global distribution across all regions and ISPs |
| Browser state | Clean profile, no extensions | Dozens of extensions, cached resources, background tabs |
| Sample size | 1 test run (or a few) | Thousands to millions of real page loads |
| Timing | Snapshot at test time | 28-day rolling window (CrUX) or continuous (RUM) |
| Metrics available | LCP, CLS, TBT, FCP, SI, TTFB | LCP, INP, CLS, FCP, TTFB (real interactions) |
| Debugging | Full waterfall, flame chart, frame timeline | Aggregate numbers only (unless using attribution builds) |
| Speed of feedback | Seconds | Days to weeks for statistical significance |
The Throttling Problem
Lighthouse applies simulated throttling by default: it loads the page over your actual, fast connection, then mathematically estimates the metrics a throttled connection and CPU would have produced. This is fast to run but misses real-world effects like TCP slow start, connection saturation, and resource contention.
WebPageTest supports applied throttling (also called packet-level throttling), which actually shapes traffic at the network layer. This is more realistic but still cannot replicate the behavior of a real congested mobile network where a cell tower is shared with thousands of other users.
Neither approach can simulate:
- DNS resolution variability across ISPs
- CDN edge cache misses for cold regions
- TLS handshake overhead on slow CPUs
- TCP congestion window resets from packet loss
- Background app interference on mobile devices
The Device Problem
A 4x CPU slowdown on an Apple M3 chip does not produce the same behavior as a MediaTek Dimensity 700 running natively. The throttled M3 still has:
- Larger L1/L2/L3 caches
- Faster memory bandwidth
- Better branch prediction
- No thermal throttling (your MacBook has a fan, that budget phone does not)
- No competition from other apps for RAM
The result: JavaScript that runs fine under simulated throttling can cause multi-second jank on real budget devices because the bottleneck is not raw CPU speed but memory pressure and thermal constraints.
Percentiles: P75 vs P95
When you look at field data, you are looking at a distribution, not a single number. The choice of which percentile to report changes the story completely.
P75 (75th percentile) means 75% of your users had an experience at or better than this value. Google uses P75 for Core Web Vitals thresholds and Search ranking. If your P75 LCP is 2.4 seconds, that means 75% of your users saw LCP in 2.4 seconds or less.
P95 (95th percentile) captures the experience of your worst-off users (excluding extreme outliers). If your P95 LCP is 8 seconds, that means 5% of your users — potentially millions of people at scale — waited 8 seconds or more for meaningful content.
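The gap between the two percentiles is easiest to see on a concrete sample. A minimal sketch, using the nearest-rank method on a hypothetical set of LCP samples (real RUM pipelines use similar rank-based definitions):

```js
// Nearest-rank percentile: sort the samples, then pick the value at
// ceil(p/100 * n) - 1. One of several common percentile definitions.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Hypothetical LCP samples in milliseconds from 20 page loads.
const lcpSamples = [
  1200, 1400, 1500, 1600, 1800, 1900, 2000, 2100, 2200, 2300,
  2400, 2500, 2700, 2900, 3200, 3600, 4200, 5100, 6800, 9500,
];

console.log(percentile(lcpSamples, 75)); // → 3200: looks borderline
console.log(percentile(lcpSamples, 95)); // → 6800: the long tail is painful
```

The same page can look "needs improvement" at P75 and disastrous at P95, which is exactly why tracking only one number hides part of the story.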
Why P75?
Google chose P75 as the threshold for Core Web Vitals because:
- It is high enough to represent real pain points (not just the median, which hides the long tail)
- It is not so extreme that a few outliers (bots, broken connections, users closing tabs) dominate the metric
- It balances actionability with sensitivity — improvements at P75 are achievable and meaningful
When P95 Matters
P75 is where Google draws the line. P95 is where your worst user experiences live. Consider:
- A site with 1 million daily page loads at P95 LCP of 8s means 50,000 page loads per day are painfully slow
- These slow loads disproportionately affect users in developing markets — exactly the audience many companies are trying to grow
- Conversion rate impact at the tail is often more severe than at the median
If you only optimize for P75, you are explicitly choosing to ignore the bottom 25% of your user base.
Passing Core Web Vitals at P75 is the minimum bar for Search ranking benefits. If you are serious about performance, track P95 and P99 as well. The long tail is where your most frustrated users live.
The CrUX Dataset
The Chrome User Experience Report is the canonical source of field data for the public web. Understanding how it works — and its limitations — is essential.
How CrUX Collects Data
CrUX data comes from real Chrome users who meet these criteria:
- Using Chrome on Android, ChromeOS, Linux, macOS, or Windows (not iOS — Chrome on iOS uses WebKit, not Blink)
- Have usage statistic reporting enabled (opted in)
- Have synced their browsing history
This means CrUX data skews toward Chrome users and excludes Safari (iOS), Firefox, and other browsers entirely. For sites with heavy iOS traffic, CrUX may not represent the full audience.
CrUX Data Access Points
PageSpeed Insights (PSI) — The easiest way to check CrUX data for any URL. Enter a URL and you get both lab results (Lighthouse) and field data (CrUX) side by side. This is the first place to check.
CrUX API — Programmatic access to origin-level and URL-level CrUX data. Free, requires an API key. Returns P75 values and histogram distributions for Core Web Vitals.
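Once you have a response from the API, summarizing it is straightforward. A sketch of pulling the P75 and the "good" share out of an LCP record; the object shape below mirrors the documented `queryRecord` response format, but verify the exact field names against the current API reference, and the values here are invented:

```js
// Summarize the LCP portion of a CrUX API record: P75 plus the share of
// loads in the first histogram bin (the "good" range, 0–2500 ms for LCP).
function summarizeLcp(record) {
  const lcp = record.metrics.largest_contentful_paint;
  return {
    p75: lcp.percentiles.p75,
    goodShare: lcp.histogram[0].density,
  };
}

// Mock record shaped like the real API output (numbers are placeholders).
const mockRecord = {
  key: { origin: 'https://example.com' },
  metrics: {
    largest_contentful_paint: {
      histogram: [
        { start: 0, end: 2500, density: 0.62 },
        { start: 2500, end: 4000, density: 0.23 },
        { start: 4000, density: 0.15 },
      ],
      percentiles: { p75: 3100 },
    },
  },
};

console.log(summarizeLcp(mockRecord)); // { p75: 3100, goodShare: 0.62 }
```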
BigQuery — The full CrUX dataset, updated monthly. Lets you run SQL queries across the entire dataset — compare sites, analyze trends, segment by connection type, device type, and country. Incredibly powerful for competitive analysis.
```sql
-- Example: Query LCP distribution for a specific origin
SELECT
  origin,
  effective_connection_type.name AS connection_type,
  largest_contentful_paint.histogram AS lcp_histogram,
  largest_contentful_paint.percentiles.p75 AS lcp_p75
FROM
  `chrome-ux-report.all.202403`
WHERE
  origin = 'https://example.com'
```
CrUX Dashboard — A Looker Studio (formerly Data Studio) template that auto-generates trend charts from CrUX BigQuery data. Plug in your origin and get historical trends without writing SQL.
Implementing RUM with the web-vitals Library
The web-vitals library is tiny (under 2KB gzipped) and maintained by the Chrome team. It provides reliable, accurate measurements of all Core Web Vitals using the same underlying browser APIs that CrUX uses.
Basic Setup
```js
import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
  });

  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/vitals', body);
  } else {
    fetch('/api/vitals', { body, method: 'POST', keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);
```
A few critical details in this code:
- `navigator.sendBeacon` is essential. Unlike `fetch`, `sendBeacon` queues the request so it survives the user navigating away or closing the tab. Metrics like CLS and INP report their final values as the page unloads — if you use `fetch` without `keepalive: true`, that request may be canceled
- `metric.delta` gives you the change since the last report, not the cumulative value. CLS reports multiple times as shifts occur. Use `delta` if you are summing values server-side
- `metric.rating` is `"good"`, `"needs-improvement"`, or `"poor"`, based on Core Web Vitals thresholds
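The `delta` field matters most on the server. A sketch of how a backend might fold repeated reports into one final value per metric instance (the field names follow the web-vitals callback object; the in-memory store here is invented for illustration):

```js
// Accumulate web-vitals reports server-side. Each report carries a stable
// `id` per metric instance and a `delta` since the previous report, so
// summing deltas per id reconstructs the final value even when a metric
// (like CLS) reports several times over the page's lifetime.
const totals = new Map();

function recordReport({ id, name, delta }) {
  const current = totals.get(id) ?? { name, value: 0 };
  current.value += delta;
  totals.set(id, current);
  return current.value;
}

// CLS reporting three times as shifts accumulate on one page view:
recordReport({ id: 'v3-cls-abc', name: 'CLS', delta: 0.05 });
recordReport({ id: 'v3-cls-abc', name: 'CLS', delta: 0.02 });
const finalCls = recordReport({ id: 'v3-cls-abc', name: 'CLS', delta: 0.03 });
// finalCls ≈ 0.1, the cumulative value for that page view
```

Summing raw `value` fields instead would triple-count the early shifts.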
Attribution Build for Debugging
The standard web-vitals build tells you what the metric value is. The attribution build tells you why.
```js
import { onINP } from 'web-vitals/attribution';

onINP((metric) => {
  const attribution = metric.attribution;
  console.log('Slow interaction:', {
    eventTarget: attribution.interactionTarget,
    eventType: attribution.interactionType,
    inputDelay: attribution.inputDelay,
    processingDuration: attribution.processingDuration,
    presentationDelay: attribution.presentationDelay,
    longAnimationFrameEntries: attribution.longAnimationFrameEntries,
  });
});
```
The attribution build is larger (around 4KB gzipped) but invaluable for debugging. It breaks down INP into its three phases:
- Input delay — Time from user interaction to when the event handler starts. Usually caused by long tasks blocking the main thread
- Processing duration — Time spent in event handlers. Your code's fault
- Presentation delay — Time from handler completion to next paint. Usually layout/style recalculation or rendering work
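Because the three phases add up to the interaction's total duration, a small helper can point you at the dominant one. The attribution field names below follow the web-vitals attribution build; the helper itself is illustrative:

```js
// Given an INP attribution object, name the phase that contributed most.
// A large inputDelay points at main-thread contention, processingDuration
// at your handlers, presentationDelay at rendering work after they return.
function dominantInpPhase({ inputDelay, processingDuration, presentationDelay }) {
  const phases = [
    ['input-delay', inputDelay],
    ['processing', processingDuration],
    ['presentation', presentationDelay],
  ];
  phases.sort((a, b) => b[1] - a[1]);
  return phases[0][0];
}

// 40 ms waiting for the main thread, 180 ms in handlers, 30 ms to paint:
console.log(
  dominantInpPhase({ inputDelay: 40, processingDuration: 180, presentationDelay: 30 })
); // → 'processing': the event handlers themselves are the bottleneck
```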
Using PerformanceObserver Directly
For custom metrics beyond Core Web Vitals, you can use the PerformanceObserver API directly.
```js
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.entryType === 'largest-contentful-paint') {
      console.log('LCP candidate:', {
        element: entry.element,
        url: entry.url,
        startTime: entry.startTime,
        size: entry.size,
        renderTime: entry.renderTime,
        loadTime: entry.loadTime,
      });
    }
  }
});

observer.observe({ type: 'largest-contentful-paint', buffered: true });
```
The buffered: true option is critical — without it, you miss entries that occurred before the observer was registered. Since scripts typically load after the page has started rendering, many LCP candidates would be lost without buffering.
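Buffered observers are also how you can reconstruct a metric like CLS by hand. CLS is not a raw sum of every shift: shifts are grouped into session windows (gaps under 1 second, each window capped at 5 seconds) and the worst window wins. A sketch of that windowing over `layout-shift` entries, assuming each entry has a `startTime` in milliseconds and a `value`, and excluding `hadRecentInput` entries as the real metric does:

```js
// Compute CLS from layout-shift entries using the session-window rule:
// a shift joins the current window if it starts less than 1000 ms after
// the previous shift and less than 5000 ms after the window began; CLS
// is the largest window sum. Shifts caused by recent input are excluded.
function computeCls(entries) {
  let cls = 0;
  let windowSum = 0;
  let windowStart = 0;
  let prevTime = -Infinity;
  for (const entry of entries) {
    if (entry.hadRecentInput) continue;
    const startsNewWindow =
      entry.startTime - prevTime >= 1000 ||
      entry.startTime - windowStart >= 5000;
    if (startsNewWindow) {
      windowSum = 0;
      windowStart = entry.startTime;
    }
    windowSum += entry.value;
    prevTime = entry.startTime;
    cls = Math.max(cls, windowSum);
  }
  return cls;
}

// Two early shifts in one window, then an isolated shift much later:
const shifts = [
  { startTime: 100, value: 0.08, hadRecentInput: false },
  { startTime: 600, value: 0.05, hadRecentInput: false },
  { startTime: 9000, value: 0.06, hadRecentInput: false },
];
console.log(computeCls(shifts)); // ≈ 0.13: the first window dominates
```

In practice you would let the web-vitals library do this, but seeing the windowing spelled out explains why a single big shift late in a session can still set your CLS.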
Why sendBeacon, not fetch?
When a user navigates away from your page, the browser cancels pending fetch requests. This is a problem because CLS and INP finalize their values on page visibility change or unload. If you collect metrics using fetch without keepalive: true, you lose the most important data point — the final metric value.
navigator.sendBeacon solves this. It is designed for "fire-and-forget" requests that must survive page unload. The browser queues the request and completes it even after the page is gone — delivery still depends on the network, but unloading no longer cancels it. The tradeoff: you cannot read the response, and the payload is limited (typically 64KB). For performance telemetry, this is exactly what you want.
If sendBeacon is unavailable, fetch with keepalive: true is the fallback. The keepalive flag tells the browser to keep the request alive even after the page unloads, up to a cumulative 64KB limit across all keepalive requests.
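The fallback logic is easy to unit-test if you inject the transports instead of reading them off the global object. A sketch with invented helper names; in a browser you would pass `navigator.sendBeacon.bind(navigator)` and `window.fetch`:

```js
// Choose sendBeacon when available, else fetch with keepalive. sendBeacon
// returns false when the browser refuses the payload (e.g. over the 64KB
// keepalive quota), in which case we fall back to fetch as well.
function sendMetric(url, body, { beacon, fetchFn }) {
  if (beacon && beacon(url, body)) return 'beacon';
  fetchFn(url, { body, method: 'POST', keepalive: true });
  return 'fetch';
}

// Stub transports to exercise both paths outside a browser:
const sent = [];
const stubBeacon = (url) => { sent.push(['beacon', url]); return true; };
const stubFetch = (url, opts) => { sent.push(['fetch', url, opts.keepalive]); };

sendMetric('/api/vitals', '{"name":"LCP"}', { beacon: stubBeacon, fetchFn: stubFetch });
sendMetric('/api/vitals', '{"name":"LCP"}', { beacon: null, fetchFn: stubFetch });
// sent → [['beacon', '/api/vitals'], ['fetch', '/api/vitals', true]]
```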
When to Use Each
This is not an either-or decision. Lab and field data serve fundamentally different purposes, and a mature performance practice uses both.
| Scenario | Use Lab Data | Use Field Data |
|---|---|---|
| Debugging a specific performance bottleneck | Yes — waterfall, flame chart, frame analysis | No — too aggregate for root cause analysis |
| Measuring real-world user impact | No — conditions are synthetic | Yes — this IS the real-world impact |
| Pre-launch performance testing | Yes — no real users yet | No — no traffic to measure |
| Setting performance budgets | Yes — reproducible, automatable in CI | Yes — validate budgets against real conditions |
| Tracking performance regressions over time | Yes — consistent baseline for comparison | Yes — catches regressions that lab tests miss |
| Optimizing for a specific market (India, Brazil, Nigeria) | Partially — WebPageTest can test from those locations | Yes — CrUX and RUM show actual user experience there |
| Measuring INP (Interaction to Next Paint) | No — requires real user interactions | Yes — the only way to measure real INP |
| Competitive benchmarking | Yes — WebPageTest for controlled comparison | Yes — CrUX BigQuery for real-world comparison |
| CI/CD pipeline gates | Yes — Lighthouse CI can block deploys | No — too slow for build pipelines |
The Ideal Workflow
- Develop — Use DevTools Performance panel to profile as you build. Catch obvious issues early
- Pre-merge — Run Lighthouse CI in your CI/CD pipeline. Set performance budgets. Block merges that regress key metrics
- Post-deploy — Monitor field data via RUM (web-vitals library + your analytics backend). Watch for regressions that lab tests missed
- Investigate — When field data shows a regression, use WebPageTest and DevTools to reproduce and diagnose
- Validate — After fixing, confirm improvement in both lab (immediate feedback) and field (delayed but authoritative) data
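The pre-merge gate in step 2 can be as small as a `lighthouserc.js` at the repo root. A minimal sketch, assuming a local dev server; the budget numbers are placeholders to tune against your own baseline, and the assertion keys should be verified against the Lighthouse CI documentation:

```js
// lighthouserc.js — minimal Lighthouse CI config sketch. Run via
// `lhci autorun` in the pipeline; failing assertions block the merge.
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:3000/'],
      numberOfRuns: 3, // median of 3 runs smooths lab variance
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'total-blocking-time': ['warn', { maxNumericValue: 300 }],
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};
```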
Building a Complete Performance Monitoring Stack
A production-grade performance monitoring setup combines lab and field data into a single workflow.
Layer 1: Lab (Development + CI)
- Chrome DevTools Performance panel during development
- Lighthouse CI in your CI/CD pipeline with performance budgets
- WebPageTest for deep-dive investigations and competitor analysis
- Custom DevTools recordings for specific user flows
Layer 2: Field (Production)
- web-vitals library collecting Core Web Vitals from all users
- Attribution build enabled for a sample of traffic (not all — it adds bundle size)
- Data pipeline to your analytics backend (BigQuery, Datadog, custom)
- Dashboards segmented by device type, connection speed, geography, and page
Layer 3: Alerting
- Alert on P75 regressions (Core Web Vitals threshold crossings)
- Alert on P95 regressions (tail performance degradation)
- Alert on lab regression (Lighthouse CI budget failures)
- Weekly reports comparing lab trends vs field trends
Key Takeaways
1. Lab data is for debugging and prevention. Field data is for truth and validation. Use both.
2. Lighthouse's simulated throttling does not replicate real network conditions — never trust a lab score as your users' reality.
3. CrUX only includes Chrome users with sync enabled — supplement with your own RUM for full browser coverage.
4. Always use navigator.sendBeacon or fetch with keepalive for metric collection — regular fetch loses data on page unload.
5. P75 is Google's threshold, but P95 reveals your worst user experiences. Track both.
6. The web-vitals attribution build is essential for diagnosing why a metric is slow, not just that it is slow.
7. INP cannot be meaningfully measured in lab conditions — it requires real user interactions in the field.
Common Mistakes
| What developers do | What they should do |
|---|---|
| Treating the Lighthouse score as the definitive measure of site performance. Lighthouse runs in synthetic conditions that do not represent the diversity of real user devices, networks, and behaviors | Use Lighthouse for debugging and CI gates, but validate with CrUX and RUM for real-world performance |
| Only tracking P75 because that is what Google uses for ranking. Passing at P75 means 25% of users may still have a poor experience; at scale, that is millions of bad page loads | Track P75, P95, and P99 to understand the full distribution of user experience |
| Using fetch without keepalive to send performance metrics. CLS and INP report final values on page unload, and regular fetch requests are canceled when the user navigates away, losing the most critical data | Use navigator.sendBeacon or fetch with keepalive: true |
| Assuming CrUX data represents all your users. CrUX only includes opted-in Chrome users; Safari (iOS), Firefox, and other browsers are excluded entirely | Supplement CrUX with your own RUM implementation that covers all browsers |
| Running Lighthouse in CI and assuming INP is covered because TBT passes. TBT is a lab proxy that only measures main-thread blocking during load and cannot capture real user interaction responsiveness | Implement field-based INP monitoring with the web-vitals library |