Lab Data vs Field Data
Your Lighthouse Score Is Not Your Users' Experience
You run Lighthouse, see a green 95 performance score, ship with confidence. Two weeks later, your CrUX data shows half your users have LCP over 4 seconds. What happened?
Nothing went wrong with the tool. The problem is that Lighthouse and your users are measuring completely different realities. Lighthouse runs on your fast MacBook, on a stable Wi-Fi connection, with a clean browser profile. Your actual users are on a 2019 Android phone in Jakarta, riding the subway with spotty 4G, with 47 browser tabs open.
This gap between lab data and field data is one of the most misunderstood concepts in web performance. Understanding it is the difference between optimizing for benchmarks and optimizing for real humans.
Lab data is like testing a car on a perfectly smooth track at a controlled temperature. Field data is like tracking how that car performs across millions of drivers on real roads — potholes, rain, traffic, altitude changes. Both are useful. Neither alone tells the full story. The track gives you repeatable, debuggable results. The road gives you the truth.
Lab Data: The Controlled Experiment
Lab data comes from running performance tools in a controlled, synthetic environment. You choose the device profile, network speed, and conditions. The same test run twice produces nearly identical results.
Common Lab Tools
- Lighthouse — Built into Chrome DevTools. Simulates a mid-tier mobile device with throttled CPU (4x slowdown) and network (simulated slow 4G). Runs locally in your browser
- WebPageTest — Remote testing from real locations worldwide. Supports real device emulation, multi-step scripting, video comparison, and waterfall analysis. The gold standard for deep lab analysis
- Chrome DevTools Performance panel — Records runtime performance traces. Shows flame charts, main thread blocking, layout shifts, and long tasks. No throttling by default (you must enable it manually)
- PageSpeed Insights — Runs Lighthouse remotely on Google's servers AND shows real CrUX field data. The bridge between lab and field
What Lab Data Does Well
Lab data shines in three areas:
- Reproducibility — Same test, same conditions, same results. You can A/B test code changes with confidence that differences come from your code, not from network variability
- Debugging — Waterfall charts, flame charts, and frame-by-frame rendering timelines let you pinpoint exactly what is slow and why. Field data tells you that something is slow. Lab data tells you why
- Pre-production testing — You can test a staging deploy before real users ever see it. Field data requires real traffic
Where Lab Data Fails
Here is the uncomfortable truth: lab data is a fiction. A useful fiction, but a fiction nonetheless.
- Throttling is not real slowness. Lighthouse simulates a slow network by adding delays. A real slow 3G connection has packet loss, jitter, variable latency, and TCP retransmissions that throttling cannot replicate. Simulated 4G behaves nothing like actual 4G on a crowded cell tower
- CPU throttling is crude. Lighthouse applies a multiplier to slow down JavaScript execution. But real low-end devices have smaller caches, weaker GPUs, thermal throttling, and background processes competing for resources. A 4x CPU slowdown on an M3 MacBook does not equal a Snapdragon 665
- No real user diversity. Lab tests use one device profile, one viewport, one network. Your real audience has thousands of combinations. The person on a Jio network in rural India has a fundamentally different experience than someone on fiber in Seoul
- No interaction patterns. Lighthouse measures load performance. It does not measure what happens when the user scrolls, clicks, types, or navigates between pages. INP (Interaction to Next Paint) can only be measured meaningfully with real interactions
Field Data: The Ground Truth
Field data (also called Real User Monitoring, or RUM) captures performance metrics from actual users visiting your site. Every page load, every interaction, every layout shift — measured on their real device, their real network, in their real context.
Sources of Field Data
Chrome User Experience Report (CrUX) is the largest public dataset of real-user performance data. Chrome collects anonymized metrics from users who have opted into usage statistics syncing. Key facts:
- Covers millions of origins; the BigQuery tables are updated monthly, while the CrUX API reflects a 28-day rolling window updated daily
- Reports Core Web Vitals: LCP, INP, CLS
- Provides origin-level and URL-level data
- Accessible via PageSpeed Insights, BigQuery, and the CrUX API
- Used by Google for Search ranking signals
The web-vitals JavaScript library lets you collect field data from your own users. It hooks into browser Performance APIs to measure LCP, INP, CLS, FCP, and TTFB, then gives you a callback to send that data wherever you want.
Commercial RUM tools (Datadog RUM, New Relic Browser, SpeedCurve, Sentry Performance) provide dashboards, alerting, and deep segmentation on top of raw field data.
What Field Data Does Well
- Truth — This is what your users actually experience. No simulation, no throttling, no guessing
- Distribution visibility — You see the full range: the fast users, the slow users, the median, the long tail. Lab data gives you one data point. Field data gives you millions
- Impact on business — You can correlate real performance with real conversion rates, bounce rates, and engagement. "Users with LCP under 2.5s have 23% higher conversion" is a field-data insight
- INP measurement — Interaction to Next Paint requires real user interactions. You cannot meaningfully measure INP in a lab test (Lighthouse reports TBT as a proxy, but it is not the same thing)
Where Field Data Falls Short
- No debugging capability. Field data tells you LCP is 4.2 seconds at the 75th percentile. It does not tell you why. Was it a slow server response? A render-blocking stylesheet? A massive hero image? You need lab tools to diagnose
- Requires real traffic. New pages, pre-launch features, and staging environments have no field data. You are flying blind until real users visit
- Delayed feedback. CrUX aggregates over a trailing 28 days, so a change takes weeks to fully show up. Even your own RUM data requires enough samples to be statistically meaningful. Lab data gives you feedback in seconds
- Privacy constraints. You cannot collect identifying information about individual user sessions (nor should you). This limits how deep you can segment
Why Lab and Field Disagree
The disagreement is not random. There are specific, predictable reasons why lab and field numbers diverge.
| Dimension | Lab Data | Field Data |
|---|---|---|
| Device | Simulated mid-tier phone (CPU throttling) | Real devices: flagship phones, budget Androids, old iPhones, tablets, desktops |
| Network | Simulated throttled connection (fixed latency + bandwidth) | Real networks: 3G, 4G, 5G, Wi-Fi, satellite, with jitter and packet loss |
| User behavior | Cold load only, no interaction | Scroll, click, type, navigate back, switch tabs, multi-step flows |
| Geography | Single test location | Global distribution across all regions and ISPs |
| Browser state | Clean profile, no extensions | Dozens of extensions, cached resources, background tabs |
| Sample size | 1 test run (or a few) | Thousands to millions of real page loads |
| Timing | Snapshot at test time | 28-day rolling window (CrUX) or continuous (RUM) |
| Metrics available | LCP, CLS, TBT, FCP, SI, TTFB | LCP, INP, CLS, FCP, TTFB (real interactions) |
| Debugging | Full waterfall, flame chart, frame timeline | Aggregate numbers only (unless using attribution builds) |
| Speed of feedback | Seconds | Days to weeks for statistical significance |
The Throttling Problem
Lighthouse applies simulated throttling by default: it loads the page over your actual, fast connection, then mathematically estimates the metrics a throttled connection and CPU would have produced. This is fast to run but misses real-world effects like TCP slow start, connection saturation, and resource contention.
WebPageTest supports applied throttling (also called packet-level throttling), which actually shapes traffic at the network layer. This is more realistic but still cannot replicate the behavior of a real congested mobile network where a cell tower is shared with thousands of other users.
Neither approach can simulate:
- DNS resolution variability across ISPs
- CDN edge cache misses for cold regions
- TLS handshake overhead on slow CPUs
- TCP congestion window resets from packet loss
- Background app interference on mobile devices
The Device Problem
A 4x CPU slowdown on an Apple M3 chip does not produce the same behavior as a MediaTek Dimensity 700 running natively. The throttled M3 still has:
- Larger L1/L2/L3 caches
- Faster memory bandwidth
- Better branch prediction
- No thermal throttling (your MacBook has a fan, that budget phone does not)
- No competition from other apps for RAM
The result: JavaScript that runs fine under simulated throttling can cause multi-second jank on real budget devices because the bottleneck is not raw CPU speed but memory pressure and thermal constraints.
Percentiles: P75 vs P95
When you look at field data, you are looking at a distribution, not a single number. The choice of which percentile to report changes the story completely.
P75 (75th percentile) means 75% of your users had an experience at or better than this value. Google uses P75 for Core Web Vitals thresholds and Search ranking. If your P75 LCP is 2.4 seconds, that means 75% of your users saw LCP in 2.4 seconds or less.
P95 (95th percentile) captures the experience of your worst-off users (excluding extreme outliers). If your P95 LCP is 8 seconds, that means 5% of your users — potentially millions of people at scale — waited 8 seconds or more for meaningful content.
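The gap between the two percentiles is easiest to see on a concrete sample. A minimal sketch, using the nearest-rank method on a hypothetical set of LCP samples (real RUM pipelines use similar rank-based definitions):

```js
// Nearest-rank percentile: sort the samples, then pick the value at
// ceil(p/100 * n) - 1. One of several common percentile definitions.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Hypothetical LCP samples in milliseconds from 20 page loads.
const lcpSamples = [
  1200, 1400, 1500, 1600, 1800, 1900, 2000, 2100, 2200, 2300,
  2400, 2500, 2700, 2900, 3200, 3600, 4200, 5100, 6800, 9500,
];

console.log(percentile(lcpSamples, 75)); // → 3200: looks borderline
console.log(percentile(lcpSamples, 95)); // → 6800: the long tail is painful
```

The same page can look "needs improvement" at P75 and disastrous at P95, which is exactly why tracking only one number hides part of the story.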
Why P75?
Google chose P75 as the threshold for Core Web Vitals because:
- It is high enough to represent real pain points (not just the median, which hides the long tail)
- It is not so extreme that a few outliers (bots, broken connections, users closing tabs) dominate the metric
- It balances actionability with sensitivity — improvements at P75 are achievable and meaningful
When P95 Matters
P75 is where Google draws the line. P95 is where your worst user experiences live. Consider:
- A site with 1 million daily page loads at P95 LCP of 8s means 50,000 page loads per day are painfully slow
- These slow loads disproportionately affect users in developing markets — exactly the audience many companies are trying to grow
- Conversion rate impact at the tail is often more severe than at the median
If you only optimize for P75, you are explicitly choosing to ignore the bottom 25% of your user base.
Passing Core Web Vitals at P75 is the minimum bar for Search ranking benefits. If you are serious about performance, track P95 and P99 as well. The long tail is where your most frustrated users live.
The CrUX Dataset
The Chrome User Experience Report is the canonical source of field data for the public web. Understanding how it works — and its limitations — is essential.
How CrUX Collects Data
CrUX data comes from real Chrome users who meet these criteria:
- Using Chrome on Android, ChromeOS, Linux, macOS, or Windows (not iOS — Chrome on iOS uses WebKit, not Blink)
- Have usage statistic reporting enabled (opted in)
- Have synced their browsing history
This means CrUX data skews toward Chrome users and excludes Safari (iOS), Firefox, and other browsers entirely. For sites with heavy iOS traffic, CrUX may not represent the full audience.
CrUX Data Access Points
PageSpeed Insights (PSI) — The easiest way to check CrUX data for any URL. Enter a URL and you get both lab results (Lighthouse) and field data (CrUX) side by side. This is the first place to check.
CrUX API — Programmatic access to origin-level and URL-level CrUX data. Free, requires an API key. Returns P75 values and histogram distributions for Core Web Vitals.
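Once you have a response from the API, summarizing it is straightforward. A sketch of pulling the P75 and the "good" share out of an LCP record; the object shape below mirrors the documented `queryRecord` response format, but verify the exact field names against the current API reference, and the values here are invented:

```js
// Summarize the LCP portion of a CrUX API record: P75 plus the share of
// loads in the first histogram bin (the "good" range, 0–2500 ms for LCP).
function summarizeLcp(record) {
  const lcp = record.metrics.largest_contentful_paint;
  return {
    p75: lcp.percentiles.p75,
    goodShare: lcp.histogram[0].density,
  };
}

// Mock record shaped like the real API output (numbers are placeholders).
const mockRecord = {
  key: { origin: 'https://example.com' },
  metrics: {
    largest_contentful_paint: {
      histogram: [
        { start: 0, end: 2500, density: 0.62 },
        { start: 2500, end: 4000, density: 0.23 },
        { start: 4000, density: 0.15 },
      ],
      percentiles: { p75: 3100 },
    },
  },
};

console.log(summarizeLcp(mockRecord)); // { p75: 3100, goodShare: 0.62 }
```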
BigQuery — The full CrUX dataset, updated monthly. Lets you run SQL queries across the entire dataset — compare sites, analyze trends, segment by connection type, device type, and country. Incredibly powerful for competitive analysis.
```sql
-- Example: Query LCP distribution for a specific origin
SELECT
  origin,
  effective_connection_type.name AS connection_type,
  largest_contentful_paint.histogram AS lcp_histogram,
  largest_contentful_paint.percentiles.p75 AS lcp_p75
FROM
  `chrome-ux-report.all.202403`
WHERE
  origin = 'https://example.com'
```
CrUX Dashboard — A Looker Studio (formerly Data Studio) template that auto-generates trend charts from CrUX BigQuery data. Plug in your origin and get historical trends without writing SQL.
Implementing RUM with the web-vitals Library
The web-vitals library is tiny (under 2KB gzipped) and maintained by the Chrome team. It provides reliable, accurate measurements of all Core Web Vitals using the same underlying browser APIs that CrUX uses.
Basic Setup
```js
import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
  });

  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/vitals', body);
  } else {
    fetch('/api/vitals', { body, method: 'POST', keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);
```
A few critical details in this code:
- `navigator.sendBeacon` is essential. Unlike `fetch`, `sendBeacon` queues the request so it survives the user navigating away or closing the tab. Metrics like CLS and INP report their final values as the page unloads — if you use `fetch` without `keepalive: true`, that request may be canceled
- `metric.delta` gives you the change since the last report, not the cumulative value. CLS reports multiple times as shifts occur. Use `delta` if you are summing values server-side
- `metric.rating` is `"good"`, `"needs-improvement"`, or `"poor"`, based on Core Web Vitals thresholds
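The `delta` field matters most on the server. A sketch of how a backend might fold repeated reports into one final value per metric instance (the field names follow the web-vitals callback object; the in-memory store here is invented for illustration):

```js
// Accumulate web-vitals reports server-side. Each report carries a stable
// `id` per metric instance and a `delta` since the previous report, so
// summing deltas per id reconstructs the final value even when a metric
// (like CLS) reports several times over the page's lifetime.
const totals = new Map();

function recordReport({ id, name, delta }) {
  const current = totals.get(id) ?? { name, value: 0 };
  current.value += delta;
  totals.set(id, current);
  return current.value;
}

// CLS reporting three times as shifts accumulate on one page view:
recordReport({ id: 'v3-cls-abc', name: 'CLS', delta: 0.05 });
recordReport({ id: 'v3-cls-abc', name: 'CLS', delta: 0.02 });
const finalCls = recordReport({ id: 'v3-cls-abc', name: 'CLS', delta: 0.03 });
// finalCls ≈ 0.1, the cumulative value for that page view
```

Summing raw `value` fields instead would triple-count the early shifts.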
Attribution Build for Debugging
The standard web-vitals build tells you what the metric value is. The attribution build tells you why.
```js
import { onINP } from 'web-vitals/attribution';

onINP((metric) => {
  const attribution = metric.attribution;
  console.log('Slow interaction:', {
    eventTarget: attribution.interactionTarget,
    eventType: attribution.interactionType,
    inputDelay: attribution.inputDelay,
    processingDuration: attribution.processingDuration,
    presentationDelay: attribution.presentationDelay,
    longAnimationFrameEntries: attribution.longAnimationFrameEntries,
  });
});
```
The attribution build is larger (around 4KB gzipped) but invaluable for debugging. It breaks down INP into its three phases:
- Input delay — Time from user interaction to when the event handler starts. Usually caused by long tasks blocking the main thread
- Processing duration — Time spent in event handlers. Your code's fault
- Presentation delay — Time from handler completion to next paint. Usually layout/style recalculation or rendering work
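Because the three phases add up to the interaction's total duration, a small helper can point you at the dominant one. The attribution field names below follow the web-vitals attribution build; the helper itself is illustrative:

```js
// Given an INP attribution object, name the phase that contributed most.
// A large inputDelay points at main-thread contention, processingDuration
// at your handlers, presentationDelay at rendering work after they return.
function dominantInpPhase({ inputDelay, processingDuration, presentationDelay }) {
  const phases = [
    ['input-delay', inputDelay],
    ['processing', processingDuration],
    ['presentation', presentationDelay],
  ];
  phases.sort((a, b) => b[1] - a[1]);
  return phases[0][0];
}

// 40 ms waiting for the main thread, 180 ms in handlers, 30 ms to paint:
console.log(
  dominantInpPhase({ inputDelay: 40, processingDuration: 180, presentationDelay: 30 })
); // → 'processing': the event handlers themselves are the bottleneck
```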
Using PerformanceObserver Directly
For custom metrics beyond Core Web Vitals, you can use the PerformanceObserver API directly.
```js
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.entryType === 'largest-contentful-paint') {
      console.log('LCP candidate:', {
        element: entry.element,
        url: entry.url,
        startTime: entry.startTime,
        size: entry.size,
        renderTime: entry.renderTime,
        loadTime: entry.loadTime,
      });
    }
  }
});

observer.observe({ type: 'largest-contentful-paint', buffered: true });
```
The buffered: true option is critical — without it, you miss entries that occurred before the observer was registered. Since scripts typically load after the page has started rendering, many LCP candidates would be lost without buffering.
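Buffered observers are also how you can reconstruct a metric like CLS by hand. CLS is not a raw sum of every shift: shifts are grouped into session windows (gaps under 1 second, each window capped at 5 seconds) and the worst window wins. A sketch of that windowing over `layout-shift` entries, assuming each entry has a `startTime` in milliseconds and a `value`, and excluding `hadRecentInput` entries as the real metric does:

```js
// Compute CLS from layout-shift entries using the session-window rule:
// a shift joins the current window if it starts less than 1000 ms after
// the previous shift and less than 5000 ms after the window began; CLS
// is the largest window sum. Shifts caused by recent input are excluded.
function computeCls(entries) {
  let cls = 0;
  let windowSum = 0;
  let windowStart = 0;
  let prevTime = -Infinity;
  for (const entry of entries) {
    if (entry.hadRecentInput) continue;
    const startsNewWindow =
      entry.startTime - prevTime >= 1000 ||
      entry.startTime - windowStart >= 5000;
    if (startsNewWindow) {
      windowSum = 0;
      windowStart = entry.startTime;
    }
    windowSum += entry.value;
    prevTime = entry.startTime;
    cls = Math.max(cls, windowSum);
  }
  return cls;
}

// Two early shifts in one window, then an isolated shift much later:
const shifts = [
  { startTime: 100, value: 0.08, hadRecentInput: false },
  { startTime: 600, value: 0.05, hadRecentInput: false },
  { startTime: 9000, value: 0.06, hadRecentInput: false },
];
console.log(computeCls(shifts)); // ≈ 0.13: the first window dominates
```

In practice you would let the web-vitals library do this, but seeing the windowing spelled out explains why a single big shift late in a session can still set your CLS.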
Why sendBeacon, not fetch?
When a user navigates away from your page, the browser cancels pending fetch requests. This is a problem because CLS and INP finalize their values on page visibility change or unload. If you collect metrics using fetch without keepalive: true, you lose the most important data point — the final metric value.
navigator.sendBeacon solves this. It is designed for "fire-and-forget" requests that must survive page unload. The browser queues the request and completes it even after the page is gone — delivery still depends on the network, but unloading no longer cancels it. The tradeoff: you cannot read the response, and the payload is limited (typically 64KB). For performance telemetry, this is exactly what you want.
If sendBeacon is unavailable, fetch with keepalive: true is the fallback. The keepalive flag tells the browser to keep the request alive even after the page unloads, up to a cumulative 64KB limit across all keepalive requests.
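The fallback logic is easy to unit-test if you inject the transports instead of reading them off the global object. A sketch with invented helper names; in a browser you would pass `navigator.sendBeacon.bind(navigator)` and `window.fetch`:

```js
// Choose sendBeacon when available, else fetch with keepalive. sendBeacon
// returns false when the browser refuses the payload (e.g. over the 64KB
// keepalive quota), in which case we fall back to fetch as well.
function sendMetric(url, body, { beacon, fetchFn }) {
  if (beacon && beacon(url, body)) return 'beacon';
  fetchFn(url, { body, method: 'POST', keepalive: true });
  return 'fetch';
}

// Stub transports to exercise both paths outside a browser:
const sent = [];
const stubBeacon = (url) => { sent.push(['beacon', url]); return true; };
const stubFetch = (url, opts) => { sent.push(['fetch', url, opts.keepalive]); };

sendMetric('/api/vitals', '{"name":"LCP"}', { beacon: stubBeacon, fetchFn: stubFetch });
sendMetric('/api/vitals', '{"name":"LCP"}', { beacon: null, fetchFn: stubFetch });
// sent → [['beacon', '/api/vitals'], ['fetch', '/api/vitals', true]]
```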
When to Use Each
This is not an either-or decision. Lab and field data serve fundamentally different purposes, and a mature performance practice uses both.
| Scenario | Use Lab Data | Use Field Data |
|---|---|---|
| Debugging a specific performance bottleneck | Yes — waterfall, flame chart, frame analysis | No — too aggregate for root cause analysis |
| Measuring real-world user impact | No — conditions are synthetic | Yes — this IS the real-world impact |
| Pre-launch performance testing | Yes — no real users yet | No — no traffic to measure |
| Setting performance budgets | Yes — reproducible, automatable in CI | Yes — validate budgets against real conditions |
| Tracking performance regressions over time | Yes — consistent baseline for comparison | Yes — catches regressions that lab tests miss |
| Optimizing for a specific market (India, Brazil, Nigeria) | Partially — WebPageTest can test from those locations | Yes — CrUX and RUM show actual user experience there |
| Measuring INP (Interaction to Next Paint) | No — requires real user interactions | Yes — the only way to measure real INP |
| Competitive benchmarking | Yes — WebPageTest for controlled comparison | Yes — CrUX BigQuery for real-world comparison |
| CI/CD pipeline gates | Yes — Lighthouse CI can block deploys | No — too slow for build pipelines |
The Ideal Workflow
- Develop — Use DevTools Performance panel to profile as you build. Catch obvious issues early
- Pre-merge — Run Lighthouse CI in your CI/CD pipeline. Set performance budgets. Block merges that regress key metrics
- Post-deploy — Monitor field data via RUM (web-vitals library + your analytics backend). Watch for regressions that lab tests missed
- Investigate — When field data shows a regression, use WebPageTest and DevTools to reproduce and diagnose
- Validate — After fixing, confirm improvement in both lab (immediate feedback) and field (delayed but authoritative) data
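The pre-merge gate in step 2 can be as small as a `lighthouserc.js` at the repo root. A minimal sketch, assuming a local dev server; the budget numbers are placeholders to tune against your own baseline, and the assertion keys should be verified against the Lighthouse CI documentation:

```js
// lighthouserc.js — minimal Lighthouse CI config sketch. Run via
// `lhci autorun` in the pipeline; failing assertions block the merge.
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:3000/'],
      numberOfRuns: 3, // median of 3 runs smooths lab variance
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'total-blocking-time': ['warn', { maxNumericValue: 300 }],
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};
```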
Building a Complete Performance Monitoring Stack
A production-grade performance monitoring setup combines lab and field data into a single workflow.
Layer 1: Lab (Development + CI)
- Chrome DevTools Performance panel during development
- Lighthouse CI in your CI/CD pipeline with performance budgets
- WebPageTest for deep-dive investigations and competitor analysis
- Custom DevTools recordings for specific user flows
Layer 2: Field (Production)
- web-vitals library collecting Core Web Vitals from all users
- Attribution build enabled for a sample of traffic (not all — it adds bundle size)
- Data pipeline to your analytics backend (BigQuery, Datadog, custom)
- Dashboards segmented by device type, connection speed, geography, and page
Layer 3: Alerting
- Alert on P75 regressions (Core Web Vitals threshold crossings)
- Alert on P95 regressions (tail performance degradation)
- Alert on lab regression (Lighthouse CI budget failures)
- Weekly reports comparing lab trends vs field trends
Key Takeaways
1. Lab data is for debugging and prevention. Field data is for truth and validation. Use both.
2. Lighthouse's simulated throttling does not replicate real network conditions — never trust a lab score as your users' reality.
3. CrUX only includes Chrome users with sync enabled — supplement with your own RUM for full browser coverage.
4. Always use navigator.sendBeacon or fetch with keepalive for metric collection — regular fetch loses data on page unload.
5. P75 is Google's threshold, but P95 reveals your worst user experiences. Track both.
6. The web-vitals attribution build is essential for diagnosing why a metric is slow, not just that it is slow.
7. INP cannot be meaningfully measured in lab conditions — it requires real user interactions in the field.
Common Mistakes
| What developers do | What they should do |
|---|---|
| Treating the Lighthouse score as the definitive measure of site performance. Lighthouse runs in synthetic conditions that do not represent the diversity of real user devices, networks, and behaviors | Use Lighthouse for debugging and CI gates, but validate with CrUX and RUM for real-world performance |
| Only tracking P75 because that is what Google uses for ranking. Passing at P75 means 25% of users may still have a poor experience; at scale, that is millions of bad page loads | Track P75, P95, and P99 to understand the full distribution of user experience |
| Using fetch without keepalive to send performance metrics. CLS and INP report final values on page unload, and regular fetch requests are canceled when the user navigates away, losing the most critical data | Use navigator.sendBeacon or fetch with keepalive: true |
| Assuming CrUX data represents all your users. CrUX only includes opted-in Chrome users; Safari (iOS), Firefox, and other browsers are excluded entirely | Supplement CrUX with your own RUM implementation that covers all browsers |
| Running Lighthouse in CI and assuming INP is covered because TBT passes. TBT is a lab proxy that only measures main-thread blocking during load and cannot capture real user interaction responsiveness | Implement field-based INP monitoring with the web-vitals library |