California Freight Cleanup → Investigation 3-5

What accuracy ceiling does EPA's monitor-informed fused product actually set?

5-fold CV-RMSE 0.913 µg/m³ • R² +0.928 • AQS-in-fitting caveat • NOT the operational hero stat

EPA’s FAQSD fuses a chemistry model with the nationwide monitor network using spatial Bayesian downscaling. Against the same validation monitors it scores RMSE 0.913 µg/m³, about one third of our corrected surrogate’s 2.76 µg/m³. The catch: FAQSD was trained on the same monitors used for validation. The ~3× gap between the two is approximately what access to the full monitor network in fitting is worth.

The decision

What is the realistic accuracy ceiling for PM_2.5 estimation at AQS sites when the predictor has access to the AQS network in fitting? Without L5, a reviewer might assume that L4’s 2.76 µg/m³ is far from the best achievable. L5’s 0.913 µg/m³ reveals two things: L4 operates within a factor of ~3 of a product that has effectively seen most of the AQS network; and the ~3× gap from L4 to L5 is approximately the value of AQS-at-test-site information, the honest cost of the cascade’s leakage discipline.

Methodology

No model is fit. FAQSD daily PM_2.5 files (2019–2022) are loaded, averaged to an annual mean per census tract centroid (∼9k California tracts), and sampled at each AQS site by a nearest-tract-centroid join with a 10 km cap (dropping sparse rural sites that fall outside FAQSD coverage). Observed annual means come from the Investigation 3-1 AQS daily panel. The 5-fold readout reuses Investigation 3-1’s site groupings for rung-comparability. Because nothing is refit per fold, the folds only partition the readout: the fold-to-fold RMSE spread is sampling noise around a near-constant, with no model to overfit.
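The averaging and capped join can be sketched in plain Python. This is a minimal illustration, not the investigation’s run.py: the helper names (`annual_means`, `nearest_within_cap`), the `(tract_id, year, pm25)` row layout, and the brute-force haversine search are all assumptions, since the actual FAQSD file columns are not described here.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def annual_means(daily_rows):
    """Average hypothetical daily rows (tract_id, year, pm25) to {(tract_id, year): mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tract_id, year, pm25 in daily_rows:
        sums[(tract_id, year)] += pm25
        counts[(tract_id, year)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def nearest_within_cap(site, centroids, cap_km=10.0):
    """Return (tract_id, distance_km) of the nearest centroid, or None if beyond the cap.

    `site` is (lat, lon); `centroids` maps tract_id -> (lat, lon).
    """
    lat, lon = site
    best = min(
        ((tid, haversine_km(lat, lon, clat, clon)) for tid, (clat, clon) in centroids.items()),
        key=lambda pair: pair[1],
        default=None,
    )
    if best is None or best[1] > cap_km:
        return None  # site dropped: no tract centroid within the cap
    return best
```

A site 0.01° north of a hypothetical centroid joins at about 1.1 km, while a site hundreds of km from every centroid returns `None` and is excluded, which is how the cap removes sparse rural monitors.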

FAQSD background. EPA’s FAQSD applies the Berrocal, Gelfand & Holland 2010 hierarchical Bayesian downscaling model (DOI 10.1214/09-AOAS305) to fuse CMAQ fields with AQS monitoring data. The spatial covariance in that framework propagates neighbor AQS information to any given tract centroid. A held-out AQS site is not directly in the FAQSD fitting, but its nearest-tract-centroid value is influenced by neighboring monitors via the spatial covariance. This is the leakage mechanism.
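The propagation mechanism can be illustrated with a much simpler spatial predictor. The sketch below is not the Berrocal, Gelfand & Holland downscaler: it swaps in plain simple kriging with an exponential covariance, and the sill, 50 km range, nugget, and planar km coordinates are illustrative assumptions. It shows the property that matters for leakage: a location with no observation of its own still receives a prediction pulled toward its neighbors’ readings, with weights that decay with distance.

```python
import numpy as np


def exp_cov(d, sill=1.0, range_km=50.0):
    """Exponential covariance C(d) = sill * exp(-d / range_km); parameters are illustrative."""
    return sill * np.exp(-d / range_km)


def simple_krige(obs_xy, obs_val, target_xy, mean, nugget=0.1, **cov_kw):
    """Simple-kriging prediction at target_xy from monitors at obs_xy (planar km coords).

    Returns (prediction, weights). A stand-in for illustration, not the FAQSD model.
    """
    obs_xy = np.asarray(obs_xy, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)
    # Pairwise monitor-to-monitor distances and monitor-to-target distances
    d_oo = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    d_ot = np.linalg.norm(obs_xy - np.asarray(target_xy, dtype=float), axis=-1)
    # Nugget on the diagonal models measurement noise at the monitors
    K = exp_cov(d_oo, **cov_kw) + nugget * np.eye(len(obs_xy))
    k = exp_cov(d_ot, **cov_kw)
    w = np.linalg.solve(K, k)  # kriging weight on each neighbor monitor
    return mean + w @ (obs_val - mean), w
```

With two hypothetical monitors reading 12 µg/m³ at ±10 km and a prior mean of 8 µg/m³, the unobserved midpoint is predicted at roughly 11.7 µg/m³, while a point 1,000 km away reverts to the mean. A held-out AQS site sits in the first situation, not the second.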

Findings

RMSE 0.913 µg/m³, R² +0.928 — with the caveat that the monitor data is in the fitting

Pooled RMSE across all 256 site-years is 0.910 µg/m³, with MFB +0.020 and R² +0.928 (64 sites × 4 years, 2019–2022); the 0.913 headline is the mean of the five per-fold RMSEs. Per-fold RMSE ranges 0.80–1.01 (SD 0.081), essentially flat. Both regulatory gates pass by construction.

Fold	Test sites	n (site-years)	RMSE (µg/m³)	MFB	R²
0	14	56	0.800	−0.010	+0.914
1	15	60	0.908	+0.022	+0.936
2	13	52	0.961	−0.001	+0.904
3	12	48	0.884	+0.048	+0.938
4	10	40	1.014	+0.051	+0.936
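The two headline numbers differ only in how the folds are combined. A short sketch, using the (n, RMSE) pairs from the fold table: each fold’s sum of squared errors is recovered as n × RMSE², so pooling needs no site-level data. It reproduces the 0.913 mean-of-folds figure, the 0.910 pooled figure, and the 0.081 sample SD; the function names are illustrative.

```python
import math
import statistics

# (n site-years, RMSE in µg/m³) per fold, copied from the fold table
FOLDS = [(56, 0.800), (60, 0.908), (52, 0.961), (48, 0.884), (40, 1.014)]


def mean_fold_rmse(folds):
    """Unweighted mean of per-fold RMSEs (the CV headline style)."""
    return statistics.mean(r for _, r in folds)


def pooled_rmse(folds):
    """RMSE over all site-years: recover each SSE as n * RMSE^2, then pool."""
    sse = sum(n * r * r for n, r in folds)
    return math.sqrt(sse / sum(n for n, _ in folds))


def fold_rmse_sd(folds):
    """Sample standard deviation (n - 1 denominator) of per-fold RMSEs."""
    return statistics.stdev(r for _, r in folds)
```

Applying `pooled_rmse` to the per-year table (64 site-years per year) also gives 0.910, so the fold and year breakdowns are consistent with the same pooled error.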

Consistent across years (0.83–1.04 RMSE), with 2020’s wildfire season as the hardest

Year	RMSE (µg/m³)	MFB	n (sites)
2019	0.857	+0.028	64
2020	1.036	+0.017	64
2021	0.830	+0.015	64
2022	0.903	+0.018	64

2020 is the worst year (1.036), consistent with wildfire-season stress on the FAQSD fused field; 2021 is the best (0.830). All annual MFBs fall between +0.015 and +0.028, a uniformly small positive bias whose near-zero magnitude is expected structurally from AQS-in-fitting.

Full ladder: 6.7× compression from the L1 ISRM lookup to the fused ceiling, with our production surrogate in between

Rung	Method	RMSE µg/m³	AQS in fitting?
L1	ISRM × NEI matrix lookup	6.078	No
L2	InMAP-direct steady-state	11.463	No
L3	van Donkelaar V5.NA.05.02	4.343	No
L4	MFGP-corrected (Investigation 3-4), operational	2.76	Train sites only (blocked at test)
L5	FAQSD Bayesian-fused — reference	0.913	Yes (leakage)

The 6.7× compression from L1 to L5 quantifies how much of the CMAQ/ISRM residual is recoverable when the predictor sees the AQS network directly. The remaining ~3× gap from L4 (2.76) to L5 (0.913) is approximately the value of AQS-at-test-site information — the honest cost of the cascade’s leakage discipline.
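The two ratios quoted here are simple quotients of rung RMSEs. A minimal check, taking L1 through L3 and L5 from the ladder table and L4 as the 2.76 µg/m³ figure quoted in the text; the `LADDER` dict and `compression` helper are illustrative names.

```python
# RMSE in µg/m³ per rung; L4 is the 2.76 µg/m³ operational figure quoted in the text
LADDER = {"L1": 6.078, "L2": 11.463, "L3": 4.343, "L4": 2.76, "L5": 0.913}


def compression(rungs, worse, better):
    """How many times larger rung `worse`'s RMSE is than rung `better`'s."""
    return rungs[worse] / rungs[better]
```

`compression(LADDER, "L1", "L5")` rounds to 6.7 and `compression(LADDER, "L4", "L5")` rounds to 3.0, matching the ~6.7× and ~3× figures.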

Caveats

AQS-leakage is the dominant caveat. FAQSD is fit on AQS observations from the same network used for validation. Spatial CV bounds but does not eliminate the leakage: a held-out site’s nearest-tract-centroid FAQSD value is informed by neighbor AQS monitors within tens of km via the Berrocal/Gelfand/Holland 2010 spatial covariance. Read RMSE 0.913 as “what does the fused product say at this point, knowing neighbor AQS but not this one?” — categorically distinct from L3 (no AQS in fitting) or L4 (blocks direct test-site FAQSD use).
L5’s 0.913 is NOT an achievable RMSE floor for cascade work. It is the accuracy of a product that has already seen most of the AQS network. The honest hero stat for what the cascade can deliver is L4 MFGP at RMSE 2.76 µg/m³. Visitor copy that places L5 alongside L4 without the leakage disclosure would overstate the cascade’s accuracy claim.
Nearest-tract-centroid join with 10 km cap drops sparse rural sites. ∼16 sites per year fall outside the cap and are excluded. Reported RMSE is conditional on being within 10 km of a 2010 census tract centroid; urban/suburban monitors are over-represented.
Year coverage is 2019–2022. FAQSD is only published through 2022. The 2023 Canadian smoke episode is not covered.
Boylan-Russell pass is trivial at this leakage level. Fitting on AQS produces near-zero MFB by construction. The gate-pass framing should not be treated as adequacy evidence for a reference rung.
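Why the gate pass is trivial becomes clear once the metrics are written out. The sketch below uses the standard mean fractional bias and error forms; the R² definition (1 − SSE/SST), the gate thresholds (|MFB| ≤ 60%, MFE ≤ 75%, the commonly cited Boylan & Russell 2006 PM criteria), and the synthetic values are assumptions, not the investigation’s configuration.

```python
import math


def rmse(model, obs):
    """Root-mean-square error."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def mfb(model, obs):
    """Mean fractional bias (2/N) * sum((M - O) / (M + O)), as a fraction in [-2, 2]."""
    return 2 * sum((m - o) / (m + o) for m, o in zip(model, obs)) / len(obs)


def mfe(model, obs):
    """Mean fractional error (2/N) * sum(|M - O| / (M + O)), as a fraction in [0, 2]."""
    return 2 * sum(abs(m - o) / (m + o) for m, o in zip(model, obs)) / len(obs)


def r2(model, obs):
    """Coefficient of determination 1 - SSE/SST (assumed definition; can go negative)."""
    mean_o = sum(obs) / len(obs)
    sse = sum((m - o) ** 2 for m, o in zip(model, obs))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1 - sse / sst


def passes_gates(model, obs, mfb_max=0.60, mfe_max=0.75):
    """Assumed gates: |MFB| <= 60% and MFE <= 75% (Boylan & Russell 2006 criteria)."""
    return abs(mfb(model, obs)) <= mfb_max and mfe(model, obs) <= mfe_max
```

A prediction that nearly copies the observations (for example, each value offset by 0.1 µg/m³) lands at an MFB of about +1%, far inside a ±60% gate, so passing says little about a product that was fit to those same observations.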

Provenance

Item	Location / value
run.py	`[internal artifact]`
results.json	`investigations/43_l5-faqsd-reference/latest/results.json`
Method label	`faqsd_bayesian_downscaling_fused`
FAQSD inputs	`data/raw/faqsd/{2019,2020,2021,2022}_pm25_daily_average.txt.gz` (sha256s in results.json)
Berrocal et al. 2010	DOI 10.1214/09-AOAS305 — spatial Bayesian downscaling framework
Upstream: Investigation 3-1 folds	sha256 c63ae2d281ce (L1 headline RMSE 6.078 µg/m³)
Upstream: Investigation 3-2 L2 RMSE	11.463 µg/m³ (sha256 20cdce2d11d4)
Upstream: Investigation 3-3 L3 RMSE	4.343 µg/m³ (sha256 a368ef9c6ed9)
Consumed by Investigation 3-4	`l5_faqsd.by_year.2019.rmse_ugm3` = 0.857 (sha256 278e28fe52db)
Last run	2026-05-01 (results sha256 278e28fe52db)