California Freight Cleanup → Investigation 3-1

How well does the base transport model predict measured PM2.5?

5-fold CV-RMSE 6.08 µg/m³ • fails Tessum 2017 • R² = −3.0 • MFB −0.013 (passes Boylan-Russell)

We evaluated the ISRM source-receptor model against 66 California air monitors across five years (2019–2023). The model is designed to track which pollution sources contribute to which locations, not to predict absolute concentrations at a given site, and as an absolute predictor it fails. We document that failure explicitly here so no one reads the cascade’s headline accuracy stat (RMSE 2.76 after all corrections) as a marginal improvement over a functioning base. The base does not function as an absolute predictor. That is why the corrections matter.

Should the California Freight Cleanup cascade present L1 ISRM × NEI as a predictive concentration model? No. L1’s role is sector decomposition and relative policy ranking, not absolute-level PM2.5 prediction. This gate quantifies the absolute-level failure so that every downstream consumer (Investigations 1-1, 6, 7, 11, 14, 15) carries the correct disclosure: their inputs come from a cascade that uses L4 MFGP for concentration levels, not L1.

Panel. 66 California AQS sites, stratified by air basin and monitoring setting (urban / rural / near-road), 2019–2023, requiring ≥80% daily completeness per year. The Phase A1 panel yields 330 site-years (117,128 daily observations).

Predictor (L1). The pre-computed isrm_sector_pm25.npz (~21k cells, on-road + residential + EGU + area + wildfire annual-mean total) sampled at each AQS site by nearest-cell Euclidean lookup. One predicted value per site per year — no daily structure, no AQS fitting.
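The nearest-cell lookup described above can be sketched as follows. This is a minimal illustration, not the code in run.py: the function name, the array names, and the toy coordinates are all assumptions, and it presumes cell centroids and site locations are already in the same projected coordinate system so that Euclidean distance is meaningful.

```python
import numpy as np

def nearest_cell_values(cell_xy, cell_values, site_xy):
    """Return, for each site, the value of the closest grid cell (Euclidean)."""
    # Pairwise coordinate differences, shape (n_sites, n_cells, 2).
    diff = site_xy[:, None, :] - cell_xy[None, :, :]
    # Squared distances, shape (n_sites, n_cells); the square root is not
    # needed because it does not change which cell is nearest.
    dist2 = (diff ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    return cell_values[nearest]

# Hypothetical toy grid: three cell centroids with annual-mean PM2.5 totals.
cell_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
cell_pm25 = np.array([4.0, 9.0, 6.5])

# Two hypothetical monitor locations in the same coordinate system.
site_xy = np.array([[1.0, 1.0], [8.0, 2.0]])

site_pred = nearest_cell_values(cell_xy, cell_pm25, site_xy)
print(site_pred)
```

With roughly 21k cells and 66 sites, a brute-force distance matrix like this is small enough that a spatial index is probably unnecessary.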

Evaluation protocol. Three layers: (1) global in-sample RMSE, MFB, R² across all 330 site-years; (2) held-out 5-fold cross-validation, basin-stratified (same folds reused by Investigation 3-2–43 for rung-comparability); (3) stratified breakouts by season, air basin, concentration band, and setting. Strata with fewer than 30 paired daily observations are reported as null rather than fabricated metrics.
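The minimum-sample rule for strata can be sketched as a small guard around the metric. The helper name and the use of None as the null value are assumptions for illustration; only the threshold of 30 paired observations comes from the protocol above.

```python
import math

MIN_PAIRS = 30  # threshold stated in the evaluation protocol

def stratum_rmse(pred, obs, min_pairs=MIN_PAIRS):
    """RMSE for one stratum, or None when there are too few valid pairs."""
    # Keep only pairs where both the prediction and the observation exist.
    pairs = [(p, o) for p, o in zip(pred, obs) if p is not None and o is not None]
    if len(pairs) < min_pairs:
        return None  # report null instead of a metric on a tiny sample
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))

print(stratum_rmse([1.0] * 29, [2.0] * 29))  # too few pairs
print(stratum_rmse([1.0] * 30, [2.0] * 30))  # exactly at the threshold
```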

Baselines applied. Tessum et al. 2017 PNAS Table 2 (InMAP CV-RMSE 3.0–5.0 µg/m³, DOI 10.1073/pnas.1614453114) is the named comparator gate. Boylan & Russell 2006 Atmos. Env. (|MFB| ≤ 0.6, DOI 10.1016/j.atmosenv.2005.09.087) is the regulatory annual-scale criterion.
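As a rough sketch of how the two gates might be applied, the block below defines RMSE and mean fractional bias (MFB, using the standard definition in which each pair contributes 2(M − O)/(M + O)) and checks the headline values against the thresholds quoted above. The function and constant names are hypothetical, and treating "pass" as RMSE at or below the 5.0 µg/m³ upper bound is an assumption about how the Tessum gate is scored.

```python
import math

TESSUM_RMSE_UPPER = 5.0   # µg/m³, upper end of the InMAP CV-RMSE range
BOYLAN_RUSSELL_MFB = 0.6  # |MFB| criterion

def rmse(pred, obs):
    """Root-mean-square error over paired values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mfb(pred, obs):
    """Mean fractional bias: mean of 2(M - O)/(M + O), bounded in [-2, 2]."""
    return sum(2.0 * (p - o) / (p + o) for p, o in zip(pred, obs)) / len(obs)

def gates(rmse_value, mfb_value):
    """Evaluate both comparator gates for one pair of summary metrics."""
    return {
        "tessum_2017": rmse_value <= TESSUM_RMSE_UPPER,
        "boylan_russell": abs(mfb_value) <= BOYLAN_RUSSELL_MFB,
    }

# Headline held-out values reported on this page.
result = gates(6.08, -0.013)
print(result)
```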

Held-out accuracy: RMSE 6.08 µg/m³ — outside the published accuracy range for this class of model

The headline held-out 5-fold mean RMSE is 6.08 µg/m³ (SD 2.03 across folds), exceeding the Tessum 2017 InMAP CV-RMSE upper bound of 5.0 µg/m³ by 22%. The global in-sample RMSE is 6.60 µg/m³.

Per-fold detail:

Fold  Test sites  n   RMSE (µg/m³)  MFB     R²
0     16          80  9.24          +0.059  −10.83
1     15          75  6.10          −0.098  −2.02
2     13          65  6.42          +0.015  −3.40
3     12          60  4.68          −0.003  −0.90
4     10          50  3.96          −0.050  −0.019
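The headline figures can be reproduced from the per-fold rows with a few lines of arithmetic. The sketch below copies the fold values from the table and uses only the standard library; the variable names are illustrative. The reported SD of 2.03 matches the sample standard deviation (n − 1 denominator), and the n-weighted mean of the fold MFBs recovers the global MFB.

```python
import statistics

# (test sites, site-years n, RMSE in µg/m³, MFB) copied from the per-fold table.
folds = [
    (16, 80, 9.24, +0.059),
    (15, 75, 6.10, -0.098),
    (13, 65, 6.42, +0.015),
    (12, 60, 4.68, -0.003),
    (10, 50, 3.96, -0.050),
]

rmse_values = [f[2] for f in folds]
mean_rmse = statistics.mean(rmse_values)   # unweighted mean across folds
sd_rmse = statistics.stdev(rmse_values)    # sample SD (n - 1 denominator)

total_sites = sum(f[0] for f in folds)
total_n = sum(f[1] for f in folds)

# Site-year-weighted mean of the fold MFBs.
pooled_mfb = sum(f[1] * f[3] for f in folds) / total_n

# Relative exceedance over the 5.0 µg/m³ upper bound.
excess = mean_rmse / 5.0 - 1.0

print(round(mean_rmse, 2), round(sd_rmse, 2), total_sites, total_n)
print(round(pooled_mfb, 3), f"{excess:.0%}")
```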

R² = −3.0: the model is less useful than predicting the average everywhere

A constant predicting the AQS global mean would score R² = 0. ISRM × NEI scores R² = −3.0, meaning its squared error is four times that of the constant predictor (twice the RMSE). The matrix preserves source-receptor proportionality (sector shares remain meaningful) but cannot place absolute PM2.5 at any specific site within ±5 µg/m³. This is not a defect in the model itself; using L1 as an absolute concentration predictor is a category error.
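To make the R² scale concrete: with R² = 1 − SSE/SST, a value of −3.0 implies SSE/SST = 4. The toy numbers below are invented purely to illustrate the arithmetic and are not drawn from the AQS panel.

```python
def r_squared(pred, obs):
    """Coefficient of determination: 1 - SSE/SST, with SST about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for p, o in zip(pred, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

# Hypothetical observations and two hypothetical predictors.
obs = [8.0, 10.0, 12.0]
constant_pred = [10.0, 10.0, 10.0]   # always predicts the observed mean
inverted_pred = [12.0, 10.0, 8.0]    # gets the spatial pattern backwards

r2_constant = r_squared(constant_pred, obs)
r2_inverted = r_squared(inverted_pred, obs)
sse_ratio = 1.0 - r2_inverted        # SSE relative to the constant predictor

print(r2_constant, r2_inverted, sse_ratio)
```

Because RMSE is the square root of mean squared error, a fourfold squared error corresponds to an RMSE twice that of the mean-only baseline.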

Performance collapses at high concentrations

By concentration band, RMSE (µg/m³) is 8.0 at <10 µg/m³, 8.4 at 10–25, 20.2 at 25–50, and 84.0 at ≥50 (MFB −1.5, which corresponds to modeled values of roughly one-seventh of observed at extreme concentrations). ISRM has no event-scale wildfire smoke; the cascade closes this gap via L4 MFGP + FAQSD’s AQS assimilation step.
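MFB is bounded between −2 and +2, so it does not read directly as a percentage shortfall. A quick way to interpret a value is to invert the single-pair definition MFB = 2(M − O)/(M + O) for the model-to-observation ratio. The helper below is an illustrative sketch and assumes the fractional bias is roughly uniform across pairs, which will not hold exactly for a mean over many days.

```python
def ratio_from_mfb(mfb_value):
    """Model/observed ratio implied by a fractional bias of 2(M - O)/(M + O)."""
    # Solving MFB = 2(M - O)/(M + O) for M/O gives (2 + MFB)/(2 - MFB).
    return (2.0 + mfb_value) / (2.0 - mfb_value)

ratio = ratio_from_mfb(-1.5)   # top concentration band
shortfall = 1.0 - ratio        # fraction of the observed value that is missed

print(f"model/observed = {ratio:.3f}, shortfall = {shortfall:.0%}")
```

Under that assumption, an MFB of −1.5 implies modeled values of about one-seventh of observed.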

By air basin (global daily obs):

Basin       RMSE (µg/m³)  MFB     n
SJV         13.74         +0.197  21,282
Sacramento  13.30         +0.166  10,557
LA Basin    11.03         +0.359  14,447
rest_ca     10.72         +0.120  42,481
Bay Area    9.76          +0.228  24,899
San Diego   8.70          +0.526  3,462

The regulatory bias check passes, but only because opposite-sign errors cancel

Global MFB −0.013 satisfies |MFB| ≤ 0.6 only because positive and negative biases offset one another: stratified MFB runs from +0.526 (San Diego) to −0.763 (25–50 µg/m³ band). The global gate therefore says nothing about L1 adequacy; only the stratified breakouts reveal the structural bias pattern.
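The cancellation effect is easy to reproduce with synthetic numbers. In the sketch below, both strata and all values are invented for illustration: one stratum is over-predicted, the other under-predicted by the same fractional amount, and the pooled MFB lands at zero even though neither stratum is unbiased.

```python
def mfb(pred, obs):
    """Mean fractional bias: mean of 2(M - O)/(M + O)."""
    return sum(2.0 * (p - o) / (p + o) for p, o in zip(pred, obs)) / len(obs)

# Hypothetical clean-air stratum: model over-predicts.
low_pred, low_obs = [9.0, 9.0], [6.0, 6.0]
# Hypothetical high-concentration stratum: model under-predicts.
high_pred, high_obs = [20.0, 20.0], [30.0, 30.0]

mfb_low = mfb(low_pred, low_obs)
mfb_high = mfb(high_pred, high_obs)
mfb_pooled = mfb(low_pred + high_pred, low_obs + high_obs)

print(mfb_low, mfb_high, mfb_pooled)
```

A pooled value near zero would pass the |MFB| ≤ 0.6 criterion here while hiding a ±0.4 bias in each stratum.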

Item
run.py                 [internal artifact]
results.json           investigations/39_aqs-held-out-validation/latest/results.json
analysis.md            investigations/39_aqs-held-out-validation/latest/analysis.md
scenario.md            investigations/39_aqs-held-out-validation/latest/scenario.md
AQS panel              data/processed/aqs/daily_panel_2019_2023.parquet (sha256 5af5b25a2b0d)
Validation sites       data/processed/aqs/validation_sites.json (sha256 a6bfd352357a)
Tessum 2017 baseline   PNAS 114(13):3367–3372 (DOI 10.1073/pnas.1614453114)
Boylan & Russell 2006  Atmos. Env. (DOI 10.1016/j.atmosenv.2005.09.087)
Last run               2026-05-01 (results sha256 c63ae2d281ce)