California Freight Cleanup → Investigation 3-2

Does the full InMAP chemistry simulation beat the simpler matrix lookup?

5-fold CV-RMSE 11.46 µg/m³ • +5.39 vs L1 • fails Tessum by 130% • LA Basin RMSE 29.0

We hypothesized that running the full nonlinear InMAP simulation on 2023 California emissions would beat the simpler matrix lookup. It did not. The mismatch between 2023 emissions and the 2005 meteorology on disk is the leading candidate explanation, but this run cannot separate it from a genuine model limitation at this resolution. We record the result as a negative finding and skip this rung in the production chain.

The decision

Does running the full nonlinear InMAP simulation on 2023 California emissions close the gap to the Tessum 2017 baseline? If yes, InMAP-direct enters the MFGP as an honest L2 rung. If no, the cascade falls back to the three-rung L1→L3→L4 structure — and the InMAP failure becomes a documented negative finding, informative in its own right about what the met-year mismatch costs.

Methodology

Method change (v1 → v2). The prior v1 L2 was a per-fold empirical bias corrector: OLS of pm25_obs on (1, p, p², basin, setting) fit against L1 predictions. It achieved RMSE 2.553 µg/m³ and passed Tessum — but the win was structural. Any reasonable regression on AQS labels will beat the upstream predictor; the corrector added no physics. The empirical bias corrector is preserved in rfaq.validation.l2_bias_corrected for diagnostic use but is no longer the gate headline.

v2 (this run). InMAP v1.9.0 steady-state on data/processed/emissions/emissions_baseline_2023_inmap{,_elevated}.shp, met inmap_2005.ncf, 3,000 iterations. Output sampled at AQS lat/lons via point-in-polygon spatial join. No fitting against AQS. Same emissions inventory as L1 — the only difference is full nonlinear chemistry vs. the ISRM linearization.
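
The sampling step can be illustrated with a dependency-free sketch. It assumes each InMAP output cell is available as a ring of (x, y) vertices with a PM2.5 value, and that monitor coordinates are already in the same coordinate system as the grid. The cell geometries, values, site IDs, and function names below are hypothetical; the tooling the run actually used is not shown in this write-up.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]  # polygon vertices in order; closing vertex optional


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Ray casting: toggle on each edge crossed by a ray from pt toward +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_at_sites(
    cells: List[Tuple[Ring, float]], sites: Dict[str, Point]
) -> Dict[str, Optional[float]]:
    """Map each site to the value of the first cell containing it (None if outside the grid)."""
    return {
        site_id: next((val for ring, val in cells if point_in_polygon(pt, ring)), None)
        for site_id, pt in sites.items()
    }


# HYPOTHETICAL grid cells and monitor locations, for illustration only.
cells = [
    ([(0, 0), (1, 0), (1, 1), (0, 1)], 8.2),
    ([(1, 0), (3, 0), (3, 2), (1, 2)], 12.7),
]
sites = {"A": (0.5, 0.5), "B": (2.0, 1.0), "C": (5.0, 5.0)}
sampled = sample_at_sites(cells, sites)
```

Because the sampled value is read straight from the containing cell, nothing in this step is fit to the monitor observations, which is the property the v2 design relies on.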

Met-year constraint. Only inmap_2005.ncf is on disk. L2 is 2023 emissions × 2005 met — the same structural assumption as L1 (ISRM met is also baked in). Comparing L1 vs. L2 therefore holds meteorology constant: the cleanest test for what the linearization costs.

Findings

RMSE 11.46 µg/m³: worse than the simpler model by 5.4 µg/m³

5-fold RMSE is 11.46 µg/m³ (SD 3.98 across folds) against the Tessum 2017 baseline of ≤5.0 µg/m³. That is 5.39 µg/m³ worse than L1 ISRM (6.08): the full nonlinear simulation is less accurate than the linearization on this configuration. R² = −13 globally; four of five folds have R² ≤ −14.

Fold	Test sites	n	RMSE (µg/m³)	MFB	R²
0	16	80	13.50	−0.521	−24.26
1	15	75	13.67	−0.399	−14.16
2	13	65	11.94	−0.641	−14.23
3	12	60	13.74	−0.382	−15.35
4	10	50	4.47	−0.629	−0.30
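
The headline figures can be checked against the per-fold table above. The sketch below assumes the headline RMSE is the unweighted mean of the five fold RMSEs and that the quoted SD is the sample standard deviation (n − 1 denominator); both are inferences from the numbers, not something the write-up states. The L1 value is the 6.078 reported for Investigation 3-1.

```python
import statistics

# Per-fold RMSE (µg/m³), copied from the fold table.
fold_rmse = [13.50, 13.67, 11.94, 13.74, 4.47]

mean_rmse = statistics.mean(fold_rmse)   # unweighted mean across folds
sd_rmse = statistics.stdev(fold_rmse)    # sample SD (n - 1 denominator)

l1_rmse = 6.078                          # L1 ISRM headline RMSE
gap_vs_l1 = mean_rmse - l1_rmse          # how much worse L2 is than L1

print(f"mean RMSE = {mean_rmse:.2f}, SD = {sd_rmse:.2f}, gap vs L1 = {gap_vs_l1:+.2f}")
```

Under those assumptions the table reproduces the 11.46, 3.98, and +5.39 figures, and it also shows that fold 4 alone accounts for most of the spread.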

LA Basin: RMSE 29.0 µg/m³. It over-predicts while other regions under-predict, so no single correction can fix both

The bias structure inverts by region. LA Basin over-predicts (MFB +0.793, RMSE 29.0). SJV / Sacramento / rest_ca all under-predict (MFB −0.32 to −0.72). No single bias correction can fix both signs simultaneously — consistent with regime-dependent chemistry as the dominant failure mode (NH3-limited vs VOC-limited SOA formation) that the 2005 met year cannot capture for 2023 California conditions.
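
To make the sign argument concrete, here is a minimal sketch using the standard mean fractional bias definition, MFB = mean of 2(M − O)/(M + O), and one simple form of global correction, a single multiplicative factor k applied to every prediction. The concentrations are SYNTHETIC and chosen only to give one over-predicting and one under-predicting region; they are not values from this run, and the helper names are hypothetical.

```python
def mfb(model, obs):
    """Mean fractional bias: mean of 2*(M - O)/(M + O); bounded by [-2, 2]."""
    return sum(2.0 * (m - o) / (m + o) for m, o in zip(model, obs)) / len(obs)


# SYNTHETIC data (µg/m³): region "over" is over-predicted, region "under" is under-predicted.
obs_over, mod_over = [12.0, 11.0, 14.0], [30.0, 28.0, 35.0]
obs_under, mod_under = [10.0, 11.0, 9.0], [5.0, 6.0, 4.0]


def worst_region_bias(k):
    """Largest regional |MFB| after scaling every prediction by the same factor k."""
    a = mfb([k * m for m in mod_over], obs_over)
    b = mfb([k * m for m in mod_under], obs_under)
    return max(abs(a), abs(b))


# Sweep a grid of global scale factors and keep the one with the best worst-case region.
scales = [0.1 * i for i in range(1, 31)]
best_k = min(scales, key=worst_region_bias)
best_worst = worst_region_bias(best_k)

print(f"uncorrected: over={mfb(mod_over, obs_over):+.2f}, under={mfb(mod_under, obs_under):+.2f}")
print(f"best global k={best_k:.1f} still leaves |MFB|={best_worst:.2f} in one region")
```

Scaling moves both regional biases in the same direction, so shrinking one necessarily grows the other; a global additive offset behaves the same way. A region-specific correction would avoid this, but that amounts to fitting against the monitors.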

Production chain redrawn: base model → satellite reference → multi-fidelity surrogate

The three-rung comparison as computed in this investigation:

Rung	Method	RMSE (µg/m³)
L1	ISRM × NEI matrix lookup	6.078
L2	InMAP v1.9.0 steady-state (2023 emissions × 2005 met)	11.463
L3	van Donkelaar V5.NA.05.02 sum-of-7 (Investigation 3-3)	4.343

L2 is a documented failed rung. The cascade proceeds L1→L3→L4. L2 appears in Investigation 3-4’s ladder table as context only, labeled “InMAP-direct (failed Tessum gate).”

Caveats

Met-year / emissions-year mismatch is unresolved. We cannot separate “2005 met is structurally wrong for 2023 emissions regimes” from “InMAP at this resolution genuinely fails Tessum on California 2023 regardless of met.” A 2023 met re-run (estimated 1–3 days given WRF 2023 met data availability) is the only excursion that could rehabilitate L2. Until it runs, the failure should not be attributed specifically to either model structure or meteorology.
Boylan-Russell pass is misleading. Global MFB −0.508 passes |MFB| ≤ 0.6 only because LA Basin over-prediction partly offsets SJV/Sacramento under-prediction. The regulatory gate is not informative when the bias structure inverts spatially.
v1 bias corrector is not the fair comparison. The v1 OLS corrector hit RMSE 2.553 but was fit to AQS labels — any regression on AQS will pass Tessum trivially. The v2 switch to a forward simulation is what makes the gate honest; the failed result is the honest answer.
Annual-mean evaluation only. Same scope drop as Investigation 3-1. Daily L2 evaluation requires a daily-resolved emissions inventory.
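
The global MFB quoted in the Boylan-Russell caveat can be recovered from the fold table in Findings. Because MFB is a mean of per-sample terms, the pooled value equals the n-weighted mean of the fold values. The sketch below assumes the table's n column is the number of samples behind each fold's MFB.

```python
# Per-fold sample counts and MFB, copied from the fold table.
fold_n = [80, 75, 65, 60, 50]
fold_mfb = [-0.521, -0.399, -0.641, -0.382, -0.629]

# Pooled MFB = sample-weighted mean of the fold MFBs.
pooled_mfb = sum(n * m for n, m in zip(fold_n, fold_mfb)) / sum(fold_n)

GATE = 0.6  # |MFB| <= 0.6 criterion cited in the caveat
passes_pooled = abs(pooled_mfb) <= GATE

# Folds that would miss the same criterion if judged on their own.
failing_folds = [i for i, m in enumerate(fold_mfb) if abs(m) > GATE]

print(f"pooled MFB = {pooled_mfb:.3f}, passes = {passes_pooled}, failing folds = {failing_folds}")
```

The pooled value passes while two individual folds do not, which is one more reason to read the aggregate pass with caution.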

Provenance

Item	Value
run.py	`[internal artifact]`
results.json	`investigations/40_l2-bias-corrected-validation/latest/results.json`
analysis.md	`investigations/40_l2-bias-corrected-validation/latest/analysis.md`
InMAP output	`data/outputs/inmap_baseline_2023/output.shp` (sha256 48324084e6c0)
InMAP version	v1.9.0 steady-state; met year 2005; emissions year 2023
Upstream: Investigation 3-1 folds	sha256 c63ae2d281ce — L1 headline RMSE 6.078
Upstream: Investigation 3-3 L3	sha256 bf05f6f2fb4e — L3 headline RMSE 4.343
Last run	2026-05-01 (results sha256 20cdce2d11d4)