
California Freight Cleanup → Investigation 3-2

Does the full InMAP chemistry simulation beat the simpler matrix lookup?

5-fold CV-RMSE 11.46 µg/m³ • +5.39 µg/m³ vs L1 • exceeds the Tessum ≤5.0 µg/m³ gate by 130% • LA Basin RMSE 29.0 µg/m³

We hypothesized that running the full nonlinear InMAP simulation on 2023 California emissions would beat the simpler matrix lookup. It did not. The mismatch between 2023 emissions and the 2005 meteorology on disk likely explains most of the failure — but we cannot separate that from a genuine model limitation at this resolution. We document it honestly and skip this rung in the production chain.

Does running the full nonlinear InMAP simulation on 2023 California emissions close the gap to the Tessum 2017 baseline? If yes, InMAP-direct enters the MFGP as an honest L2 rung. If no, the cascade falls back to the three-rung L1→L3→L4 structure — and the InMAP failure becomes a documented negative finding, informative in its own right about what the met-year mismatch costs.

Method change (v1 → v2). The prior v1 L2 was a per-fold empirical bias corrector: an OLS regression of pm25_obs on (1, p, p², basin, setting), where p is the L1 prediction. It achieved RMSE 2.553 µg/m³ and passed Tessum, but the win was structural: any reasonable regression fit to AQS labels will beat the upstream predictor it corrects, and the corrector added no physics. The empirical bias corrector is preserved in rfaq.validation.l2_bias_corrected for diagnostic use but is no longer the gate headline.
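For comparison, a v1-style corrector can be sketched in a few lines. This is a hypothetical illustration, not the code in rfaq.validation.l2_bias_corrected: it assumes NumPy, string labels for basin and setting with the first sorted level as the dummy-coding baseline, and made-up function names. It also shows why the win was structural. The design matrix contains an intercept and p itself, so least squares can always reproduce the L1 prediction unchanged, and the fitted corrector's in-sample RMSE therefore can never exceed the raw L1 RMSE.

```python
import numpy as np

# Hypothetical sketch of a v1-style per-fold bias corrector (not the project's code).


def design_matrix(p, basin, setting, basin_levels, setting_levels):
    """Columns: 1, p, p^2, then one dummy per non-baseline basin and setting level."""
    cols = [np.ones_like(p), p, p ** 2]
    # Drop the first level of each categorical to avoid collinearity with the intercept.
    cols += [(basin == lvl).astype(float) for lvl in basin_levels[1:]]
    cols += [(setting == lvl).astype(float) for lvl in setting_levels[1:]]
    return np.column_stack(cols)


def fit_bias_corrector(p, obs, basin, setting):
    """Fit OLS of obs on (1, p, p^2, basin, setting) and return a predict function.

    p, obs are float arrays; basin, setting are NumPy string arrays.
    Levels unseen during fitting fall into the baseline category.
    """
    basin_levels = np.unique(basin)
    setting_levels = np.unique(setting)
    X = design_matrix(p, basin, setting, basin_levels, setting_levels)
    beta, *_ = np.linalg.lstsq(X, obs, rcond=None)

    def predict(p_new, basin_new, setting_new):
        X_new = design_matrix(p_new, basin_new, setting_new, basin_levels, setting_levels)
        return X_new @ beta

    return predict


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))
```

In a cross-validation loop, the corrector would be fit on the four training folds and `predict` called on the held-out fold to get the out-of-sample score.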

v2 (this run). InMAP v1.9.0 steady-state on data/processed/emissions/emissions_baseline_2023_inmap{,_elevated}.shp, met inmap_2005.ncf, 3,000 iterations. Output sampled at AQS lat/lons via point-in-polygon spatial join. No fitting against AQS. Same emissions inventory as L1 — the only difference is full nonlinear chemistry vs. the ISRM linearization.
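The sampling step can be illustrated with a minimal pure-Python point-in-polygon join. This is a sketch under stated assumptions, not the run's code: it assumes the AQS coordinates have already been projected into the same coordinate system as output.shp (InMAP grids are typically in a projected CRS rather than lat/lon), that each grid cell is a single polygon ring, and that the sampled value is the total PM2.5 field (assumed here to be named TotalPM25 in the InMAP output). The function names are hypothetical. A library join such as geopandas.sjoin with predicate="within" does the same job with a spatial index.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: sites are already projected into the grid's CRS, and each
# cell value is the total PM2.5 field (assumed name: TotalPM25).
Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast in the +x direction."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_grid_at_sites(
    sites: Dict[str, Point], cells: List[Tuple[List[Point], float]]
) -> Dict[str, Optional[float]]:
    """Return the value of the first cell containing each site, or None if outside the grid."""
    out: Dict[str, Optional[float]] = {}
    for site_id, pt in sites.items():
        out[site_id] = None
        for ring, value in cells:
            if point_in_polygon(pt, ring):
                out[site_id] = value
                break
    return out
```

The linear scan over cells is fine for illustration; for a full InMAP grid, a spatial index is the practical choice.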

Met-year constraint. Only inmap_2005.ncf is on disk, so L2 is 2023 emissions × 2005 met. This is the same structural assumption as L1, whose ISRM matrices also have their meteorology baked in. Comparing L1 vs. L2 therefore holds meteorology constant, which makes it the cleanest test of what the linearization costs.

RMSE 11.46 µg/m³: 5.39 µg/m³ worse than the simpler model

5-fold RMSE 11.46 µg/m³ (SD 3.98 across folds) against the Tessum 2017 baseline of ≤5.0 µg/m³. This is 5.39 µg/m³ worse than L1 ISRM (6.08): the full nonlinear simulation is less accurate than the linearization in this configuration. Global R² is −13, and four of five folds have R² ≤ −14.

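| Fold | Test sites | n | RMSE (µg/m³) | MFB | R² |
|---|---|---|---|---|---|
| 0 | 16 | 80 | 13.50 | −0.521 | −24.26 |
| 1 | 15 | 75 | 13.67 | −0.399 | −14.16 |
| 2 | 13 | 65 | 11.94 | −0.641 | −14.23 |
| 3 | 12 | 60 | 13.74 | −0.382 | −15.35 |
| 4 | 10 | 50 | 4.47 | −0.629 | −0.30 |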
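The fold metrics follow standard definitions, sketched below with hypothetical helper names: RMSE, mean fractional bias MFB = (2/N) Σ (M − O)/(M + O), and R² = 1 − SS_res/SS_tot, which goes negative whenever predictions do worse than simply predicting the observed mean. As a check against the table, the unweighted mean of the five fold RMSEs is 11.46 µg/m³ with a sample standard deviation of 3.98, matching the headline. An n-weighted pooled RMSE would instead be about 12.33, so the headline appears to be the unweighted fold mean.

```python
import math
import statistics


def rmse(model, obs):
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def mfb(model, obs):
    """Mean fractional bias in [-2, 2] for non-negative data; negative means under-prediction."""
    return 2.0 / len(obs) * sum((m - o) / (m + o) for m, o in zip(model, obs))


def r_squared(model, obs):
    """Coefficient of determination; negative when worse than predicting mean(obs)."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - m) ** 2 for m, o in zip(model, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def summarize_folds(fold_rmses):
    """Headline = unweighted mean of per-fold RMSEs; spread = sample SD across folds."""
    return statistics.fmean(fold_rmses), statistics.stdev(fold_rmses)


# Per-fold RMSEs from the table above.
mean_rmse, sd_rmse = summarize_folds([13.50, 13.67, 11.94, 13.74, 4.47])
print(f"{mean_rmse:.2f} ± {sd_rmse:.2f}")  # 11.46 ± 3.98
```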

LA Basin: RMSE 29.0 µg/m³, over-predicting while other regions under-predict, so no single correction can fix both

The bias structure inverts by region. LA Basin over-predicts (MFB +0.793, RMSE 29.0 µg/m³), while SJV, Sacramento, and rest_ca all under-predict (MFB −0.32 to −0.72). No single global bias correction (one offset or scale factor applied statewide) can fix both signs at once. The pattern is consistent with regime-dependent chemistry being the dominant failure mode (NH3-limited vs. VOC-limited regimes of secondary PM2.5 formation), which the 2005 met year cannot capture for 2023 California conditions.
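A minimal sketch of why one global correction cannot fix both signs: group rows by region, compute MFB per region, then apply the single additive offset that minimizes overall squared error (the mean of obs − model). The region labels and concentrations below are hypothetical toy values chosen only to show the mechanism, not data from this investigation, and the function names are illustrative. Because the pooled offset shifts every region the same way, here it shrinks the over-prediction in one region while worsening the under-prediction in the other, and the sign inversion survives.

```python
from collections import defaultdict


def mfb(model, obs):
    return 2.0 / len(obs) * sum((m - o) / (m + o) for m, o in zip(model, obs))


def regional_mfb(rows):
    """rows: iterable of (region, model, obs). Returns {region: MFB}."""
    groups = defaultdict(lambda: ([], []))
    for region, m, o in rows:
        groups[region][0].append(m)
        groups[region][1].append(o)
    return {region: mfb(ms, os_) for region, (ms, os_) in groups.items()}


def global_offset(rows):
    """Least-squares single additive correction: mean(obs - model) over all rows."""
    rows = list(rows)
    return sum(o - m for _, m, o in rows) / len(rows)


def has_sign_inversion(region_bias):
    vals = list(region_bias.values())
    return any(b > 0 for b in vals) and any(b < 0 for b in vals)


# Hypothetical toy rows (region, model, obs) in µg/m³; illustrative only.
rows = [("LA Basin", 40.0, 20.0), ("LA Basin", 36.0, 18.0),
        ("SJV", 5.0, 15.0), ("SJV", 7.0, 17.0)]
c = global_offset(rows)  # -4.5
shifted = [(r, m + c, o) for r, m, o in rows]
print(has_sign_inversion(regional_mfb(rows)), has_sign_inversion(regional_mfb(shifted)))  # True True
```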

Production chain redrawn: base model → satellite reference → multi-fidelity surrogate

The three-rung comparison as computed in this investigation:

| Rung | Method | RMSE (µg/m³) |
|---|---|---|
| L1 | ISRM × NEI matrix lookup | 6.078 |
| L2 | InMAP v1.9.0 steady-state (2023 emissions × 2005 met) | 11.463 |
| L3 | van Donkelaar V5.NA.05.02 sum-of-7 (Investigation 3-3) | 4.343 |

L2 is a documented failed rung. The cascade proceeds L1→L3→L4. L2 appears in Investigation 3-4’s ladder table as context only, labeled “InMAP-direct (failed Tessum gate).”
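The headline comparisons are simple arithmetic on the rung table, and the L2 decision rule stated in the hypothesis reduces to a one-line check. This sketch assumes the Tessum gate is RMSE ≤ 5.0 µg/m³ applied to the L2 rung only (L1 also exceeds 5.0 but remains the base model); the names are hypothetical. The L2 excess works out to about 129%, which the summary line rounds to 130%.

```python
# Assumption: the Tessum gate is RMSE <= 5.0 µg/m³, applied to the L2 rung.
TESSUM_GATE = 5.0  # µg/m³, Tessum 2017 baseline

rungs = {"L1": 6.078, "L2": 11.463, "L3": 4.343}  # 5-fold RMSE, µg/m³


def excess_over_gate(rmse, gate=TESSUM_GATE):
    """Fractional amount by which RMSE exceeds the gate (negative means it passes)."""
    return (rmse - gate) / gate


def build_cascade(l2_rmse, gate=TESSUM_GATE):
    """L2 enters the chain only if it passes the gate; otherwise fall back to L1→L3→L4."""
    return ["L1", "L2", "L3", "L4"] if l2_rmse <= gate else ["L1", "L3", "L4"]


for name, value in rungs.items():
    delta = value - rungs["L1"]
    print(f"{name}: {value:.3f}  vs L1 {delta:+.3f}  over gate {excess_over_gate(value):+.0%}")
print(build_cascade(rungs["L2"]))  # ['L1', 'L3', 'L4']
```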

| Item | Location / value |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/40_l2-bias-corrected-validation/latest/results.json |
| analysis.md | investigations/40_l2-bias-corrected-validation/latest/analysis.md |
| InMAP output | data/outputs/inmap_baseline_2023/output.shp (sha256 48324084e6c0) |
| InMAP version | v1.9.0 steady-state; met year 2005; emissions year 2023 |
| Upstream: Investigation 3-1 folds | sha256 c63ae2d281ce; L1 headline RMSE 6.078 |
| Upstream: Investigation 3-3 L3 | sha256 bf05f6f2fb4e; L3 headline RMSE 4.343 |
| Last run | 2026-05-01 (results sha256 20cdce2d11d4) |
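The provenance hashes are recorded as short prefixes. A minimal sketch for re-checking an artifact, assuming each recorded value is the first 12 hex characters of the SHA-256 digest of the file's raw bytes (for output.shp, presumably only the .shp component and not its sidecar files); the function names are hypothetical.

```python
import hashlib

# Assumption: recorded values are 12-char prefixes of the file's SHA-256 hex digest.


def sha256_hex(path, chunk_size=1 << 20):
    """Stream the file in chunks so large shapefiles need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_recorded_prefix(path, recorded_prefix):
    """True if the file's SHA-256 hex digest starts with the recorded prefix."""
    return sha256_hex(path).startswith(recorded_prefix.lower())


# Example with a path and prefix from the table:
# matches_recorded_prefix("data/outputs/inmap_baseline_2023/output.shp", "48324084e6c0")
```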