California Freight Cleanup → Investigation 3-2
Does the full InMAP chemistry simulation beat the simpler matrix lookup?
5-fold CV-RMSE 11.46 µg/m³ • +5.39 µg/m³ vs L1 • fails Tessum by 130% • LA Basin RMSE 29.0 µg/m³
We hypothesized that running the full nonlinear InMAP simulation on 2023 California emissions would beat the simpler matrix lookup. It did not. The mismatch between 2023 emissions and the 2005 meteorology on disk likely explains most of the failure, but we cannot separate that from a genuine model limitation at this resolution. We document the result as a negative finding and skip this rung in the production chain.
The decision
Does running the full nonlinear InMAP simulation on 2023 California emissions close the gap to the Tessum 2017 baseline? If yes, InMAP-direct enters the MFGP as an honest L2 rung. If no, the cascade falls back to the three-rung L1→L3→L4 structure — and the InMAP failure becomes a documented negative finding, informative in its own right about what the met-year mismatch costs.
Methodology
Method change (v1 → v2). The prior v1 L2 was a per-fold empirical bias corrector: OLS of pm25_obs on (1, p, p², basin, setting) fit against L1 predictions. It achieved RMSE 2.553 µg/m³ and passed Tessum, but the win was structural. Any reasonable regression on AQS labels will beat the upstream predictor; the corrector added no physics. The empirical bias corrector is preserved in rfaq.validation.l2_bias_corrected for diagnostic use but is no longer the gate headline.
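To illustrate why a corrector of this shape wins almost by construction, here is a minimal sketch of an OLS fit of observations on (1, p, p², basin, setting). It is not the code in rfaq.validation.l2_bias_corrected: the data are synthetic, and the helper names, the one-hot encoding, and the single train/test split are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(p, basin, setting, basin_levels, setting_levels):
    """Columns: 1, p, p^2, basin dummies, setting dummies (first level of each dropped)."""
    cols = [np.ones_like(p), p, p ** 2]
    for level in basin_levels[1:]:
        cols.append((basin == level).astype(float))
    for level in setting_levels[1:]:
        cols.append((setting == level).astype(float))
    return np.column_stack(cols)

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Synthetic stand-ins (hypothetical): observations and a biased upstream prediction p.
rng = np.random.default_rng(0)
n = 300
basin_levels = np.array(["la_basin", "sjv", "sacramento", "rest_ca"])
setting_levels = np.array(["urban", "rural"])
basin = rng.choice(basin_levels, size=n)
setting = rng.choice(setting_levels, size=n)
pm25_obs = rng.uniform(5.0, 20.0, size=n)
basin_bias = np.where(basin == "la_basin", 6.0, -3.0)
p = 0.6 * pm25_obs + basin_bias + rng.normal(0.0, 1.0, size=n)

# One held-out split standing in for a single CV fold.
train, test = slice(0, 240), slice(240, n)
X = design_matrix(p, basin, setting, basin_levels, setting_levels)
coef, *_ = np.linalg.lstsq(X[train], pm25_obs[train], rcond=None)

rmse_raw = rmse(p[test], pm25_obs[test])
rmse_corrected = rmse(X[test] @ coef, pm25_obs[test])
print(f"raw RMSE {rmse_raw:.2f}, corrected RMSE {rmse_corrected:.2f}")
```

The corrected error drops because the regression is trained on the same kind of labels it is scored against, not because any chemistry or transport was added.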
v2 (this run). InMAP v1.9.0 steady-state run on data/processed/emissions/emissions_baseline_2023_inmap{,_elevated}.shp, with meteorology from inmap_2005.ncf, for 3,000 iterations. Output is sampled at AQS lat/lons via a point-in-polygon spatial join. There is no fitting against AQS. The emissions inventory is the same as for L1; the only difference is full nonlinear chemistry vs. the ISRM linearization.
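The sampling step can be sketched as a plain point-in-polygon lookup. This is an illustrative stand-in, not the pipeline's code: a real run would most likely use a GIS library's spatial join and would first reproject monitor coordinates into the grid's coordinate system. The cell geometries, concentration values, and site names below are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_at_sites(cells, sites):
    """Map each site to the value of the first cell containing it, or None."""
    sampled = {}
    for site_id, (x, y) in sites.items():
        sampled[site_id] = None
        for ring, value in cells:
            if point_in_polygon(x, y, ring):
                sampled[site_id] = value
                break
    return sampled

# Hypothetical grid cells: (vertex ring, modeled concentration in µg/m³).
cells = [
    ([(-118.5, 33.5), (-117.5, 33.5), (-117.5, 34.5), (-118.5, 34.5)], 30.0),
    ([(-120.0, 36.0), (-119.0, 36.0), (-119.0, 37.0), (-120.0, 37.0)], 8.0),
]
# Hypothetical monitor locations as (lon, lat), assumed to share the cells' coordinate system.
sites = {"site_a": (-118.2, 34.05), "site_b": (-119.7, 36.8), "site_c": (-122.0, 40.0)}

sampled = sample_at_sites(cells, sites)
print(sampled)
```

A site that falls outside every cell comes back as None, so coverage gaps surface explicitly instead of being silently dropped.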
Met-year constraint. Only inmap_2005.ncf is on disk. L2 is 2023 emissions × 2005 met, the same structural assumption as L1 (ISRM met is also baked in). Comparing L1 vs. L2 therefore holds meteorology constant: the cleanest test for what the linearization costs.
Findings
RMSE 11.46 µg/m³: 5.4 µg/m³ worse than the simpler model
5-fold RMSE 11.46 µg/m³ (SD 3.98 across folds) against the Tessum 2017 ≤5.0 baseline. This is +5.39 µg/m³ worse than L1 ISRM (6.08) — the full nonlinear simulation is less accurate than the linearization on this configuration. R² = −13 globally; four of five folds have R² ≤ −14.
| Fold | Test sites | n | RMSE | MFB | R² |
|---|---|---|---|---|---|
| 0 | 16 | 80 | 13.50 | −0.521 | −24.26 |
| 1 | 15 | 75 | 13.67 | −0.399 | −14.16 |
| 2 | 13 | 65 | 11.94 | −0.641 | −14.23 |
| 3 | 12 | 60 | 13.74 | −0.382 | −15.35 |
| 4 | 10 | 50 | 4.47 | −0.629 | −0.30 |
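
The headline aggregates are consistent with a simple mean and sample standard deviation of the per-fold RMSE values in the table. The sketch below recomputes them from the rounded table entries; that the pipeline aggregates in exactly this way is an assumption based on the numbers agreeing, not something read from run.py.

```python
import statistics

# (fold, test sites, n, RMSE) copied from the table above.
folds = [
    (0, 16, 80, 13.50),
    (1, 15, 75, 13.67),
    (2, 13, 65, 11.94),
    (3, 12, 60, 13.74),
    (4, 10, 50, 4.47),
]

fold_rmse = [rmse for _, _, _, rmse in folds]
mean_rmse = statistics.mean(fold_rmse)
sd_rmse = statistics.stdev(fold_rmse)  # sample SD (n - 1 denominator)

# Rows per test site; identical across folds if every site contributes equally.
rows_per_site = {n // sites for _, sites, n, _ in folds}

# Mean without fold 4, to show how much the one low-error fold pulls the headline down.
mean_without_fold4 = statistics.mean(fold_rmse[:4])

print(f"mean RMSE {mean_rmse:.2f}, SD {sd_rmse:.2f}")
print(f"rows per site: {rows_per_site}")
print(f"mean RMSE without fold 4: {mean_without_fold4:.2f}")
```

Fold 4 is the only fold below the 5.0 threshold, and it accounts for most of the spread across folds.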
LA Basin: RMSE 29.0 µg/m³. It over-predicts while other regions under-predict, so no single correction can fix both
The bias structure inverts by region. LA Basin over-predicts (MFB +0.793, RMSE 29.0). SJV / Sacramento / rest_ca all under-predict (MFB −0.32 to −0.72). No single bias correction can fix both signs simultaneously — consistent with regime-dependent chemistry as the dominant failure mode (NH3-limited vs VOC-limited SOA formation) that the 2005 met year cannot capture for 2023 California conditions.
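The sign argument can be made concrete with the usual mean fractional bias definition, MFB = (2/N) Σ (M − O)/(M + O). The sketch below uses synthetic numbers, not the investigation's data, to show that one uniform scale factor which shrinks the bias in an over-predicting region enlarges it in an under-predicting one.

```python
import numpy as np

def mfb(model, obs):
    """Mean fractional bias: mean of 2 * (M - O) / (M + O)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean(2.0 * (model - obs) / (model + obs)))

# Synthetic regions (hypothetical values): one over-predicted, one under-predicted.
obs_over = np.array([10.0, 12.0, 14.0])
mod_over = 2.3 * obs_over
obs_under = np.array([8.0, 10.0, 12.0])
mod_under = 0.5 * obs_under

before = {"over": mfb(mod_over, obs_over), "under": mfb(mod_under, obs_under)}

# Apply one uniform scale factor chosen to help the over-predicting region.
scale = 0.5
after = {
    "over": mfb(scale * mod_over, obs_over),
    "under": mfb(scale * mod_under, obs_under),
}

for region in ("over", "under"):
    print(f"{region}: MFB {before[region]:+.3f} -> {after[region]:+.3f}")
```

Any correction that helps both regions at once has to depend on region or regime, which is what a single global adjustment lacks.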
Production chain redrawn: base model → satellite reference → multi-fidelity surrogate
The three-rung comparison as computed in this investigation:
| Rung | Method | RMSE µg/m³ |
|---|---|---|
| L1 | ISRM × NEI matrix lookup | 6.078 |
| L2 | InMAP v1.9.0 steady-state (2023 emissions × 2005 met) | 11.463 |
| L3 | van Donkelaar V5.NA.05.02 sum-of-7 (Investigation 3-3) | 4.343 |
L2 is a documented failed rung. The cascade proceeds L1→L3→L4. L2 appears in Investigation 3-4’s ladder table as context only, labeled “InMAP-direct (failed Tessum gate).”
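As a small worked check of the comparison above, the sketch below applies the RMSE ≤ 5.0 µg/m³ threshold to the three tabulated values and reports each rung's gap to L1. The dictionary layout and field names are hypothetical, and treating the gate as a plain threshold on RMSE is an assumption.

```python
TESSUM_RMSE_MAX = 5.0  # µg/m³, the threshold quoted in the findings

# RMSE values copied from the three-rung table.
rung_rmse = {"L1": 6.078, "L2": 11.463, "L3": 4.343}

report = {}
for rung, rmse in rung_rmse.items():
    report[rung] = {
        "passes": rmse <= TESSUM_RMSE_MAX,
        # Percent by which RMSE exceeds the threshold (negative means under it).
        "excess_pct": 100.0 * (rmse - TESSUM_RMSE_MAX) / TESSUM_RMSE_MAX,
        "delta_vs_l1": rmse - rung_rmse["L1"],
    }

for rung, row in report.items():
    status = "pass" if row["passes"] else "fail"
    print(f"{rung}: {status}, {row['excess_pct']:+.1f}% vs threshold, "
          f"{row['delta_vs_l1']:+.3f} µg/m³ vs L1")
```

The L2 excess rounds to the "fails Tessum by 130%" figure in the summary line, and its gap to L1 matches the +5.39 µg/m³ headline.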
Caveats
- Met-year / emissions-year mismatch is unresolved. We cannot separate “2005 met is structurally wrong for 2023 emissions regimes” from “InMAP at this resolution genuinely fails Tessum on California 2023 regardless of met.” A 2023 met re-run (estimated 1–3 days given WRF 2023 met data availability) is the only experiment that could rehabilitate L2. Until it runs, the failure should not be attributed specifically to either model structure or meteorology.
- Boylan-Russell pass is misleading. Global MFB −0.508 passes |MFB| ≤ 0.6 only because LA Basin over-prediction cancels SJV/Sacramento under-prediction. The regulatory gate is not informative when bias structure inverts spatially.
- v1 bias corrector is not the fair comparison. The v1 OLS corrector hit RMSE 2.553 but was fit to AQS labels — any regression on AQS will pass Tessum trivially. The v2 switch to a forward simulation is what makes the gate honest; the failed result is the honest answer.
- Annual-mean evaluation only. Same scope drop as Investigation 3-1. Daily L2 evaluation requires a daily-resolved emissions inventory.
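
The global MFB in the Boylan-Russell caveat can be recovered from the fold table: MFB is a mean over samples, so the pooled value is the n-weighted mean of the per-fold values. The sketch below does that arithmetic and applies the |MFB| ≤ 0.6 criterion. Regional sample counts are not reported here, so the regional cancellation itself is not reproduced.

```python
MFB_LIMIT = 0.6  # |MFB| criterion quoted in the caveat

# (fold, n, MFB) copied from the findings table.
folds = [
    (0, 80, -0.521),
    (1, 75, -0.399),
    (2, 65, -0.641),
    (3, 60, -0.382),
    (4, 50, -0.629),
]

total_n = sum(n for _, n, _ in folds)
pooled_mfb = sum(n * fold_mfb for _, n, fold_mfb in folds) / total_n
pooled_passes = abs(pooled_mfb) <= MFB_LIMIT

# Folds that would fail the same criterion if judged on their own.
failing_folds = [fold for fold, _, fold_mfb in folds if abs(fold_mfb) > MFB_LIMIT]

print(f"pooled MFB {pooled_mfb:.3f}, passes: {pooled_passes}")
print(f"folds failing individually: {failing_folds}")
```

Even at fold level the pooled pass hides individual failures, which is the same averaging effect the caveat describes across regions.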
Provenance
| Item | Location / value |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/40_l2-bias-corrected-validation/latest/results.json |
| analysis.md | investigations/40_l2-bias-corrected-validation/latest/analysis.md |
| InMAP output | data/outputs/inmap_baseline_2023/output.shp (sha256 48324084e6c0) |
| InMAP version | v1.9.0 steady-state; met year 2005; emissions year 2023 |
| Upstream: Investigation 3-1 folds | sha256 c63ae2d281ce — L1 headline RMSE 6.078 |
| Upstream: Investigation 3-3 L3 | sha256 bf05f6f2fb4e — L3 headline RMSE 4.343 |
| Last run | 2026-05-01 (results sha256 20cdce2d11d4) |
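
A reader who wants to check an artifact against the table could compare its SHA-256 digest with the listed value. The sketch below assumes the 12-character values are the leading hex characters of the full digest, which the table does not state; the helper names are hypothetical, and the demo uses a temporary file, not a real artifact.

```python
import hashlib
import os
import tempfile

def sha256_hex(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_prefix(path, expected_prefix):
    """True if the file's digest starts with the (possibly truncated) expected hex."""
    return sha256_hex(path).startswith(expected_prefix.lower())

# Demo on a throwaway file standing in for an artifact such as results.json.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for results.json")
    demo_path = tmp.name
try:
    full_digest = sha256_hex(demo_path)
    prefix_ok = matches_prefix(demo_path, full_digest[:12])
    wrong_ok = matches_prefix(demo_path, "000000000000")
finally:
    os.remove(demo_path)

print(f"prefix match: {prefix_ok}, wrong prefix match: {wrong_ok}")
```

In use, `matches_prefix` would be called with an artifact path from the table and its listed hash, for example the results.json path and `20cdce2d11d4`.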