California Freight Cleanup → Investigation 3-3

Can a satellite-fused PM_2.5 field pass the accuracy standard the forward simulations both missed?

5-fold CV-RMSE 4.34 µg/m³ • passes Tessum 2017 • R² = −0.785 • MFB +0.108 • no fitting

The van Donkelaar V5 product combines GEOS-Chem atmospheric modeling with satellite aerosol retrievals to produce a high-resolution PM_2.5 field. We sample it as published, with no fitting of our own to ground monitors, and use it as the mid-rung reference: the ceiling for what emissions-driven simulations should aspire to, and the training anchor for the final corrected surrogate.

The decision

How close do the emissions-driven L1/L2 predictions land to a published, chemistry-aware, satellite-anchored PM_2.5 field? L3 sets the achievable ceiling for a no-AQS-fitting rung. If L2 (InMAP-direct) had landed within ~1 µg/m³ of L3, the forward simulation would be doing its job. Instead, L2's RMSE is about 7 µg/m³ worse (11.463 vs 4.343), and L3 passes the Tessum gate that L2 could not. L3 therefore becomes the mid-rung anchor for L4 MFGP in the post-2019 chain.

Why V5.NA.05.02 and not V6.GL.02.04

Two van Donkelaar products are on disk. V5.NA.05.02 (component fields) is used here for three reasons: (1) its seven components (BC, OM, NH4, NO3, SO4, SS, DUST) map directly onto the precursor groups that L1 and L2 already track, keeping the accounting coherent; (2) V6 introduces a CNN bias correction trained on global ground monitors, which stacks one empirical correction on another (we already evaluate an empirical correction at L2; burying another inside L3 muddies the ladder); (3) the V5 sum-of-components is already trusted by Investigations 7-3, 13, and 26 for compositional work across the cascade, so reusing it keeps the methodology consistent.

Methodology

No model is fit. rfaq.smoke.van_donkelaar_loader.species_at_points performs a nearest-neighbor lookup on the 0.01° regular grid via searchsorted with a left/right tie-break. NaN-masked cells (coastline, water) contribute 0 to the sum rather than propagating NaN into it, a conservative choice that biases the sum down; the reported RMSE is therefore a lower bound on the field's true error at coastal sites. The 5-fold readout reuses Investigation 3-1’s basin-stratified site groupings for rung-comparability; since nothing is fit, it is not held-out evaluation in the predictive-modeling sense.
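
A minimal sketch of the lookup described above, assuming ascending 1-D latitude and longitude axes and one 2-D array per component. The function names, the argument layout, the toy grid, and the rule that exact ties go to the left node are illustrative assumptions; this is not the actual species_at_points implementation.

```python
import numpy as np

def nearest_index(grid, values):
    """Index of the nearest grid node for each value (grid must be ascending)."""
    grid = np.asarray(grid, dtype=float)
    values = np.atleast_1d(np.asarray(values, dtype=float))
    # First node >= value, clipped so a left and a right neighbour always exist.
    right = np.clip(np.searchsorted(grid, values, side="left"), 1, len(grid) - 1)
    left = right - 1
    # Take the closer neighbour; exact ties go to the left node (assumed rule).
    use_right = (grid[right] - values) < (values - grid[left])
    return np.where(use_right, right, left)

def sum_components_at_points(lat_grid, lon_grid, components, lats, lons):
    """Sum component fields at the nearest cell, counting NaN cells as 0."""
    i = nearest_index(lat_grid, lats)
    j = nearest_index(lon_grid, lons)
    stacked = np.stack([np.asarray(field)[i, j] for field in components.values()])
    return np.nansum(stacked, axis=0)  # nansum treats NaN as zero

# Toy 3x3 grid with two hypothetical component fields (not real V5 data).
lat_grid = np.array([0.0, 1.0, 2.0])
lon_grid = np.array([10.0, 11.0, 12.0])
components = {
    "SO4": np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]),
    "SS": np.array([[0.5, np.nan, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]),
}
total = sum_components_at_points(lat_grid, lon_grid, components,
                                 lats=[0.4, 1.6], lons=[10.9, 11.2])
print(total)  # first point hits the masked SS cell, so only SO4 counts there
```

In this toy case the first point returns the SO4 value alone, which is the downward bias at masked cells that the paragraph above describes.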

Findings

RMSE 4.34 µg/m³ — inside the published accuracy window for this class of model

The 5-fold mean RMSE is 4.343 µg/m³ (SD 1.914), inside the Tessum 2017 InMAP CV-RMSE window of 3.0–5.0 µg/m³. Global in-sample metrics: RMSE 4.408 µg/m³, MFB +0.108 (slight over-prediction, within the Boylan-Russell criterion |MFB| ≤ 0.6), R² = −0.785.
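
For reference, the three metrics can be sketched with their standard definitions. This assumes MFB is the Boylan-Russell mean fractional bias and R² is the coefficient of determination 1 − SSE/SST (which is what allows it to go negative); the function names and the two-point demo values are illustrative, not taken from run.py.

```python
import numpy as np

def rmse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mean_fractional_bias(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    # Mean of 2 (P - O) / (P + O); positive values indicate over-prediction.
    return float(np.mean(2.0 * (pred - obs) / (pred + obs)))

def r_squared(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)        # error of the field
    ss_tot = np.sum((obs - obs.mean()) ** 2)  # error of predicting the mean
    return float(1.0 - ss_res / ss_tot)

def within_boylan_russell(mfb, limit=0.6):
    return abs(mfb) <= limit

# Tiny synthetic demo (not AQS data).
obs_demo = [8.0, 12.0]
pred_demo = [10.0, 10.0]
print(rmse(obs_demo, pred_demo), r_squared(obs_demo, pred_demo))
print(within_boylan_russell(0.108))  # reported global MFB
```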

Fold	Test sites	n	RMSE µg/m³	MFB	R²
0	16	80	3.109	+0.105	−0.340
1	15	75	1.965	+0.064	+0.687
2	13	65	4.691	+0.132	−1.350
3	12	60	4.964	+0.109	−1.134
4	10	50	6.984	+0.144	−2.174
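
The headline numbers can be rechecked from the fold table alone. The sketch below assumes the reported SD is the sample standard deviation (n − 1) across folds, and that pooling fold RMSEs weighted by n should recover the global in-sample RMSE, since nothing is fit and the folds simply partition the same residuals.

```python
import math
import statistics

# Values copied from the fold table above.
fold_n = [80, 75, 65, 60, 50]
fold_rmse = [3.109, 1.965, 4.691, 4.964, 6.984]

mean_rmse = statistics.mean(fold_rmse)
sd_rmse = statistics.stdev(fold_rmse)  # sample SD (n - 1)

# Pool squared errors across folds, weighting each fold by its row count.
pooled_rmse = math.sqrt(
    sum(n * r ** 2 for n, r in zip(fold_n, fold_rmse)) / sum(fold_n)
)

print(f"mean={mean_rmse:.3f} sd={sd_rmse:.3f}")  # mean=4.343 sd=1.914
print(f"pooled={pooled_rmse:.4f}")
```

The pooled value lands within about 0.001 µg/m³ of the reported global RMSE of 4.408; the residual difference is consistent with rounding in the tabulated fold values.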

Three-rung comparison: only the satellite-fused product meets the accuracy standard

RMSE by rung, judged against the Tessum 2017 window (3.0–5.0 µg/m³):

Rung	Method	RMSE µg/m³	Tessum gate
L1	ISRM × NEI matrix lookup	6.078	Fail
L2	InMAP-direct steady-state	11.463	Fail
L3	van Donkelaar V5.NA.05.02 sum-of-7	4.343	Pass
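
The gate column can be reproduced with a one-line rule. This sketch assumes a rung passes when its RMSE does not exceed the upper edge of the 3.0–5.0 µg/m³ window; how an RMSE below 3.0 would be treated is not stated in this investigation, and the names used here are illustrative.

```python
TESSUM_WINDOW = (3.0, 5.0)  # µg/m³, as quoted in the findings

# RMSE values copied from the table above.
rung_rmse = {"L1": 6.078, "L2": 11.463, "L3": 4.343}

def tessum_gate(rmse, upper=TESSUM_WINDOW[1]):
    # Assumed rule: pass if RMSE is at or below the window's upper edge.
    return "Pass" if rmse <= upper else "Fail"

verdicts = {rung: tessum_gate(value) for rung, value in rung_rmse.items()}
gap_l2_l3 = rung_rmse["L2"] - rung_rmse["L3"]  # the "about 7 µg/m³" gap

print(verdicts)
print(f"L2 - L3 = {gap_l2_l3:.2f} µg/m³")  # L2 - L3 = 7.12 µg/m³
```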

The satellite-fused product passes because it fuses GEOS-Chem with ground observations via geographically weighted regression (GWR). This is statistical fusion, not formal data assimilation: there is no Kalman filter or variational scheme. The ordering is expected: a fused product with access to the AQS network closes a gap that pure forward modeling cannot.

Slight systematic over-prediction (+0.108 bias) — consistent with what’s known about this product in the western US

The +0.108 MFB is consistent with the positive bias documented for V5.NA.05.02 in the western US by van Donkelaar et al. 2021 ACP: a small over-prediction that originates in component over-prediction in the GEOS-Chem prior and is only partially corrected by the GWR fusion to ground stations. The L4 MFGP correction step absorbs this bias.

R² = −0.79 despite passing the RMSE standard: gets the level right, not the site-to-site pattern

The field passes the absolute-error gate, but a negative R² means its squared error exceeds that of simply predicting the AQS global mean at every site. L3 gets the statewide level approximately right (MFB +0.108) but does not resolve site-to-site variation. This is not a flaw unique to V5; it is the honest cost of no-AQS-fitting. A “passing” L3 should not be read as “satellite product matches AQS site-by-site.”
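
A toy illustration of how a field can sit near the right level and still score a negative R². The numbers are synthetic and chosen only to make the arithmetic easy to follow; they are not AQS or V5 values.

```python
import numpy as np

obs = np.array([4.0, 6.0, 8.0, 10.0, 12.0])  # synthetic monitor values, mean 8
field = np.full_like(obs, 9.0)               # flat field, level slightly high

sse_field = float(np.sum((obs - field) ** 2))      # 45.0
sse_mean = float(np.sum((obs - obs.mean()) ** 2))  # 40.0
r2 = 1.0 - sse_field / sse_mean                    # -0.125
rmse = float(np.sqrt(sse_field / obs.size))        # 3.0

print(r2, rmse)
```

The flat field carries no site-to-site pattern, so it loses to the constant mean predictor (R² below zero) even though its RMSE is modest. That is the same qualitative situation as L3, just in exaggerated form.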

Caveats

Reference rung, not a model. The 4.34 µg/m³ headline is not held-out predictive evaluation. The fold readout exists only for rung-comparability. Citing fold variance (SD 1.91) as evidence of “good generalization” would be misleading.
V5 sum-of-7 ≠ V6 CNN-PM25. V6 would likely score better against AQS (estimated 2.5–3.5 µg/m³ RMSE), but largely because its CNN is fit to AQS-family monitors, which defeats the comparator’s purpose as a no-fitting reference.
NaN-as-zero biases the sum down at coastal sites. The 4.34 µg/m³ RMSE is a lower bound on the true error; coastal and masked-cell sites may show an artificially good apparent RMSE.
Annual mean only. V5.NA.05.02 publishes annual files. Daily L3 evaluation requires V5/V6 daily products + Phase C MFGP work.

Provenance

Item	Value
run.py	`[internal artifact]`
results.json	`investigations/41_l3-vandonkelaar-reference/latest/results.json`
Method label	`vandonkelaar_v5na0502_sum7components`
Components summed	BC, OM, NH4, NO3, SO4, SS, DUST (7 fields)
Spatial sampler	nearest-neighbor searchsorted on 0.01° grid
Upstream: Investigation 3-1 folds	sha256 c63ae2d281ce
Upstream: Investigation 3-2 L2 RMSE	11.463 µg/m³ (sha256 20cdce2d11d4)
Last run	2026-05-01 (results sha256 a368ef9c6ed9)
