California Freight Cleanup → Investigation 3-3
Can a satellite-fused PM2.5 field pass the accuracy standard the forward simulations both missed?
5-fold CV-RMSE 4.34 µg/m³ • passes Tessum 2017 • R² = −0.785 • MFB +0.108 • no fitting
The van Donkelaar V5 product combines GEOS-Chem atmospheric modeling with satellite aerosol retrievals to produce a high-resolution PM2.5 field without fitting to ground monitors. We use it as the mid-rung reference: the ceiling for what emissions-driven simulations should aspire to, and the training anchor for the final corrected surrogate.
The decision
How close do the emissions-driven L1/L2 predictions land to a published, chemistry-aware, satellite-anchored PM2.5 field? L3 sets the achievable ceiling for a no-AQS-fitting rung. If L2 (InMAP-direct) had landed within ~1 µg/m³ of L3, the forward simulation would be doing its job. Instead, L2 was about 7 µg/m³ worse (11.463 vs 4.343 µg/m³ RMSE), and L3 passes the Tessum gate that L2 could not. L3 becomes the mid-rung anchor for L4 MFGP in the post-2019 chain.
Why V5.NA.05.02 and not V6.GL.02.04
Two van Donkelaar products are on disk. V5.NA.05.02 (component fields) is used here for three reasons: (1) the seven components (BC, OM, NH4, NO3, SO4, SS, DUST) map directly onto the precursor groups that L1/L2 already track, keeping the accounting coherent; (2) V6 introduces a CNN bias correction trained on global ground monitors, making it an empirical correction on top of an empirical correction (we already evaluate an empirical correction at L2; burying another inside L3 muddies the ladder); (3) the V5 sum-of-components is already trusted by Investigations 7-3, 13, and 26 for compositional work across the cascade, so reusing it keeps the methodology coherent.
Methodology
No model is fit. rfaq.smoke.van_donkelaar_loader.species_at_points performs nearest-neighbor lookup on the 0.01° regular grid via searchsorted with left/right tie-break. NaN-masked cells (coastline, water) contribute 0 to the sum rather than poisoning it. This is a conservative choice that biases the sum down, so the reported RMSE should be read as a lower bound on the true field error at coastal sites. The 5-fold readout reuses Investigation 3-1’s basin-stratified site groupings for rung-comparability; since nothing is fit, this is not held-out evaluation in the predictive-modeling sense.
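The lookup described above can be sketched in a few lines. This is a minimal illustration, not the code of species_at_points: the helper names `nearest_index` and `sum_components_at_point` are hypothetical, `bisect_left` stands in for `numpy.searchsorted` (same left-insertion convention), and the tie rule (exact midpoints go to the left neighbour) is an assumption, since the text only says "left/right tie-break".

```python
from bisect import bisect_left
import math


def nearest_index(axis, x):
    """Index of the coordinate nearest to x on an ascending 1-D axis."""
    i = bisect_left(axis, x)  # insertion point, like searchsorted(side="left")
    if i == 0:
        return 0                      # x is at or below the first cell
    if i == len(axis):
        return len(axis) - 1          # x is above the last cell
    left, right = axis[i - 1], axis[i]
    # Assumed tie rule: an exact midpoint resolves to the left neighbour.
    return i - 1 if (x - left) <= (right - x) else i


def sum_components_at_point(lats, lons, fields, lat, lon):
    """Sum component grids at the nearest cell, counting NaN cells as 0."""
    r = nearest_index(lats, lat)
    c = nearest_index(lons, lon)
    total = 0.0
    for grid in fields.values():
        value = grid[r][c]
        if not math.isnan(value):     # masked cell contributes nothing
            total += value
    return total


# Toy 2x2 grids (synthetic values, not V5 data).
lats = [0.0, 1.0]
lons = [0.0, 1.0]
fields = {
    "BC": [[1.0, 2.0], [3.0, 4.0]],
    "SS": [[float("nan"), 0.5], [0.5, 0.5]],
}
masked_total = sum_components_at_point(lats, lons, fields, 0.1, 0.2)
clean_total = sum_components_at_point(lats, lons, fields, 0.9, 0.8)
```

In the toy grid, the point near (0, 0) sits on a cell where one component is masked, so its total is built from the remaining component only. That is the downward bias the NaN-as-zero choice introduces.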
Findings
RMSE 4.34 µg/m³ — inside the published accuracy window for this class of model
5-fold mean RMSE 4.343 µg/m³ (SD 1.914), inside the Tessum 2017 InMAP CV-RMSE window of 3.0–5.0 µg/m³. Global in-sample RMSE 4.408 µg/m³, MFB +0.108 (slight over-prediction, within Boylan-Russell |MFB| ≤ 0.6), R² = −0.785.
| Fold | Test sites | n | RMSE (µg/m³) | MFB | R² |
|---|---|---|---|---|---|
| 0 | 16 | 80 | 3.109 | +0.105 | −0.340 |
| 1 | 15 | 75 | 1.965 | +0.064 | +0.687 |
| 2 | 13 | 65 | 4.691 | +0.132 | −1.350 |
| 3 | 12 | 60 | 4.964 | +0.109 | −1.134 |
| 4 | 10 | 50 | 6.984 | +0.144 | −2.174 |
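As a consistency check, the headline figures can be recomputed from the rounded per-fold values in the table. The sketch below assumes the reported SD is the sample standard deviation (n − 1 denominator) across folds. It relies on the identity that a global RMSE over a partition equals the n-weighted root mean of the squared fold RMSEs. Because the inputs are rounded to three decimals, the pooled value is only expected to agree with the reported 4.408 to within rounding.

```python
import math
import statistics

# (fold, test sites, n, RMSE in µg/m³), copied from the fold table.
folds = [
    (0, 16, 80, 3.109),
    (1, 15, 75, 1.965),
    (2, 13, 65, 4.691),
    (3, 12, 60, 4.964),
    (4, 10, 50, 6.984),
]

rmses = [rmse for _, _, _, rmse in folds]
mean_rmse = statistics.mean(rmses)   # unweighted mean across folds
sd_rmse = statistics.stdev(rmses)    # sample SD (n - 1), an assumption

n_total = sum(n for _, _, n, _ in folds)
site_total = sum(sites for _, sites, _, _ in folds)

# Global RMSE from fold RMSEs: sqrt(sum(n_i * rmse_i^2) / N).
pooled_rmse = math.sqrt(sum(n * rmse ** 2 for _, _, n, rmse in folds) / n_total)

print(f"mean={mean_rmse:.3f} sd={sd_rmse:.3f} pooled={pooled_rmse:.3f} "
      f"n={n_total} sites={site_total}")
```

The gap between the unweighted fold mean and the pooled value comes from fold size: the worst fold (4) is also the smallest, so it counts less in the pooled figure than its squared error would otherwise suggest.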
Three-rung comparison: only the satellite-fused product meets the accuracy standard
The three-rung comparison:
| Rung | Method | RMSE (µg/m³) | Tessum gate |
|---|---|---|---|
| L1 | ISRM × NEI matrix lookup | 6.078 | Fail |
| L2 | InMAP-direct steady-state | 11.463 | Fail |
| L3 | van Donkelaar V5.NA.05.02 sum-of-7 | 4.343 | Pass |
The satellite-fused product passes by fusing GEOS-Chem with ground observations via geographically weighted regression (GWR), not formal data assimilation (no Kalman filter or variational scheme). This is the expected ordering: a fused product with access to the AQS network closes the gap that pure forward modeling cannot.
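The pass/fail column can be reproduced with a one-line rule. The sketch assumes the gate is "RMSE at or below the 5.0 µg/m³ upper edge of the Tessum 2017 window"; the investigation's run.py may encode the gate differently, and the names `TESSUM_WINDOW` and `passes_gate` are hypothetical.

```python
# Tessum 2017 InMAP CV-RMSE window in µg/m³, as quoted in this report.
TESSUM_WINDOW = (3.0, 5.0)

# RMSE per rung in µg/m³, copied from the three-rung table.
rung_rmse = {"L1": 6.078, "L2": 11.463, "L3": 4.343}


def passes_gate(rmse, upper=TESSUM_WINDOW[1]):
    """Assumed gate: pass when RMSE does not exceed the window's upper edge."""
    return rmse <= upper


gate = {rung: passes_gate(rmse) for rung, rmse in rung_rmse.items()}
l2_minus_l3 = rung_rmse["L2"] - rung_rmse["L3"]  # gap between L2 and L3

for rung, rmse in rung_rmse.items():
    print(f"{rung}: {rmse:.3f} µg/m³ -> {'Pass' if gate[rung] else 'Fail'}")
print(f"L2 - L3 gap: {l2_minus_l3:.2f} µg/m³")
```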
Slight systematic over-prediction (+0.108 bias) — consistent with what’s known about this product in the western US
The +0.108 MFB is consistent with the V5.NA.05.02 positive bias documented in van Donkelaar et al. 2021 ACP for the western US: a small over-prediction inherited from component over-prediction in the GEOS-Chem prior and only partially corrected by the GWR fusion to ground stations. This bias is absorbed by the L4 MFGP correction step.
R² = −0.79 despite passing the RMSE standard: gets the level right, not the site-to-site pattern
The field passes the absolute-error baseline, but its negative R² means its squared error is larger than that of predicting the AQS global mean at every site. L3 gets the statewide level approximately right (MFB +0.108); it does not resolve site-to-site variation. This is not a flaw unique to V5; it is the honest cost of no-AQS-fitting. A “passing” L3 should not be read as “satellite product matches AQS site-by-site.”
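A small synthetic example shows how a modest RMSE and a small positive MFB can coexist with a negative R². The definitions below are the standard ones and are assumed to match this investigation's: MFB = (2/N) Σ (M − O)/(M + O), as in Boylan and Russell, and R² = 1 − SS_res/SS_tot. The observation and model values are made up for illustration and are not AQS or V5 data.

```python
import math


def rmse(obs, mod):
    """Root mean squared error."""
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs))


def mfb(obs, mod):
    """Mean fractional bias as a fraction (not percent)."""
    return (2.0 / len(obs)) * sum((m - o) / (m + o) for o, m in zip(obs, mod))


def r2(obs, mod):
    """Coefficient of determination; negative when worse than the obs mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - m) ** 2 for o, m in zip(obs, mod))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Synthetic sites: the model has roughly the right level (mean 11.5 vs 11.0)
# but ranks the sites in the opposite order.
obs = [8.0, 10.0, 12.0, 14.0]
mod = [13.0, 12.0, 11.0, 10.0]

toy_rmse = rmse(obs, mod)
toy_mfb = mfb(obs, mod)
toy_r2 = r2(obs, mod)
print(f"RMSE={toy_rmse:.3f}  MFB={toy_mfb:+.3f}  R2={toy_r2:+.3f}")
```

Here the residual sum of squares (46) exceeds the spread of the observations around their own mean (20), so R² falls below zero even though the average level is nearly right.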
Caveats
- Reference rung, not a model. The 4.34 µg/m³ headline is not held-out predictive evaluation. The fold readout exists only for rung-comparability. Citing fold variance (SD 1.91) as evidence of “good generalization” would be misleading.
- V5 sum-of-7 ≠ V6 CNN-PM25. V6 would likely score better against AQS (estimated 2.5–3.5 µg/m³ RMSE), but for a reason that defeats the comparator’s purpose as a no-fitting reference: its CNN had access to AQS-family monitors in fitting.
- NaN-as-zero biases the sum down at coastal sites. The 4.34 µg/m³ RMSE is a lower bound; coastal/masked-cell sites may have artificially better apparent RMSE.
- Annual mean only. V5.NA.05.02 publishes annual files. Daily L3 evaluation requires V5/V6 daily products + Phase C MFGP work.
Provenance
| Item | Value |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/41_l3-vandonkelaar-reference/latest/results.json |
| Method label | vandonkelaar_v5na0502_sum7components |
| Components summed | BC, OM, NH4, NO3, SO4, SS, DUST (7 fields) |
| Spatial sampler | nearest-neighbor searchsorted on 0.01° grid |
| Upstream: Investigation 3-1 folds | sha256 c63ae2d281ce |
| Upstream: Investigation 3-2 L2 RMSE | 11.463 µg/m³ (sha256 20cdce2d11d4) |
| Last run | 2026-05-01 (results sha256 a368ef9c6ed9) |