
California Freight Cleanup → Investigation 3-9

Can a daily L4 MFGP close the cascade’s operational gap?

Pooled L4 RMSE 3.67 µg/m³ vs L5 (FAQSD-direct) 1.80 µg/m³: L4 is 104% worse

Investigation 3-7 flagged the daily L4 MFGP as the path to closing the cascade’s daily operational gap. Investigation 3-9 runs it: two CMAQ cadences tested, independently. Both fail. The failure mode sits at the data-design layer, not the modeling layer—EQUATES’ frozen 2014 emissions cannot supply day-specific signal in a 2019 field.

Wildfire smoke alert systems, short-term exposure assessment for vulnerable populations, and EPA AQI nowcasting all require daily PM2.5 estimates. Investigation 3-7 set the operational ceiling: FAQSD L5 direct at 2.85 µg/m³ daily RMSE, degrading to 5.76 µg/m³ during wildfire windows.

A daily L4 MFGP—a chemistry prior to correct the FAQSD field and cut wildfire-window error—is the obvious next step. Investigation 3-9 tests whether CMAQ EQUATES, the only daily chemistry product available for California, can serve as the L1 prior in the Le Gratiet hierarchical GP chain at daily cadence.

Two versions of a per-day 2-level Le Gratiet MFGP were fit independently for each of 76 sample days (52 weekly Wednesdays plus all days in the Kincade Fire window and subsequent smoke residue period). The L1 prior tested in each version:

| Version | L1 prior | Rationale |
| --- | --- | --- |
| v1 (annual L1) | CMAQ EQUATES annual-mean PM2.5 per site (constant across days) | Test whether a chemistry-informed constant improves over FAQSD-direct |
| v2 (daily L1) | CMAQ EQUATES daily PM2.5 per site (day-varying) | Test whether daily chemistry variation improves over annual constant |
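The 76-day sample can be sketched as the union of two date sets. The source does not give the boundaries of the Kincade Fire window, so the window below is a hypothetical placeholder: it starts on the fire's ignition date (2019-10-23) and its end date is chosen only so that the union reaches 76 days. The function names are illustrative as well.

```python
from datetime import date, timedelta


def weekly_wednesdays(year):
    """Return every Wednesday in the given calendar year."""
    day = date(year, 1, 1)
    # Advance to the first Wednesday (weekday() == 2 means Wednesday).
    day += timedelta(days=(2 - day.weekday()) % 7)
    out = []
    while day.year == year:
        out.append(day)
        day += timedelta(days=7)
    return out


def date_range(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


wednesdays = weekly_wednesdays(2019)

# HYPOTHETICAL window: the end date is an assumption, not taken from the source.
window = date_range(date(2019, 10, 23), date(2019, 11, 19))

# Wednesdays that fall inside the window are counted once.
sample_days = sorted(set(wednesdays) | set(window))
print(len(wednesdays), len(window), len(sample_days))
```

Taking the set union matters because any Wednesday inside the fire window would otherwise be counted twice.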

For each sample day, ρ1 is estimated via the Kennedy-O’Hagan ratio estimator. A spatial GP (Matérn + WhiteKernel + ConstantKernel) is fit on the L3 − ρ1 × L1 residual at 80% of California AQS sites, with 20% held out for RMSE evaluation. The pooled L4 RMSE across all 76 days’ test observations is compared to the L5 (FAQSD-direct) baseline computed on the same site-days.
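The per-day procedure above can be sketched with scikit-learn, whose kernel classes share the names used in the text. This is a minimal sketch on synthetic data, not the investigation's code. Several details are assumptions: the ρ1 estimator is written as least squares through the origin (the source does not spell out the exact form), the kernel is composed as ConstantKernel × Matérn + WhiteKernel with ν = 1.5, and the names `fit_one_day`, `pooled_rmse`, `target`, and `prior` are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel


def fit_one_day(coords, target, prior, test_frac=0.2, seed=0):
    """Fit one day's 2-level correction; return (rho, held-out errors)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(target))
    n_test = max(1, round(test_frac * len(target)))
    test, train = order[:n_test], order[n_test:]

    # ASSUMPTION: scale factor as least squares through the origin.
    rho = float(prior[train] @ target[train] / (prior[train] @ prior[train]))

    # ASSUMPTION: kernel composition and nu are illustrative choices.
    kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=1.5) + WhiteKernel(0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=seed)

    # The spatial GP is fit on the residual: target minus rho times prior.
    gp.fit(coords[train], target[train] - rho * prior[train])

    # Prediction adds the scaled prior back to the GP's residual estimate.
    pred = rho * prior[test] + gp.predict(coords[test])
    return rho, pred - target[test]


def pooled_rmse(error_arrays):
    """RMSE over all held-out observations from all days together."""
    errors = np.concatenate(error_arrays)
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic stand-in for three days of site data (not real AQS or CMAQ values).
rng = np.random.default_rng(42)
rhos, day_errors = [], []
for day in range(3):
    coords = rng.uniform(0.0, 1.0, size=(50, 2))
    prior = 8.0 + 2.0 * np.sin(3.0 * coords[:, 0])
    target = 1.4 * prior + np.cos(4.0 * coords[:, 1]) + rng.normal(0.0, 0.3, 50)
    rho, err = fit_one_day(coords, target, prior, seed=day)
    rhos.append(rho)
    day_errors.append(err)

print(pooled_rmse(day_errors))
```

Pooling concatenates the held-out errors before taking the root mean square, so days with more test sites carry more weight than they would in an average of per-day RMSEs.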

[Figure: bar chart of pooled RMSE across 76 sample days, California 2019. L5: 1.80 µg/m³; v1 annual-L1 L4: 3.45 µg/m³; v2 daily-L1 L4: 3.67 µg/m³.]
Pooled RMSE comparison across 896 test observations (76 days, 20% hold-out). Both L4 MFGP variants perform substantially worse than FAQSD-direct (L5). Switching from annual to daily CMAQ L1 (v1 → v2) slightly worsens L4 rather than improving it—confirming the failure is structural, not a cadence mismatch.

Adding a chemistry layer made accuracy worse, not better — by a factor of two

v2 (daily CMAQ L1) pooled RMSE = 3.668 µg/m³ vs L5 = 1.799 µg/m³, a ratio of about 2.04. The L4 correction layer amplifies error rather than reducing it. Mean ρ1 (CMAQ→FAQSD coupling) = 1.420 across 76 days, σ = 0.425; CMAQ’s daily variation is not correlated with actual day-to-day AQS PM2.5 variation.

v1 → v2: switching to daily CMAQ makes L4 slightly worse, not better

v1 (annual L1) pooled RMSE = 3.451 µg/m³; v2 (daily L1) = 3.668 µg/m³, a 6.3% degradation that is consistent across all sub-aggregates (wildfire window, out-of-window, all four seasons). More information does not help when the daily CMAQ variation is noise from an emissions-vintage mismatch, not signal from 2019 atmospheric chemistry.
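The two headline percentages follow directly from the pooled RMSE values quoted above. A short check, with `pct_change` as an illustrative helper name:

```python
def pct_change(new, baseline):
    """Relative change of `new` against `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Pooled RMSE values in µg/m³ as reported in the text.
l5, v1, v2 = 1.799, 3.451, 3.668

l4_vs_l5 = round(pct_change(v2, l5))     # daily-L1 L4 relative to FAQSD-direct
v2_vs_v1 = round(pct_change(v2, v1), 1)  # daily L1 relative to annual L1
print(l4_vs_l5, v2_vs_v1)
```

Both figures use the better-performing configuration as the baseline, so a positive value means higher error.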

The chemistry model uses 2014 emissions to simulate 2019 conditions — that mismatch is the root cause

CMAQ EQUATES was designed for cross-year annual trend comparison using a fixed 2014 emissions baseline. Its day-to-day variation reflects 2014 meteorology applied to 2014 emissions, not 2019 meteorology or 2019 activity levels. The GP residual fitted on (AQS − ρ1 × CMAQ_daily) is fitting vintage-mismatch noise, not the spatial structure of PM2.5. This is a structural data-design property of EQUATES, not a retrieval failure.
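Why an uninformative prior adds error rather than leaving it unchanged can be seen from the structure of the two-level predictor. The notation here is assumed for illustration: y is the observed target at site s, x is the daily CMAQ prior, r is the residual the GP is fit on, and s_* is a held-out site.

```latex
\begin{aligned}
r(s) &= y(s) - \rho_1\, x(s) \\
\hat{y}(s_*) &= \rho_1\, x(s_*) + \hat{r}(s_*) \\
\hat{y}(s_*) - y(s_*) &= \hat{r}(s_*) - r(s_*) \\
\operatorname{Var}(r) &= \operatorname{Var}(y) + \rho_1^{2}\operatorname{Var}(x) - 2\rho_1 \operatorname{Cov}(x, y)
\end{aligned}
```

The third line says the held-out error of the corrected field equals the GP's interpolation error on the residual. The fourth line shows that when Cov(x, y) is close to zero, subtracting ρ1·x makes the residual more variable than y itself, so the GP is asked to interpolate a rougher field than it would face without the prior.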

L4 adds error in both regimes — no operating condition where it wins

Wildfire-window L4 RMSE (v2): 4.420 µg/m³ vs L5 wildfire: 2.220 µg/m³. Out-of-window: 3.124 µg/m³ vs 1.486 µg/m³. There is no fire regime, season, or other sub-aggregate in which the CMAQ EQUATES-anchored daily L4 outperforms FAQSD-direct.
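The regime figures can be cross-checked against the pooled values. If the wildfire-window and out-of-window sets partition the test observations (an assumption, since the source does not give per-regime counts), the pooled mean squared error is a weighted average of the two regime MSEs, and the weight can be solved for. The helper name `implied_share` is illustrative.

```python
def implied_share(pooled, in_window, out_window):
    """Fraction f solving pooled^2 = f * in_window^2 + (1 - f) * out_window^2."""
    return (pooled**2 - out_window**2) / (in_window**2 - out_window**2)


# RMSE values in µg/m³ as reported in the text (rounded to three decimals).
share_l4 = implied_share(3.668, 4.420, 3.124)  # v2 daily-L1 L4
share_l5 = implied_share(1.799, 2.220, 1.486)  # FAQSD-direct L5
print(round(share_l4, 3), round(share_l5, 3))
```

Both rows imply the same wildfire-window share of about 0.38, which is what one would expect if L4 and L5 were scored on identical site-days. Applied to 896 test observations, that share corresponds to roughly 339 wildfire-window observations, subject to rounding in the reported RMSEs.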

| Item | SHA-256 (12-char) |
| --- | --- |
| results.json | f8c4415fd366 |
| analysis.md | |
| scenario.md | |
| Upstream: Investigation 3-4 (annual L4 anchor), investigations/42_l4-mfgp-corrected/latest/results.json | b89d8204eb15 |
| Upstream: Investigation 3-7 (daily L5 anchor), investigations/48_daily-cadence-validation/latest/results.json | 0fd223c77070 |
| Data: CMAQ EQUATES CA sites 2019, data/raw/cmaq_equates/ca_site_cmaq_pm25_2019_daily.csv | b62eef10c906 |

Run timestamp: 2026-05-03T22:24:55 · 76 sample days · 896 test observations