California Freight Cleanup → Investigation 3-7
Does the PM2.5 cascade hold at daily resolution?
Global daily RMSE 2.85 µg/m³ (3.12× annual) • Wildfire-window 5.76 µg/m³

Every prior validation rung was evaluated at annual-mean PM2.5, which is correct for the annual-exposure portfolio decision. This investigation tests the daily extension to determine which operational use cases (wildfire alerts, AQI nowcasting, school-day tracking) are and are not licensed by the existing cascade. The honest answer: annual policy analysis is fully licensed; daily deployment during wildfire events is not.
The decision
We produce annual-mean PM2.5 fields for portfolio mortality estimation — the right cadence for 5–30-year investment decisions. CARB and CEC evaluators periodically ask whether the same architecture could support operational use cases: real-time fire-smoke alerts, daily AQI forecasts, exposure tracking for disadvantaged community health triggers.
Before making any such claim, we needed an honest test. Investigation 3-7 asks a specific, narrow question: does the fused PM2.5 reference product — the only tier with daily inputs available — meet operational quality standards at daily resolution? The answer sets clear boundaries on what the cascade can and cannot honestly claim beyond its annual-policy scope.
Methodology
FAQSD produces daily PM2.5 estimates at census-tract centroids by Bayesian downscaling of CMAQ output constrained by AQS monitors (Berrocal, Gelfand & Holland 2010) — the same fused values serving as the L5 reference rung in the annual validation ladder.
For each of the 66 California AQS sites in the validation panel (Investigation 3-1 5-fold spatial split), we identify the nearest FAQSD tract centroid within 10 km and match its daily value against AQS daily PM2.5 (parameter 88101) — approximately 22,600 site-day pairs per calendar year. RMSE, mean fractional bias (MFB), and R² are computed globally and stratified by year (2019–2022), season (DJF/MAM/JJA/SON), month, and a calendar-coarse wildfire window (Aug–Oct 2020 and Jul–Sep 2021).
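The matching and scoring steps described above can be sketched as follows. This is a minimal illustration, not the investigation's actual pipeline: the function names, the input shapes, and the use of great-circle (haversine) distance are assumptions. MFB follows the standard definition (2/N) Σ (M − O)/(M + O); R² is computed here as the coefficient of determination, which may differ from the squared Pearson correlation if that is what the investigation reports.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_centroid(site, centroids, max_km=10.0):
    """Return (centroid_id, distance_km) for the closest centroid within
    max_km of the site, or None if no centroid is that close.

    site: (lat, lon); centroids: {id: (lat, lon)} -- hypothetical shapes.
    """
    best = None
    for cid, (lat, lon) in centroids.items():
        d = haversine_km(site[0], site[1], lat, lon)
        if d <= max_km and (best is None or d < best[1]):
            best = (cid, d)
    return best


def daily_metrics(pred, obs):
    """RMSE, mean fractional bias, and R² over paired daily values."""
    n = len(obs)
    pairs = list(zip(pred, obs))
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in pairs) / n)
    # MFB: positive when the fused product over-predicts on average.
    mfb = (2.0 / n) * sum((p - o) / (p + o) for p, o in pairs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for p, o in pairs)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    r2 = 1.0 - ss_res / ss_tot  # assumption: coefficient of determination
    return {"rmse": rmse, "mfb": mfb, "r2": r2}
```

A site with no centroid inside the 10 km radius returns `None` and would simply contribute no site-day pairs.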
Only L5 (FAQSD) is evaluated at daily cadence. Daily L1 would be the annual constant repeated 365 times, which is not informative. Daily L2/L3/L4 require daily CMAQ outputs that are not on disk and a daily MFGP that has not yet been built. Investigation 3-9 subsequently attempted a daily L4 MFGP and established that CMAQ EQUATES is not a viable daily L1 prior for this purpose.
Findings
Day-by-day accuracy is 3× worse than annual-mean — expected, not alarming
Across 90,524 site-days (2019–2022), daily L5 RMSE is 2.85 µg/m³ against the annual L5 anchor of 0.91 µg/m³ (Investigation 3-5): a 3.12× ratio, within the 3–5× band expected because annual averaging cancels day-to-day noise. Daily R² = 0.924; MFB = +0.076 (a small positive bias, consistent across years). The cascade is not broken at daily cadence: it behaves as a correctly calibrated fused product at this scale.
2019 is the cleanest year (RMSE 1.74 µg/m³); 2020 is the worst (4.30)
Year-to-year variation tracks wildfire severity directly. 2019 had no major statewide smoke event. 2020 was California’s record fire season; FAQSD, even with direct AQS access, was overwhelmed by the transient spatial structure of smoke plumes. 2019 is the reference year for the companion Investigation 3-9 daily L4 MFGP experiment.
Wildfire windows: RMSE 5.76 µg/m³—not licensed for operational use
Within the calendar-coarse wildfire window (Aug–Oct 2020 and Jul–Sep 2021), daily RMSE rises to 5.76 µg/m³ versus 2.18 µg/m³ outside those windows, a 2.6× degradation. August is the single worst calendar month (5.75 µg/m³). FAQSD’s AQS-informed Bayesian downscaling does not resolve transient smoke-plume spatial structure at daily resolution. Any cascade claim about operational wildfire-event response is not licensed by this validation.
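The calendar-coarse split can be expressed in a few lines. The sketch below assumes the windows cover whole calendar months with inclusive bounds, and that each record is a hypothetical `(date, predicted, observed)` tuple; neither detail is stated in the investigation.

```python
import math
from datetime import date

# Assumption: whole-month windows, inclusive on both ends.
WILDFIRE_WINDOWS = (
    (date(2020, 8, 1), date(2020, 10, 31)),  # Aug-Oct 2020
    (date(2021, 7, 1), date(2021, 9, 30)),   # Jul-Sep 2021
)


def in_wildfire_window(d):
    """True if the date falls inside either calendar-coarse window."""
    return any(start <= d <= end for start, end in WILDFIRE_WINDOWS)


def rmse(pairs):
    """Root-mean-square error over (predicted, observed) pairs."""
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))


def rmse_by_window(records):
    """Split site-day records into wildfire / other and score each group.

    records: iterable of (date, predicted, observed) -- hypothetical shape.
    Empty groups are omitted from the result.
    """
    groups = {"wildfire": [], "other": []}
    for d, p, o in records:
        key = "wildfire" if in_wildfire_window(d) else "other"
        groups[key].append((p, o))
    return {k: rmse(v) for k, v in groups.items() if v}
```

Because the flag depends only on the date, every site is tagged as smoke-affected for the whole window regardless of whether a plume actually reached it, which is why the split is described as a first-cut probe.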
Accuracy degrades in summer and fall when wildfire smoke is active; winter and spring are clean
JJA daily RMSE = 3.91 µg/m³; SON = 3.40; DJF = 2.22; MAM = 1.27. The pattern tracks wildfire smoke seasonality directly — not mixing-layer or chemical process variation. A daily cascade for non-fire seasons would perform substantially better, but the high-stakes operational window (summer smoke events) is precisely where performance is worst.
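For reference, a seasonal stratification of this kind might look like the sketch below. It assumes seasons are assigned purely by calendar month and pooled across years (so December 2019 and January 2019 both count as DJF); the record shape `(month, predicted, observed)` is hypothetical.

```python
import math
from collections import defaultdict

# Meteorological seasons keyed by calendar month (1 = January).
SEASON_BY_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def rmse_by_season(records):
    """Return {season: RMSE} for (month, predicted, observed) records.

    Seasons with no records are absent from the result.
    """
    squared_errors = defaultdict(list)
    for month, pred, obs in records:
        squared_errors[SEASON_BY_MONTH[month]].append((pred - obs) ** 2)
    return {s: math.sqrt(sum(v) / len(v)) for s, v in squared_errors.items()}
```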
Caveats
- L1/L4 absence is principled, not provisional. Daily L1 would be a constant repeated 365 times and uninformative. Daily L4 requires a daily MFGP that has not been built. Investigation 3-9 quantifies what happens when an annual-CMAQ L1 prior is used for daily MFGP: it performs worse than FAQSD-direct, confirming the scope boundary.
- AQS leakage. FAQSD is fit on AQS observations. Spatial CV bounds (does not eliminate) leakage at daily cadence. Daily RMSE should be read as “what the fused product says at this site on this day, knowing neighbor AQS but not this one”, not as held-out predictive accuracy for an unfitted model.
- 3.12× daily/annual ratio is structurally expected, not a failure. Annual averaging cancels FAQSD-vs-AQS day-to-day noise. Comparing daily to annual without this context would misrepresent cascade capability.
- Wildfire window is calendar-coarse. A site-date tagging approach via HMS smoke plume overpass or NOAA fire-weather flag would sharpen the smoke-event signal; the crude calendar windows used here are a first-cut probe.
- Boylan-Russell MFB gate is an annual-scale baseline. Applied at daily cadence it is a coarse upper bound only. Global MFB (+0.076) is within the gate; a failure confined to wildfire windows would not, on its own, be a defect finding.
Provenance
| Item | Path / value | SHA-256 (12-char) |
|---|---|---|
| results.json | | 0fd223c77070 |
| analysis.md | | — |
| scenario.md | | — |
| Upstream: Investigation 3-5 (annual L5 anchor) | investigations/43_l5-faqsd-reference/latest/results.json | 278e28fe52db |
| Upstream: Investigation 3-1 (5-fold CV splits) | investigations/39_aqs-held-out-validation/latest/results.json | c63ae2d281ce |
| Run timestamp | 2026-05-02T13:07:07; 90,524 site-days; 4 years (2019–2022) | |