California Freight Cleanup → Investigation 8-4

Does fusing the transport model with monitor readings beat either source alone?

Fusion RMSE 1.946 µg/m³ • 4.4% better than model • 6.2% better than kriging • 24.8% mean uncertainty reduction

The transport model covers all 21,164 grid cells but has systematic biases. The 112 EPA air monitors are direct measurements but leave most of California unsampled. We combined both using Bayesian updating, validated against monitors held out of the fitting, and quantified how much each region’s uncertainty shrinks. The combined surface beats either source alone — but only after rescaling the model to match the monitor network first.

Should the California Freight Cleanup cascade use ISRM-alone, kriging-alone, or a fused model + monitor surface as the PM2.5 prior for health-impact calculations? The fused surface outperforms both single-source options once the model is AQS-calibrated. The 4.4% RMSE improvement motivates Bayesian updating as a standard step in any policy-grade ISRM workflow.

Stage 1: Empirical variogram. AQS monitors are treated as a spatial random field. All 6,216 pairwise haversine distances and semivariances (γ_ij = 0.5 (z_i − z_j)²) are computed. An exponential variogram model is fit by weighted L-BFGS-B minimization over 20 binned lag classes. Result: nugget 0.0 (measurement error treated as negligible), sill 7.91 (µg/m³)², range 149.2 km.
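A minimal sketch of the Stage 1 fit under the stated setup (function names are illustrative, not the pipeline's actual code; pair counts per lag class serve as fit weights):

```python
import numpy as np
from scipy.optimize import minimize

def exponential_variogram(h, nugget, sill, rng):
    """Exponential model: gamma(h) = nugget + sill * (1 - exp(-h / rng))."""
    return nugget + sill * (1.0 - np.exp(-h / rng))

def fit_variogram(bin_lags, bin_gammas, bin_counts):
    """Weighted least-squares fit over binned lag classes via L-BFGS-B.

    bin_lags: lag-class centers (km); bin_gammas: mean semivariance per
    class; bin_counts: pair counts per class, used as weights.
    """
    def loss(params):
        nugget, sill, rng = params
        resid = bin_gammas - exponential_variogram(bin_lags, nugget, sill, rng)
        return float(np.sum(bin_counts * resid ** 2))

    res = minimize(
        loss,
        x0=[0.1, bin_gammas.max(), bin_lags.mean()],  # rough starting point
        method="L-BFGS-B",
        bounds=[(0.0, None), (1e-6, None), (1.0, None)],
    )
    return res.x  # (nugget, sill, range_km)
```

With 20 noise-free bins generated from the reported parameters, the fit recovers nugget ≈ 0, sill ≈ 7.91, range ≈ 149.2.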

Stage 2: Ordinary kriging. The fitted variogram interpolates the 112 AQS annual-mean 2023 observations to all 21,164 ISRM grid cells. The kriging system is solved once (the augmented matrix K_aug is factorized) and applied in 1,000-cell chunks. Output is a kriging mean and kriging variance at each cell; kriging variance depends only on variogram structure and monitor geometry, not on ISRM.
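The factorize-once, predict-in-chunks pattern can be sketched as follows (Euclidean distances stand in for haversine here for brevity; this is an illustrative implementation, not the pipeline's code):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def ordinary_kriging(coords, values, pred_coords, variogram, chunk=1000):
    """Ordinary kriging with a single factorization of the augmented system.

    variogram(h) maps a distance array to semivariance. The (n+1)x(n+1)
    augmented matrix K_aug carries the Lagrange row/column enforcing
    that the weights sum to 1.
    """
    n = len(coords)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    K_aug = np.ones((n + 1, n + 1))
    K_aug[:n, :n] = variogram(d)
    K_aug[n, n] = 0.0
    lu = lu_factor(K_aug)  # factorize once, reuse for every chunk

    m = len(pred_coords)
    means, variances = np.empty(m), np.empty(m)
    for start in range(0, m, chunk):
        block = pred_coords[start:start + chunk]
        d0 = np.linalg.norm(block[:, None] - coords[None, :], axis=-1)  # (b, n)
        rhs = np.ones((n + 1, len(block)))
        rhs[:n] = variogram(d0).T
        w = lu_solve(lu, rhs)                    # weights + Lagrange multiplier
        means[start:start + chunk] = w[:n].T @ values
        # kriging variance: sum_i lambda_i * gamma_i0 + mu
        variances[start:start + chunk] = np.sum(w * rhs, axis=0)
    return means, variances
```

With a zero nugget, predicting back at the monitor locations reproduces the observations exactly with near-zero variance, which is the expected exact-interpolation behavior.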

Stage 3: Gaussian Bayesian fusion. Model and kriging surfaces are combined cell-by-cell via the precision-weighted conjugate update:

σ²_post = 1 / (1/σ²_ISRM + 1/σ²_krig)
μ_post = σ²_post · (μ_ISRM/σ²_ISRM + μ_krig/σ²_krig)

The fusion is analytic — no MCMC, no iteration. Everything hinges on the 10% CV assumption for ISRM. That is the most consequential methodological choice in this investigation.
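The update is a one-liner per cell; a sketch under the document's 10% CV assumption for the ISRM variance (names illustrative):

```python
import numpy as np

def fuse(mu_model, mu_krig, var_krig, model_cv=0.10):
    """Precision-weighted conjugate Gaussian update, vectorized per cell.

    The model variance follows the 10% CV assumption:
    sigma_model = model_cv * mu_model.
    """
    var_model = (model_cv * mu_model) ** 2
    prec = 1.0 / var_model + 1.0 / var_krig
    var_post = 1.0 / prec
    mu_post = var_post * (mu_model / var_model + mu_krig / var_krig)
    return mu_post, var_post
```

When the two sources have equal variance, the posterior mean is their simple average and the posterior variance is halved, which is the sanity check for any precision-weighted update.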

Stage 4: LOO cross-validation (note: spatially optimistic — held-out sites are predicted using neighbors within the 149.2 km variogram range; treat the LOO RMSE as a best-case lower bound on prediction error at genuinely unmonitored locations). All 112 monitors are held out one at a time. For each left-out monitor, the pipeline re-runs ordinary kriging on 111 monitors and applies Bayesian fusion at the left-out cell. Predicted vs. observed values generate RMSE and R² for three competing fields: ISRM-only, kriging-only, and fusion.
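The Stage 4 loop is generic over the predictor; a sketch with a pluggable `predict` callable (the real pipeline re-runs kriging plus fusion inside the loop):

```python
import numpy as np

def loo_rmse(coords, values, predict):
    """Leave-one-out CV over monitor sites.

    predict(train_coords, train_values, target_coord) returns a point
    prediction at the held-out site. Returns (RMSE, R^2).
    """
    n = len(values)
    preds = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False                      # hold out site i
        preds[i] = predict(coords[mask], values[mask], coords[i])
    resid = values - preds
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((values - values.mean()) ** 2)
    return rmse, r2
```

Running the same loop with three predictors (ISRM lookup, kriging, fusion) yields the three RMSE/R² rows compared in this investigation.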

Stage 5: Health impact comparison. Di et al. 2017 CRF (β = 0.00705) applied to all three PM2.5 fields against the age-65+ mortality burden, quantifying the decision stake of field choice.
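Stage 5 reduces to an attributable-fraction sum per grid cell. A sketch assuming the standard log-linear form with full exposure as the counterfactual (the pipeline's exact counterfactual baseline is not stated here, so this is illustrative):

```python
import numpy as np

def attributable_deaths(pm25, pop65, baseline_rate, beta=0.00705):
    """Attributable mortality under a log-linear CRF (Di et al. 2017 beta).

    pm25: exposure field (ug/m3); pop65: age-65+ population per cell;
    baseline_rate: annual all-cause mortality rate for the 65+ cohort.
    Attributable fraction per cell: 1 - exp(-beta * pm25).
    """
    af = 1.0 - np.exp(-beta * pm25)
    return float(np.sum(baseline_rate * pop65 * af))
```

Applying this with the model-only, kriging-only, and fused PM2.5 fields, holding population and baseline rates fixed, isolates the decision stake of the field choice.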

Combined beats either source alone: 4.4% better than the model, 6.2% better than monitor interpolation

After AQS-anchored rescaling, the fused surface achieves LOO RMSE 1.946 µg/m³ vs. model-only 2.036 (4.4% improvement) and kriging-only 2.073 (6.2% improvement). That is what Bayesian fusion is supposed to deliver: combined is better than either source alone. Pre-rescale, fusion was worse than kriging — the ISRM prior was biased enough to drag the posterior away from the monitor evidence. AQS-anchored rescaling fixes that at source. Post-rescale, the result is clean.

| Field | LOO RMSE (µg/m³) | R² | Mean posterior std (µg/m³) |
|---|---|---|---|
| Model only (ISRM) | 2.036 | 0.427 | 0.805 |
| Kriging only (AQS) | 2.073 | 0.406 | 1.619 |
| Fusion (model + AQS) | 1.946 | 0.477 | 0.678 |

The transport model carries most of the spatial structure; monitors add a targeted 15.7% uncertainty reduction on top

Model std 0.805 → fusion std 0.678: the ISRM’s spatial structure reduces std by 58.1% relative to kriging-alone (0.678 vs. 1.619). AQS monitors reduce std by 15.7% relative to model-alone (0.678 vs. 0.805). These percentages measure reduction against different baselines and do not sum to 100% — they are not shares of a single precision budget, but independent marginal contributions of each source. Pre-rescale, this ranking was inverted — monitors were the dominant source because the biased ISRM contributed little. Post-rescale, the model earns its weight.

Uncertainty drops 24.8% on average statewide; cells near monitors see up to 98% reduction

Mean posterior variance across all 21,164 cells is 24.8% lower than the model prior. Maximum reduction (near monitors) reaches 98.1%; minimum (far from monitors in rural Sierra and Central Valley) is 0.0%. Regional breakdown:

| Region | Fusion mean (µg/m³) | Fusion std (µg/m³) | Uncertainty reduction | Discrepancy |
|---|---|---|---|---|
| Bay Area | 6.83 | 0.604 | 11.8% | −0.06 |
| LA Basin | 9.36 | 0.771 | 17.7% | −0.22 |
| rest-CA | 6.86 | 0.599 | 9.9% | +0.10 |
| Sacramento | 7.88 | 0.654 | 16.5% | −0.01 |
| SJV | 8.31 | 0.676 | 12.9% | +0.47 |
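The statewide and per-region reduction figures come from a per-cell variance ratio; a minimal sketch (names hypothetical):

```python
import numpy as np

def reduction_stats(var_prior, var_post, region_ids):
    """Uncertainty reduction per cell: 1 - var_post / var_prior.

    Returns statewide summary stats plus the mean reduction per region
    (0 far from monitors, approaching 1 adjacent to them).
    """
    red = 1.0 - var_post / var_prior
    stats = {"mean": red.mean(), "max": red.max(), "min": red.min()}
    regional = {r: red[region_ids == r].mean() for r in np.unique(region_ids)}
    return stats, regional
```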

Population-level mortality barely moves: fusion adds precision at the cell level, not a new death count

Di et al. 2017 CRF applied to model-only (1,826 deaths) vs. fusion (1,814 deaths): −12 deaths (−0.7%). That is sub-MC noise (σ ≥ 30 deaths/yr) and should not be over-interpreted. The pre-rescale delta was −611 deaths (−24.4%) — an artifact of ISRM over-prediction that the AQS-anchored rescale corrected at source. The honest post-rescale finding: fusion adds cell-level precision, not a population-level mortality correction.

| Item | Location / details |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/38_sensor-fusion-bayesian/latest/results.json (sha256 f9a7e68e410c) |
| analysis.md | investigations/38_sensor-fusion-bayesian/latest/analysis.md |
| scenario.md | investigations/38_sensor-fusion-bayesian/latest/scenario.md |
| AQS PM2.5 | data/aqs/california_pm25_annual_2023.parquet — 112 FRM/FEM monitors, 2023 |
| ISRM PM2.5 | data/processed/isrm_sector_pm25_aqs_anchored.npz — B′ AQS-anchored rescale |
| Grid covariates | data/processed/grid_covariates_real.parquet |
| Mortality rates | data/raw/evaldata_v1.6.1/mortalityRates2013.shp (2013 vintage) |
| CRF | Di et al. 2017 NEJM 376:2513–2522 (β = 0.00705, ≥65) |
| Last run | 2026-05-01 (results sha256 f9a7e68e410c) |