California Freight Cleanup → Investigation 8-4
Does fusing the transport model with monitor readings beat either source alone?
Fusion RMSE 1.946 µg/m³ • 4.4% better than model • 6.2% better than kriging • 24.8% mean uncertainty reduction

The transport model covers all 21,164 grid cells but has systematic biases. The 112 EPA air monitors are direct measurements but leave most of California unsampled. We combined both using Bayesian updating, validated against monitors held out of the fitting, and quantified how much each region’s uncertainty shrinks. The combined surface beats either source alone — but only after rescaling the model to match the monitor network first.
The decision
Should the California Freight Cleanup cascade use ISRM-alone, kriging-alone, or a fused model + monitor surface as the PM2.5 prior for health-impact calculations? The fused surface outperforms both single-source options once the model is AQS-calibrated. The 4.4% RMSE improvement motivates Bayesian updating as a standard step in any policy-grade ISRM workflow.
Methodology
Stage 1: Empirical variogram. AQS monitors are treated as a spatial random field. All 6,216 pairwise distances and semivariances (γᵢⱼ = ½ (zᵢ − zⱼ)²) are computed from haversine distances. An exponential variogram model is fit by weighted L-BFGS-B minimization over 20 binned lag classes. Result: nugget 0.0 (measurement error treated as negligible), sill 7.91 (µg/m³)², range 149.2 km.
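The Stage 1 computation can be sketched in a few lines. This is a minimal illustration assuming NumPy arrays of monitor latitudes, longitudes, and annual means; the function names and the equal-width binning are illustrative, not the pipeline's exact code:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (Earth radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2.0 * 6371.0 * np.arcsin(np.sqrt(a))

def empirical_variogram(lats, lons, z, n_bins=20):
    """All pairwise semivariances gamma_ij = 0.5*(z_i - z_j)^2,
    averaged into n_bins equal-width lag classes."""
    i, j = np.triu_indices(len(z), k=1)          # 112 monitors -> 6,216 pairs
    d = haversine_km(lats[i], lons[i], lats[j], lons[j])
    gamma = 0.5 * (z[i] - z[j]) ** 2
    edges = np.linspace(0.0, d.max(), n_bins + 1)
    which = np.clip(np.digitize(d, edges) - 1, 0, n_bins - 1)
    lag_centers = 0.5 * (edges[:-1] + edges[1:])
    gamma_binned = np.array([gamma[which == b].mean() if np.any(which == b)
                             else np.nan for b in range(n_bins)])
    return lag_centers, gamma_binned

def exponential_model(h, nugget, sill, rng):
    """Exponential variogram: nugget + sill * (1 - exp(-h / range))."""
    return nugget + sill * (1.0 - np.exp(-h / rng))
```

The binned `(lag_centers, gamma_binned)` pairs are then passed to a weighted least-squares fit (L-BFGS-B in the pipeline) to recover the nugget, sill, and range.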
Stage 2: Ordinary kriging. The fitted variogram interpolates 112 AQS annual-mean 2023 observations to all 21,164 ISRM grid cells. n = 112 monitors. The kriging system is solved once (Kaug factorized) and applied in 1,000-cell chunks. Output is a kriging mean and kriging variance at each cell; kriging variance depends only on variogram structure and monitor geometry, not on ISRM.
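The augmented kriging system described above can be sketched compactly. This illustrative version uses planar coordinates and a single solve; the pipeline uses haversine distances and a factorized Kaug applied in 1,000-cell chunks, and the names here are assumptions:

```python
import numpy as np

def ordinary_krige(xy_obs, z_obs, xy_pred, gamma):
    """Ordinary kriging in variogram form. Solves the augmented system
    [[Gamma, 1], [1', 0]] @ [w; mu] = [gamma0; 1] for each prediction cell."""
    n = len(z_obs)
    d = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    Kaug = np.zeros((n + 1, n + 1))
    Kaug[:n, :n] = gamma(d)            # semivariances between monitors
    Kaug[:n, n] = Kaug[n, :n] = 1.0    # unbiasedness constraint (weights sum to 1)
    d0 = np.linalg.norm(xy_pred[:, None, :] - xy_obs[None, :, :], axis=-1)
    rhs = np.hstack([gamma(d0), np.ones((len(xy_pred), 1))])
    sol = np.linalg.solve(Kaug, rhs.T).T
    w, mu = sol[:, :n], sol[:, n]
    z_hat = w @ z_obs
    # kriging variance depends only on geometry and the variogram, not on z
    var = np.sum(w * gamma(d0), axis=1) + mu
    return z_hat, var
```

With a zero nugget, ordinary kriging is an exact interpolator: predicting at a monitor location returns the observed value with zero kriging variance.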
Stage 3: Gaussian Bayesian fusion. Model and kriging surfaces are combined via the precision-weighted conjugate update:
- Prior: ISRM total PM2.5, variance = (0.10 × μmodel)² (10% CV assumed)
- Likelihood: kriging mean and kriging variance from Stage 2
- Posterior (analytic, closed form): σ²post = 1 / (1/σ²model + 1/σ²krig), with μpost = σ²post · (μmodel/σ²model + μkrig/σ²krig)
The fusion is analytic — no MCMC, no iteration. Everything hinges on the 10% CV assumption for ISRM. That is the most consequential methodological choice in this investigation.
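The conjugate update above is a one-liner per cell. A minimal NumPy sketch, with the 10% CV entering as `cv_model` (the function name is illustrative):

```python
import numpy as np

def gaussian_fuse(mu_model, cv_model, mu_krig, var_krig):
    """Precision-weighted conjugate update of two Gaussian estimates.
    Prior variance comes from an assumed coefficient of variation
    (10% in this investigation); all arguments broadcast per cell."""
    var_model = (cv_model * mu_model) ** 2
    var_post = 1.0 / (1.0 / var_model + 1.0 / var_krig)
    mu_post = var_post * (mu_model / var_model + mu_krig / var_krig)
    return mu_post, var_post
```

When the two variances are equal, the posterior mean is the simple average and the posterior variance is halved; as either source's variance grows, its weight shrinks in proportion to its precision.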
Stage 4: LOO cross-validation. All 112 monitors are held out one at a time. For each left-out monitor, the pipeline re-runs ordinary kriging on the remaining 111 monitors and applies Bayesian fusion at the left-out cell. Predicted vs. observed values generate RMSE and R² for three competing fields: ISRM-only, kriging-only, and fusion. Note: LOO is spatially optimistic — held-out sites are predicted using neighbors within the 149.2 km variogram range, so treat the LOO RMSE as a best-case lower bound on prediction error at genuinely unmonitored locations.
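The LOO loop itself is simple. A sketch with a generic `predict_fn` standing in for the krige-then-fuse step (hypothetical interface, not the pipeline's):

```python
import numpy as np

def loo_rmse(coords, z, predict_fn):
    """Hold each monitor out, refit on the remaining n-1, predict the
    held-out cell, and score the predictions against the observations."""
    n = len(z)
    preds = np.empty(n)
    for k in range(n):
        mask = np.arange(n) != k
        preds[k] = predict_fn(coords[mask], z[mask], coords[k:k + 1])[0]
    return float(np.sqrt(np.mean((preds - z) ** 2)))
```

Running this once per competing field (ISRM-only, kriging-only, fusion) with the same held-out monitors yields the comparable RMSE values in the findings table.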
Stage 5: Health impact comparison. Di et al. 2017 CRF (β = 0.00705) applied to all three PM2.5 fields against the age-65+ mortality burden, quantifying the decision stake of field choice.
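The report does not spell out the burden formula; the sketch below uses the standard log-linear attributable-fraction form with the stated β, which is an assumption (real analyses typically subtract a counterfactual concentration rather than using zero):

```python
import numpy as np

def attributable_deaths(pm25, baseline_deaths, beta=0.00705):
    """Log-linear CRF: attributable fraction AF = 1 - exp(-beta * PM2.5),
    applied cell-by-cell to the age-65+ baseline mortality and summed."""
    af = 1.0 - np.exp(-beta * np.asarray(pm25))
    return float(np.sum(af * np.asarray(baseline_deaths)))
```

Because the same baseline mortality field is used for all three PM2.5 surfaces, differences in the summed burden isolate the effect of the concentration field choice.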
Findings
Combined beats either source alone: 4.4% better than the model, 6.2% better than monitor interpolation
After AQS-anchored rescaling, the fused surface achieves LOO RMSE 1.946 µg/m³ vs. model-only 2.036 (4.4% improvement) and kriging-only 2.073 (6.2% improvement). That is what Bayesian fusion is supposed to deliver: combined is better than either source alone. Pre-rescale, fusion was worse than kriging — the ISRM prior was biased enough to drag the posterior away from the monitor evidence. AQS-anchored rescaling fixes that at source. Post-rescale, the result is clean.
| Field | RMSE (µg/m³) | R² | Mean std (µg/m³) |
|---|---|---|---|
| Model only (ISRM) | 2.036 | 0.427 | 0.805 |
| Kriging only (AQS) | 2.073 | 0.406 | 1.619 |
| Fusion (model + AQS) | 1.946 | 0.477 | 0.678 |
The transport model carries most of the spatial structure; monitors add a targeted 15.7% uncertainty reduction on top
Model std 0.805 → fusion std 0.678: the ISRM’s spatial structure reduces std by 58.1% relative to kriging-alone (0.678 vs. 1.619). AQS monitors reduce std by 15.7% relative to model-alone (0.678 vs. 0.805). These percentages measure reduction against different baselines and do not sum to 100% — they are not shares of a single precision budget, but independent marginal contributions of each source. Pre-rescale, this ranking was inverted — monitors were the dominant source because the biased ISRM contributed little. Post-rescale, the model earns its weight.
Uncertainty drops 24.8% on average statewide; cells near monitors see up to 98% reduction
Mean posterior variance across all 21,164 cells is 24.8% lower than the model prior. Maximum reduction (near monitors) reaches 98.1%; minimum (far from monitors in rural Sierra and Central Valley) is 0.0%. Regional breakdown:
| Region | Fusion mean (µg/m³) | Fusion std | Uncertainty reduction | Discrepancy |
|---|---|---|---|---|
| Bay Area | 6.83 | 0.604 | 11.8% | −0.06 |
| LA Basin | 9.36 | 0.771 | 17.7% | −0.22 |
| rest-CA | 6.86 | 0.599 | 9.9% | +0.10 |
| Sacramento | 7.88 | 0.654 | 16.5% | −0.01 |
| SJV | 8.31 | 0.676 | 12.9% | +0.47 |
Population-level mortality barely moves: fusion adds precision at the cell level, not a new death count
Di et al. 2017 CRF applied to model-only (1,826 deaths) vs. fusion (1,814 deaths): −12 deaths (−0.7%). That is below the Monte Carlo noise floor (σ ≥ 30 deaths/yr) and should not be over-interpreted. The pre-rescale delta was −611 deaths (−24.4%) — an artifact of ISRM over-prediction that the AQS-anchored rescale corrected at source. The honest post-rescale finding: fusion adds cell-level precision, not a population-level mortality correction.
Caveats
- 10% CV ISRM prior is assumed, not fitted. This single parameter controls the effective weight between the ISRM prior and the kriging likelihood. If real ISRM CV is larger (e.g., 20–30% in wildfire-affected SJV years), the fusion would weight monitors more heavily. The 10% assumption is consistent with the post-rescale model performance but was not derived from the data.
- Zero nugget treats AQS monitors as error-free. Annual-mean AQS FRM/FEM uncertainty is typically ±1 µg/m³, small relative to the 7.91 sill, so this is defensible. A non-zero nugget would slightly smooth the kriging surface near monitors.
- Ordinary kriging, not co-kriging or regression kriging. The ISRM model field does not enter the kriging step as a covariate. Regression-kriging (kriging of ISRM residuals) would let the model’s spatial structure guide interpolation between monitors and could reduce kriging variance in under-monitored regions.
- Variogram fit from 2023 only. The 149.2 km range characterizes 2023’s spatial autocorrelation structure. Wildfire-dominated years can generate correlation at very different ranges. A multi-year ensemble variogram would be more robust.
- Do not over-claim R² = 0.48. n = 112 LOO-CV at R² < 0.5 means more than half of statewide PM2.5 spatial variance remains unexplained. This is not a “validated” model fit; it is a cross-validation result that confirms the fusion is better than its constituent sources.
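The first caveat can be probed directly: in the conjugate Gaussian update, the weight the posterior mean places on the monitor evidence is a closed-form function of the assumed model CV. A minimal sketch with illustrative values (not pipeline output):

```python
def kriging_weight(cv_model, mu_model, var_krig):
    """Fraction of the posterior mean contributed by the kriging likelihood.
    For the conjugate Gaussian update this is var_model / (var_model + var_krig),
    where var_model = (cv_model * mu_model)**2."""
    var_model = (cv_model * mu_model) ** 2
    return var_model / (var_model + var_krig)
```

At an illustrative μ_model of 8 µg/m³ and a kriging variance of 0.64 (µg/m³)², doubling the CV from 10% to 20% moves the monitor weight from 0.5 to 0.8, which is why the 10% assumption is the most consequential choice in the pipeline.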
Provenance
| Item | Detail |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/38_sensor-fusion-bayesian/latest/results.json (sha256 f9a7e68e410c) |
| analysis.md | investigations/38_sensor-fusion-bayesian/latest/analysis.md |
| scenario.md | investigations/38_sensor-fusion-bayesian/latest/scenario.md |
| AQS PM2.5 | data/aqs/california_pm25_annual_2023.parquet — 112 FRM/FEM monitors, 2023 |
| ISRM PM2.5 | data/processed/isrm_sector_pm25_aqs_anchored.npz — B′ AQS-anchored rescale |
| Grid covariates | data/processed/grid_covariates_real.parquet |
| Mortality rates | data/raw/evaldata_v1.6.1/mortalityRates2013.shp (2013 vintage) |
| CRF | Di et al. 2017 NEJM 376:2513–2522 (β = 0.00705, ≥65) |
| Last run | 2026-05-01 (results sha256 f9a7e68e410c) |