California Freight Cleanup → Investigation 3-11

Which physics-informed GP kernel modification actually helps?

Kernel C (CMAQ covariate): 2.542 µg/m³ (−0.220 µg/m³, about −8.0% mean vs baseline 2.762; three of five folds improve, one is flat, one worsens) • A/B: both worse

The CEC freight solicitation proposal scoring rubric flags a physics-informed GP kernel (advection-diffusion structure in the covariance) as the highest-leverage methodology lever for Criterion 1 (Technical Merit). Investigation 3-11 is the honest empirical test: three candidate kernels, one single-axis change per experiment, materiality threshold set before running. The lever does pull weight, but only via mean-prior augmentation (Kennedy-O’Hagan 2001 style), not via covariance-form physics encoding.

The CEC proposal scoring rewards physics-informed model improvements. But the honest version of that claim requires an actual experiment: does encoding physical knowledge about atmospheric transport into the model improve accuracy on real California monitoring data, or is it purely a methodological signal with no numerical payoff?

Either verdict is defensible. A real improvement supports the technical-merit narrative. A clean negative — three variants tested, none beat the baseline — is itself a rigor signal. We found a partial positive: one of three variants wins by 8%, but the mechanism is different from what the standard atmospheric physics framing would suggest.

All three candidate kernels are tested as drop-in replacements for the isotropic Matern(ν=1.5) + WhiteKernel residual fit inside Investigation 3-4’s exact 2019 4-level MFGP chain. Every other chain element is preserved bit-identical: the 64-site California AQS panel, the Investigation 3-1 5-fold spatial CV split, the ρk OLS-through-origin layer, and the canonical recursive predict using only L1 at test. Only the δk GP kernel varies. The baseline RMSE must reproduce Investigation 3-4’s 2.762 µg/m³ exactly before any kernel comparison is valid; it does (2.762, Δ = 0.000).
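As a minimal sketch of the two chain elements named above, the following assumes the conventional multi-fidelity form in which each level's target is modeled as ρ times the level below plus a residual δ, with the GP fitted to δ. The function names are illustrative and are not taken from the investigation's code; only the OLS-through-origin slope and the idea of a residual fit come from the text.

```python
def fit_rho_through_origin(f_low, y_high):
    """OLS slope with no intercept: rho = sum(f * y) / sum(f * f)."""
    num = sum(f * y for f, y in zip(f_low, y_high))
    den = sum(f * f for f in f_low)
    if den == 0.0:
        raise ValueError("lower-level predictions are all zero; rho is undefined")
    return num / den


def residual_targets(f_low, y_high, rho):
    """delta = y - rho * f: the values the residual GP is trained on (assumed form)."""
    return [y - rho * f for f, y in zip(f_low, y_high)]


def combine(rho, f_low_test, delta_hat_test):
    """Level prediction at test sites: rho * lower level + GP residual mean."""
    return [rho * f + d for f, d in zip(f_low_test, delta_hat_test)]
```

Under this reading, swapping the δ kernel changes only what produces `delta_hat_test`; the ρ fit and the recursion are untouched, which is what makes the comparison single-axis.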

| Kernel | Description | Physics encoding |
| --- | --- | --- |
| Baseline | Isotropic Matern(ν=1.5) + WhiteKernel | None; Investigation 3-4 exact form |
| Kernel A | Input (lat, lon) rotated to align with CA-aggregate prevailing-wind transport axis (75° ENE per NCEP 2019 850 hPa); ARD Matern | Anisotropic advection: correlation falls off slower along-wind than cross-wind |
| Kernel B | ARD Matern with independent (lat, lon) length-scales fit by MLE | Data-driven anisotropy; reality-check for A |
| Kernel C | Input extended to [lat, lon, cmaq_z]; cmaq_z filled with ρ1 × f1_test at predict time | Chemistry-prior covariate: GP encodes spatial proximity AND chemistry-prior similarity |
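The geometric difference between the variants can be sketched in a few lines. This is an illustration under stated assumptions, not the investigation's implementation: the 75° axis is read as a compass bearing (clockwise from north), latitude and longitude are treated as planar coordinates with no cos(latitude) correction, and the helper names and length-scale values are hypothetical.

```python
import math


def to_wind_frame(lat, lon, bearing_deg=75.0):
    """Rotate (lat, lon) into (along-wind, cross-wind) coordinates.

    bearing_deg is a compass bearing, so the along-wind unit vector is
    (east, north) = (sin b, cos b).
    """
    b = math.radians(bearing_deg)
    along = lon * math.sin(b) + lat * math.cos(b)
    cross = lon * math.cos(b) - lat * math.sin(b)
    return along, cross


def matern15(x, y, length_scales):
    """Matern(nu=1.5) with one length scale per input dimension (ARD)."""
    r = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales)))
    s = math.sqrt(3.0) * r
    return (1.0 + s) * math.exp(-s)


# Hypothetical length scales: long along-wind, short cross-wind (Kernel A's premise).
scales = (2.0, 0.5)
k_along = matern15((0.0, 0.0), (1.0, 0.0), scales)  # unit step along-wind
k_cross = matern15((0.0, 0.0), (0.0, 1.0), scales)  # unit step cross-wind
```

In this sketch Kernel A applies `matern15` to the rotated coordinates, Kernel B applies it to raw (lat, lon) with both length scales left free, and Kernel C passes three-element inputs with a third length scale for the CMAQ covariate. The baseline corresponds to a single shared length scale.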

The materiality threshold is set at 0.140 µg/m³ (≈ 5% of baseline 2.762, larger than typical fold-to-fold noise). A kernel winning by less than this is reported as “no improvement below materiality”—not as a win. CA aggregate wind direction (255° from-direction, transport axis 75°) is from the NCEP/NCAR Reanalysis 2019 annual-mean 850 hPa composite.
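The decision rule can be written down directly from the numbers reported on this page. The sketch below is a restatement, not the investigation's code: the label "worse than baseline" and the choice to count a gain exactly equal to the threshold as a win are assumptions added here.

```python
BASELINE_RMSE = 2.762  # µg/m³, Investigation 3-4 baseline
MATERIALITY = 0.140    # µg/m³, fixed before the runs


def verdict(rmse, baseline=BASELINE_RMSE, threshold=MATERIALITY):
    """Classify a candidate kernel's CV-RMSE against the pre-set threshold."""
    gain = baseline - rmse  # positive means the candidate is better
    if gain >= threshold:
        return "material improvement"
    if gain > 0.0:
        return "no improvement below materiality"
    return "worse than baseline"


# RMSE values as reported for the three candidate kernels.
reported = {"Kernel A": 3.012, "Kernel B": 2.871, "Kernel C": 2.542}
verdicts = {name: verdict(rmse) for name, rmse in reported.items()}

# Relative gain for Kernel C, in percent of the baseline RMSE.
relative_gain_c = 100.0 * (BASELINE_RMSE - reported["Kernel C"]) / BASELINE_RMSE
```

Fixing the threshold as a constant before any kernel is run is the point of the design: the verdict function cannot be tuned after seeing the results.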

Bar chart of 5-fold CV-RMSE for baseline (2.762), Kernel A (3.012), Kernel B (2.871), and Kernel C (2.542) µg/m³. Kernel C is the only variant below the baseline; A and B are both above.
5-fold spatial CV-RMSE (California 2019, 64 AQS sites) for the baseline and three candidate kernels. The horizontal line marks the materiality threshold (0.140 µg/m³ improvement). Only Kernel C clears it.

Adding chemistry-model output as an input feature wins: 8% accuracy improvement

RMSE 2.542 µg/m³ versus baseline 2.762 µg/m³. The improvement clears the 0.140 µg/m³ materiality threshold with margin. 3 of 5 folds improve substantially; 1 fold is flat; 1 fold worsens. The kernel is not uniformly better; it is a high-mean-low-floor tradeoff with increased fold-to-fold standard deviation (0.892 → 1.045).

Kernel C’s win is mechanistically interpretable, not an AQS leakage artifact

The concern with adding the CMAQ value as a kernel-input covariate is the NARGP-style collapse mode documented in Investigation 3-4 (adding the prior-rung value as input lets the GP collapse to plain spatial kriging, producing a spurious −1.84 µg/m³ “improvement”). Investigation 3-11 guards against this by using ρ1 × f1_test (the surrogate-reconstructed CMAQ, not held-out CMAQ truth) at predict time. The −0.22 µg/m³ improvement, compared to the spurious −1.84 µg/m³ from the NARGP-collapse mode, is consistent with a real modest signal rather than a leakage artifact.
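The leakage guard described above amounts to controlling what fills the third input column at predict time. The sketch below is a hedged illustration: the z-scoring is inferred from the `cmaq_z` name, reusing training-set statistics at predict time is an assumption, and all function names are hypothetical.

```python
def zscore_params(values):
    """Mean and population standard deviation of the training-site covariate."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0.0:
        raise ValueError("covariate is constant; z-score is undefined")
    return mean, std


def predict_time_cmaq(rho1, f1_test):
    """Surrogate-reconstructed CMAQ at held-out sites: rho1 * f1_test.

    No held-out CMAQ value enters this column; it is built from rho1 and
    the L1 prediction at the test sites only.
    """
    return [rho1 * f for f in f1_test]


def kernel_c_inputs(lats, lons, cmaq, mean, std):
    """Build [lat, lon, cmaq_z] rows using training-set z-score parameters."""
    return [[lat, lon, (c - mean) / std] for lat, lon, c in zip(lats, lons, cmaq)]
```

A hypothetical call would compute `mean, std = zscore_params(cmaq_train)` once per fold, then pass `predict_time_cmaq(rho1, f1_test)` into `kernel_c_inputs` for the held-out sites, so the test rows carry only information available at predict time.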

Wind-alignment and data-driven stretch variants both hurt accuracy

Wind-aligned anisotropy (Kernel A, RMSE 3.012) worsens RMSE on all 5 folds. Data-driven ARD anisotropy (Kernel B, RMSE 2.871) is within fold noise of the baseline. A hand-coded physics anisotropy at the CA-aggregate scale is the wrong granularity: California’s wind field is basin-segmented (Bay Area NW marine, San Joaquin Valley stagnation, LA WSW onshore, Mojave N synoptic), so a single 75° transport axis averages across regimes with nearly orthogonal preferred directions. With only 64 AQS sites (about 13 held out per fold), neither kernel has enough data to find a better anisotropy than the isotropic baseline.

The winning mechanism is not the one the physics framing implies — that matters for how we describe it

Kernel C is closer to a Kennedy-O’Hagan 2001 model-bias-correction in 3-D input space than to a true PDE-derived kernel (advection-diffusion structure in the covariance function). A reviewer who reads carefully will notice this. The defensible proposal claim: “three candidate physics-informed kernel modifications tested; the chemistry-prior covariate variant reduced RMSE by 8%; the pure-anisotropy variants did not. The winning mechanism is mean-prior augmentation, not covariance-form physics encoding.” This is more credible than claiming a clean advection-diffusion kernel win.

| Item | SHA-256 (12-char) |
| --- | --- |
| results.json | 91c8c82d18ce |
| analysis.md | |
| scenario.md | |
| Upstream: Investigation 3-4 (4-level MFGP baseline), investigations/42_l4-mfgp-corrected/latest/results.json | b89d8204eb15 |
| Upstream: Investigation 3-1 (5-fold CV splits), investigations/39_aqs-held-out-validation/latest/results.json | c63ae2d281ce |

Run timestamp 2026-05-03T22:37:04 • 64 CA AQS sites • 5-fold spatial CV • n_restarts = 6