California Freight Cleanup → Investigation M-2
How much does adaptive re-planning beat a one-shot commitment?
The formal belief-conditioning step (PBVI) collapsed to a single dominant action at every belief state. The +23.9% lift at that level reflects best-static-allocation under value iteration, not a truly adaptive policy; we flag this explicitly in the findings below.
Best sequential policy: 1,692 deaths avoided over 10 yr vs 1,332 for a static one-shot plan.

Five policy classes, from a static pre-committed plan to a Bayesian-optimization-tuned adaptive policy, were evaluated over 200 simulated 10-year trajectories on a $4 billion horizon. A cost-overrun model (mean 1.16×, 95th percentile 1.39×) is embedded throughout. The ranking of policies is unchanged under cost risk.
The decision
Phase 1 assumed a static, pre-committed allocation schedule. That assumption ignores the most important feature of a real program: every year, monitoring data and health surveillance update what we know about which dose-response regime is operating. A program that ignores that signal leaves preventable deaths on the table.
The dose-response function is the dominant uncertainty in the California Freight Cleanup portfolio. This investigation quantifies how much an adaptive strategy is worth — across five escalating policy classes, measured in deaths avoided per dollar over a 10-year horizon.
Methodology
Simulation envelope. 200 Monte Carlo trajectories per policy, 10-year horizon, $400M annual budget ($4B total), 3% CEC programmatic discount rate. Each trajectory draws a “true CRF” β from a three-component mixture: 35.0% Di et al. 2017 (β = 0.00705), 35.0% Krewski et al. 2009 (β = 0.00545), 30.0% Investigation 6-3 hierarchical posterior (β = 0.02439). The true CRF is fixed within a trajectory but unknown to the policy. The Investigation 6-3 posterior β = 0.02439 is ~3.5× Di (0.00705) and ~4.5× Krewski (0.00545); trajectories drawn from the 30%-weighted Investigation 6-3 arm therefore dominate mean deaths-avoided. This mixture is a stress-test design, not a policy prior.
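For concreteness, a minimal sketch of the per-trajectory CRF draw, assuming a NumPy-based pipeline; the function and constant names are illustrative, not the pipeline's actual API:

```python
import numpy as np

# Three-component CRF mixture from the simulation envelope above.
# Weights and betas are the documented values; names are illustrative.
CRF_MIXTURE = [
    ("di_2017",           0.35, 0.00705),
    ("krewski_2009",      0.35, 0.00545),
    ("inv_6_3_posterior", 0.30, 0.02439),
]

def draw_true_crf(rng):
    """Draw one latent dose-response slope from the mixture.
    Fixed within a trajectory, hidden from the policy under test."""
    names, weights, betas = zip(*CRF_MIXTURE)
    i = rng.choice(len(betas), p=weights)
    return names[i], betas[i]

rng = np.random.default_rng(2026)                     # seed = 2026 (provenance table)
true_crfs = [draw_true_crf(rng) for _ in range(200)]  # 200 trajectories per policy
```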
Five policy levels.
- L1 — One-shot static. Allocation pre-committed at program start; no learning. The Phase 1 reference.
- L2 — Two-stage stochastic. Commits to a 5-year plan, observes year-5 mortality data, replans the remaining 5 years. One observation break.
- L3 — Rolling-horizon. Re-solves a 10-year horizon each year given the current CRF point estimate. Annual replanning; no belief-state tracking.
- L4 — PBVI POMDP. Full point-based value iteration (Pineau, Gordon & Thrun 2003) over a stationary 3-state POMDP. State space: three CRF regimes {Di, Krewski, Investigation 6-3 posterior}. Action space: four candidate allocations (alloc_di, alloc_kr, alloc_inv21, alloc_balanced). Observation model built empirically by Monte Carlo replay (N=2,000). Alpha-vector backups iterate to convergence. A minimal belief-update sketch follows this list.
- L5 — Multi-fidelity Bayesian optimization. A 2-parameter BO search using the Kennedy-O’Hagan AR1 multi-fidelity kernel (L1–L4 as low-fidelity surrogates) tunes learning rate and exploration fraction. The KO AR1 fusion also produces a cross-policy posterior mean (1,652.8 deaths) and sigma (298.5).
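To make L4's belief-state tracking concrete, here is a minimal Bayes-filter sketch over the three CRF regimes. The real observation model is built empirically by Monte Carlo replay (N=2,000); the Gaussian likelihood and the `obs_sd` parameter below are stand-in assumptions for illustration:

```python
import numpy as np

# Regime order: [Di 2017, Krewski 2009, Investigation 6-3 posterior]
BETAS = np.array([0.00705, 0.00545, 0.02439])
PRIOR = np.array([0.35, 0.35, 0.30])   # the mixture weights above

def update_belief(belief, observed_mortality, expected_mortality, obs_sd):
    """One annual belief update: P(regime | data) ∝ P(data | regime) P(regime).

    expected_mortality[k]: mortality predicted if regime k were true,
    given this year's allocation. obs_sd: assumed surveillance noise.
    """
    resid = observed_mortality - expected_mortality   # vector over regimes
    likelihood = np.exp(-0.5 * (resid / obs_sd) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Example: an observation close to the Inv 6-3 prediction shifts weight there.
b = update_belief(PRIOR, observed_mortality=118.0,
                  expected_mortality=np.array([100.0, 95.0, 120.0]),
                  obs_sd=10.0)
# b ≈ [0.18, 0.06, 0.76] — most mass moves to the high-beta regime.
```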
Cost-overrun model (Phase 2b). A half-normal distribution (overrun = 1 + |N(0,σ)|, σ = 0.20) models program execution risk. Mean overrun factor: 1.16×; P95: 1.39×. Every policy level is tested with and without cost-overrun risk.
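Both summary statistics follow from half-normal closed forms: mean = 1 + σ√(2/π) ≈ 1.16 and P95 = 1 + 1.96σ ≈ 1.39. A quick check:

```python
import numpy as np

sigma = 0.20

# Closed forms for overrun = 1 + |N(0, sigma)|
mean_overrun = 1 + sigma * np.sqrt(2 / np.pi)  # 1 + 0.1596 ≈ 1.16
p95_overrun  = 1 + sigma * 1.959964            # Φ(1.96) = 0.975 → ≈ 1.39

# Monte Carlo confirmation
rng = np.random.default_rng(2026)
draws = 1 + np.abs(rng.normal(0.0, sigma, size=1_000_000))
print(round(draws.mean(), 3), round(np.percentile(draws, 95), 3))
```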
Policy results
| Policy | Deaths avoided (mean) | Deaths/$B | vs L1 | Cost-overrun Δ deaths |
|---|---|---|---|---|
| L1 one-shot | 1,332 | 333 | baseline | −193 (−12.6%) |
| L2 two-stage | 1,434 | 359 | +7.7% | −211 (−12.8%) |
| L3 rolling | 1,531 | 383 | +14.9% | −225 (−12.8%) |
| L4 PBVI POMDP | 1,650 | 413 | +23.9% | −251 (−13.2%) |
| L5 BO-tuned | 1,692 | 423 | +27.1% | −257 (−13.2%) |
Findings
Sequential adaptivity is worth +360 deaths over 10 years (+27.1%)
The gap between L1 (one-shot) and L5 (best sequential) is 360 deaths avoided over the 10-year horizon, or +90.1 deaths/$B. This is the value of building a program that re-optimizes as CRF evidence accumulates, rather than committing to a pre-specified allocation schedule.
PBVI collapses to a single dominant action: the gain is optimization, not adaptivity
The PBVI solver returns alpha_vector_count = 1 and unique_actions_selected = 1. At every probed belief—prior-mix, uniform, all three corner-confident beliefs, all three two-state splits—PBVI picks alloc_kr regardless. The +23.9% L4 lift over L1 is best-static-allocation under value iteration, not genuine belief-conditioned adaptivity. Under this 4-action menu, the optimal allocation is regime-invariant—the operational value of true adaptivity is approximately zero. Sequencing still pays via L2/L3 myopic re-optimization; PBVI contributes global optimization, not optionality.
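A sketch of the probe described above, mirroring the pinned integration test; `policy_action` is a hypothetical stand-in for the pipeline's actual PBVI policy interface:

```python
import numpy as np

# Belief points probed in the collapse check: prior mix, uniform,
# the three corner-confident beliefs, and the three two-state splits.
PROBES = [
    np.array([0.35, 0.35, 0.30]),  # prior mix
    np.array([1/3, 1/3, 1/3]),     # uniform
    np.array([1.0, 0.0, 0.0]),     # corner: Di
    np.array([0.0, 1.0, 0.0]),     # corner: Krewski
    np.array([0.0, 0.0, 1.0]),     # corner: Investigation 6-3
    np.array([0.5, 0.5, 0.0]),     # two-state splits
    np.array([0.5, 0.0, 0.5]),
    np.array([0.0, 0.5, 0.5]),
]

def assert_dominant_action(policy_action):
    """Every probed belief should map to the same action (alloc_kr)."""
    actions = {policy_action(b) for b in PROBES}
    assert actions == {"alloc_kr"}, f"belief-conditioned actions found: {actions}"
```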
Best policy unchanged under cost-overrun: cost risk does not flip the ranking
All five policy levels suffer a proportional −12.6% to −13.2% deaths reduction from the half-normal cost-overrun tail (mean 1.16×, P95 1.39×). The ranking L5 > L4 > L3 > L2 > L1 is unchanged. Program-execution cost risk does not reverse the recommendation to choose adaptive over static policies.
KO AR1 multi-fidelity fusion: cross-policy posterior mean 1,653 deaths, σ 299
The Kennedy–O’Hagan AR1 fusion across policy levels (using L1–L4 as low-fidelity proxies and L5 as the high-fidelity anchor) produces AR1 correlation ratios ρ = 1.076, 1.098, 1.039 across consecutive levels. The posterior mean (1,653 deaths) and sigma (299) characterize overall simulation uncertainty across the policy fidelity ladder.
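For reference, the standard Kennedy–O'Hagan AR(1) structure linking consecutive fidelity levels, with ρ the reported correlation ratios and δ an independent GP discrepancy term (standard form, not taken from the pipeline code):

$$ f_{t}(x) = \rho_{t-1}\, f_{t-1}(x) + \delta_{t}(x), \qquad t = 2, \dots, 5, \quad \delta_{t} \perp f_{t-1} $$

Ratios slightly above 1 at each step are consistent with the monotone increase in mean deaths avoided up the L1→L5 ladder: each level scales up its predecessor's signal rather than contradicting it.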
Caveats
- PBVI action collapse is confirmed and pinned by a test. `tests/test_inv22_pbvi_integration.py::test_l4_pbvi_picks_dominant_action` asserts this property. Any future menu expansion that breaks the monotone dominance of the T2-share would change the POMDP behavior qualitatively.
- Belief state is CRF-only. Deployment feasibility, contractor availability, permitting delays, and ISO interconnection backlogs are all treated as known and non-stochastic. In a real program these operational uncertainties likely dwarf the CRF uncertainty in year-to-year variability.
- 200 trajectories are adequate for mean ranking, not tail estimation. P5/P95 confidence intervals are wide at 200 draws. The L4–L5 gap (42 deaths) is modest relative to the within-policy SD (525–541 deaths).
- Investigation 6-3 β is read live, not hardcoded. As of Phase 6 (2026-04-30), the MC pipeline injects the live Investigation 6-3 posterior β via `upstream_value` before each run (see the sketch after this list). The `stale_hardcoded_beta = 0.00649` in results.json is retained as a historical provenance label only; it is not used in the policy comparison above. The 1.79e−02 drift figure reflects the distance between that historical constant and the current live value; the policy table is valid.
- L5 BO is 2-parameter. The optimization tunes learning rate and exploration fraction only. A higher-dimensional policy space (region-specific budgets, technology-category splits) would require Emukit multi-fidelity kernels.
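A hedged sketch of the live-β injection described in the first caveat; the JSON key name is a hypothetical stand-in for the actual results schema:

```python
import json
from pathlib import Path

# Upstream artifact path from the provenance table below.
UPSTREAM = Path("investigations/21_crf-hierarchical-bayes/latest/results.json")

def load_upstream_beta(path=UPSTREAM):
    """Read the live Investigation 6-3 posterior beta before each run."""
    payload = json.loads(path.read_text())
    beta = payload["posterior_beta"]   # hypothetical key, not the real schema
    stale = 0.00649                    # historical provenance label only
    drift = abs(beta - stale)          # reported drift: 1.79e-02
    return beta, drift
```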
Provenance
| Item | Detail | SHA-256 (12-char) |
|---|---|---|
| results.json | — | 856374cbd57f |
| analysis.md | — | — |
| scenario.md | — | — |
| Upstream: Investigation 6-3 (CRF posterior) | investigations/21_crf-hierarchical-bayes/latest/results.json | 3104ba850408 |
| Key references | Pineau, Gordon & Thrun (2003): PBVI point-based value iteration. Kaelbling et al. (1998): canonical POMDP formulation. Kennedy & O'Hagan (2000): KO AR1 MF-GP. | — |
| Run parameters | 2026-05-04T07:46:20; N_traj = 200 per policy; horizon = 10 yr; budget = $4B; seed = 2026 | — |