
California Freight Cleanup → Investigation M-2

How much does adaptive re-planning beat a one-shot commitment?

The formal belief-conditioning step (PBVI) collapsed to a single dominant action at every belief state. The +23.9% lift at that level reflects best-allocation under value iteration, not a truly adaptive policy. We say so explicitly in the writeup.

Best sequential policy: 1,692 deaths avoided over 10 years vs 1,332 for a static one-shot plan

Five policy classes, from a static pre-committed plan to a Bayesian-optimization-tuned adaptive policy, evaluated over 200 simulated 10-year trajectories on a $4 billion horizon. A cost-overrun model (mean 1.16×, 95th percentile 1.39×) is embedded throughout. The ranking of policies is unchanged under cost risk.

Phase 1 assumed a static, pre-committed allocation schedule. That assumption ignores the most important feature of a real program: every year, monitoring data and health surveillance update what we know about which dose-response regime is operating. A program that ignores that signal leaves preventable deaths on the table.

The dose-response function is the dominant uncertainty in the California Freight Cleanup portfolio. This investigation quantifies how much an adaptive strategy is worth — across five escalating policy classes, measured in deaths avoided per dollar over a 10-year horizon.

Simulation envelope. 200 Monte Carlo trajectories per policy, 10-year horizon, $400M annual budget ($4B total), 3% CEC programmatic discount rate. Each trajectory draws a “true CRF” β from a three-component mixture: 35% Di et al. 2017 (β = 0.00705), 35% Krewski et al. 2009 (β = 0.00545), 30% Investigation 6-3 hierarchical posterior (β = 0.02439). The true CRF is fixed within a trajectory but unknown to the policy. The Investigation 6-3 posterior β = 0.02439 is ~3.5× the Di value (0.00705) and ~4.5× the Krewski value (0.00545); trajectories drawn from the 30%-weighted Investigation 6-3 arm therefore dominate mean deaths-avoided. This mixture is a stress-test design, not a policy prior.
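The per-trajectory CRF draw can be sketched as follows. This is a minimal illustration, not the investigation's code: the weights, β values, and seed come from the text above, but the function and variable names are ours.

```python
import numpy as np

# Each of the 200 trajectories draws one "true" CRF slope beta from the
# three-component mixture, then holds it fixed for the full 10-year horizon.
RNG = np.random.default_rng(2026)  # seed from the run metadata

CRF_MIXTURE = [
    ("Di et al. 2017",      0.35, 0.00705),
    ("Krewski et al. 2009", 0.35, 0.00545),
    ("Investigation 6-3",   0.30, 0.02439),
]

def draw_true_crf(rng=RNG):
    """Draw one (label, beta) pair; beta is fixed within a trajectory
    but hidden from the policy being simulated."""
    weights = [w for _, w, _ in CRF_MIXTURE]
    idx = rng.choice(len(CRF_MIXTURE), p=weights)
    label, _, beta = CRF_MIXTURE[idx]
    return label, beta

# One draw per simulated trajectory
draws = [draw_true_crf() for _ in range(200)]
```

Because the high-slope Investigation 6-3 arm carries 30% of the weight but roughly 3.5-4.5× the slope, its trajectories dominate the mean deaths-avoided, which is the stress-test effect the text describes.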

Five policy levels. The ladder runs from L1 (static one-shot commitment) through L2 (two-stage), L3 (rolling re-optimization), and L4 (PBVI POMDP) to L5 (Bayesian-optimization-tuned adaptive policy).

Cost-overrun model (Phase 2b). A half-normal distribution (overrun = 1 + |N(0,σ)|, σ = 0.20) models program execution risk. Mean overrun factor: 1.16×; P95: 1.39×. Every policy level is tested with and without cost-overrun risk.
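A minimal sketch of this half-normal overrun model, reproducing the quoted mean and P95 (function names are illustrative): the closed-form mean is 1 + σ√(2/π) ≈ 1.16 and the 95th percentile is 1 + 1.96σ ≈ 1.39, matching the figures in the text.

```python
import numpy as np

SIGMA = 0.20  # half-normal scale from the Phase 2b model

def sample_overruns(n, sigma=SIGMA, rng=None):
    """Overrun factor = 1 + |N(0, sigma)|, applied to program costs."""
    if rng is None:
        rng = np.random.default_rng(2026)
    return 1.0 + np.abs(rng.normal(0.0, sigma, size=n))

overruns = sample_overruns(200_000)
mean_overrun = overruns.mean()             # ~1.16x
p95_overrun = np.percentile(overruns, 95)  # ~1.39x
```

Since the same overrun distribution is applied to every policy level, it scales effective budgets roughly proportionally, which is why (as shown in the table) it dents all five levels by a similar 12.6-13.2% without reordering them.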

Policy          Deaths avoided (mean)   Deaths/$B   vs L1      Cost-overrun Δ deaths
L1 one-shot     1,332                   333         baseline   −193 (−12.6%)
L2 two-stage    1,434                   359         +7.7%      −211 (−12.8%)
L3 rolling      1,531                   383         +14.9%     −225 (−12.8%)
L4 PBVI POMDP   1,650                   413         +23.9%     −251 (−13.2%)
L5 BO-tuned     1,692                   423         +27.1%     −257 (−13.2%)
[Figure: two-panel plot, half-normal cost-overrun density (left) and policy-ladder deaths bar chart (right)]
Figure: (Left) Half-normal cost-overrun distribution (σ = 0.20); mean overrun 1.16×, P95 1.39×. (Right) Policy-ladder deaths avoided with vs without cost-overrun risk. The PBVI L4 policy collapses to a single dominant action (alloc_kr) at every belief state; the +23.9% lift over L1 reflects best-static-allocation under value iteration, not adaptive belief tracking.

Sequential adaptivity is worth +360 deaths over 10 years (+27.1%)

The gap between L1 (one-shot) and L5 (best sequential) is 360 deaths avoided over the 10-year horizon, or +90.1 deaths/$B. This is the value of building a program that re-optimizes as CRF evidence accumulates, rather than committing to a pre-specified allocation schedule.

PBVI collapses to a single dominant action: the gain is optimization, not adaptivity

The PBVI solver returns alpha_vector_count = 1 and unique_actions_selected = 1. At every probed belief—prior-mix, uniform, all three corner-confident beliefs, all three two-state splits—PBVI picks alloc_kr regardless. The +23.9% L4 lift over L1 is best-static-allocation under value iteration, not genuine belief-conditioned adaptivity. Under this 4-action menu, the optimal allocation is regime-invariant—the operational value of true adaptivity is approximately zero. Sequencing still pays via L2/L3 myopic re-optimization; PBVI contributes global optimization, not optionality.
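For intuition on what belief-conditioned adaptivity would look like if the optimal action were not regime-invariant, here is a minimal Bayes-filter sketch over the three candidate CRF regimes. The Gaussian observation model, exposure scale, and noise level are our assumptions for illustration, not the investigation's actual surveillance likelihood.

```python
import numpy as np

BETAS = np.array([0.00705, 0.00545, 0.02439])  # Di / Krewski / Inv. 6-3
PRIOR = np.array([0.35, 0.35, 0.30])           # mixture weights from the text

def update_belief(belief, observed_rate, exposure, obs_sigma=0.5):
    """One Bayes step: weight each regime by the likelihood of the
    observed excess-death rate under assumed Gaussian observation noise."""
    expected = BETAS * exposure
    lik = np.exp(-0.5 * ((observed_rate - expected) / obs_sigma) ** 2)
    post = belief * lik
    return post / post.sum()

# Example: surveillance data generated under the high-slope Inv. 6-3 regime
# pulls the belief toward that regime over successive years.
belief = PRIOR.copy()
for year in range(5):
    obs = 0.02439 * 100.0  # idealized noiseless observation, true regime
    belief = update_belief(belief, obs, exposure=100.0)
```

In this investigation, tracking such a belief turns out not to matter for the allocation decision: PBVI selects alloc_kr at every probed belief, so the optionality the filter would buy is worth approximately zero under the 4-action menu.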

Best policy unchanged under cost-overrun: cost risk does not flip the ranking

All five policy levels suffer a roughly proportional reduction in deaths avoided, from −12.6% (L1) to −13.2% (L4/L5), under the half-normal cost-overrun tail (mean 1.16×, P95 1.39×). The ranking L5 > L4 > L3 > L2 > L1 is unchanged, so program-execution cost risk does not reverse the recommendation to choose adaptive over static policies.

KO AR1 multi-fidelity fusion: cross-policy posterior mean 1,653 deaths, σ 299

The Kennedy–O’Hagan AR1 fusion across policy levels (using L1–L4 as low-fidelity proxies and L5 as the high-fidelity anchor) produces AR1 correlation ratios ρ = 1.076, 1.098, 1.039 across consecutive levels. The posterior mean (1,653 deaths) and sigma (299) characterize overall simulation uncertainty across the policy fidelity ladder.
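The AR(1) link between consecutive fidelity levels can be sketched as a least-squares fit y_hi ≈ ρ·y_lo + δ, with the Kennedy-O'Hagan discrepancy term reduced to a constant for illustration. The synthetic paired outputs below are ours, not the investigation's trajectories; only the ρ ≈ 1.076 scale and rough output magnitudes are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2026)

# Synthetic paired trajectory outputs at two consecutive policy levels
# (illustrative stand-ins for the per-trajectory results in results.json).
y_lo = rng.normal(1332.0, 300.0, size=200)            # lower-fidelity level
y_hi = 1.076 * y_lo + rng.normal(0.0, 50.0, size=200)  # next level up

def fit_ar1_link(y_lo, y_hi):
    """Least-squares AR(1) scale rho and constant discrepancy delta
    between adjacent fidelity levels: y_hi ~ rho * y_lo + delta."""
    A = np.column_stack([y_lo, np.ones_like(y_lo)])
    (rho, delta), *_ = np.linalg.lstsq(A, y_hi, rcond=None)
    return rho, delta

rho_hat, delta_hat = fit_ar1_link(y_lo, y_hi)
```

Chaining such links up the ladder (L1 through L4 as low-fidelity proxies, L5 as the anchor) is what yields the reported consecutive-level ρ values and the fused posterior mean of 1,653 deaths with σ = 299.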

Item            SHA-256 (12-char)
results.json    856374cbd57f
analysis.md
scenario.md
Upstream: Investigation 6-3 (CRF posterior), investigations/21_crf-hierarchical-bayes/latest/results.json, 3104ba850408
Key references: Pineau, Gordon & Thrun (2003), PBVI point-based value iteration; Kaelbling et al. (1998), canonical POMDP formulation; Kennedy & O’Hagan (2000), KO AR1 multi-fidelity GP.
Run timestamp 2026-05-04T07:46:20   N_traj = 200 per policy   horizon = 10yr   budget = $4B   seed = 2026