
Should the Policy Be Fixed or Adaptive?

Phase 1 scored portfolios as one-shot picks. A five-level sequential ladder (one-shot → two-stage → rolling → POMDP → BO-optimal) lets the policy learn the true CRF regime over the first 3–5 years and reallocate the remaining budget. The best sequential policy avoids 777 deaths per trajectory vs 651 under commit-and-forget — a 19.2% gain, worth $1.5B at the EPA VSL.

5 policy levels · 10-year horizon · +19% best vs one-shot · $1.5B value of adapting
Why Sequential?

Commit-and-Forget Leaves Lives on the Table

Phase 1 scored each portfolio (T1, T2, B1, B2, W1, W2, Solar, combinations) as a single commitment: pick once, deploy for 10 years, count deaths. That framing assumes the CRF is known and fixed. Reality: over the first 3–5 years the program will observe actual mortality, refine the CRF posterior, and have every opportunity to reallocate the remaining budget.

In the Phase 2 Inv 21 hierarchical Bayesian analysis, the posterior CRF credible interval brackets both Di (HR 1.073) and Krewski (HR 1.056) — so initial deployments under one CRF can be reallocated under the other as data accumulate. Inv 22 makes that information flow explicit.

The question: how much additional value does adaptive sequential allocation deliver over the Phase 1 commit-at-year-0 approach, and how sophisticated does the policy need to be to capture that value?

Fidelity Ladder

From One-Shot to Bayes-Optimal

Five policy fidelity levels. Each one drives a 10-year, $4B simulated deployment with a mixture-prior true CRF (35% Di / 35% Krewski / 30% Inv 21 posterior), noisy lagged mortality observations, and conjugate Bayesian updating.
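For concreteness, a minimal sketch of how each trajectory's true CRF could be drawn from that mixture prior. The function name and the `inv21_draws` argument are illustrative, assuming posterior samples from Inv 21 are available as an array:

```python
import numpy as np

def sample_true_log_crf(rng, inv21_draws):
    """Draw one trajectory's true log-CRF from the mixture prior:
    35% Di, 35% Krewski, 30% a draw from the Inv 21 posterior."""
    u = rng.random()
    if u < 0.35:
        return np.log(1.073)            # Di anchor (HR 1.073)
    if u < 0.70:
        return np.log(1.056)            # Krewski anchor (HR 1.056)
    return rng.choice(inv21_draws)      # Inv 21 hierarchical posterior sample
```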

L1 · One-shot static schedule (Phase 1) · 651 deaths/traj
Commit T2+B2+W1 annual spend at year 0; never adjust. Matches the Phase 1 Inv 12 portfolio picks.

L2 · Two-stage stochastic program · 702 deaths/traj
Commit 50% now; observe the CRF for 5 years; commit the remaining 50% under the realized regime.

L3 · Rolling-horizon annual refit · 743 deaths/traj
Every year, re-optimize the next-year allocation based on the Bayesian posterior over the CRF.

L4 · POMDP belief-state value iteration · 735 deaths/traj
Explicit belief over {Di, Krewski, Inv 21} regimes; value-iterate the optimal action per belief.

L5 · BO-optimal policy parameters (Emukit MFGP) · 777 deaths/traj
Multi-fidelity BO over aggressiveness and risk-aversion; finds the policy that dominates all others.

Fused · Kennedy–O'Hagan AR1 posterior · 762 deaths/traj
Precision-weighted across L1–L4, anchored by L5. Captures the improvement ladder with uncertainty.

200 Monte Carlo trajectories per policy. Discount rate 3%. Observations lagged 2 years with noise σ = 35% of signal + 10 deaths. Belief state: Gaussian over log-CRF β with conjugate updating.
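A minimal sketch of that conjugate update, assuming each lagged observation can be reduced to an unbiased Gaussian estimate of the log-CRF β (that reduction step is our simplification; names and the prior width are illustrative):

```python
import numpy as np

def update_log_crf_belief(mu, var, beta_hat, obs_var):
    """Conjugate normal-normal update for the Gaussian belief over log-CRF beta.

    mu, var  -- current belief: beta ~ N(mu, var)
    beta_hat -- noisy beta estimate backed out of one lagged mortality
                observation (assumed unbiased and Gaussian -- our assumption)
    obs_var  -- variance of that estimate
    """
    post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
    post_mu = post_var * (mu / var + beta_hat / obs_var)
    return post_mu, post_var

# Example: prior centered between the Di and Krewski anchors (hypothetical width).
beta_di, beta_kr = np.log(1.073), np.log(1.056)
mu, var = (beta_di + beta_kr) / 2.0, 0.01 ** 2
mu, var = update_log_crf_belief(mu, var, beta_hat=beta_di, obs_var=0.02 ** 2)
```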

Policy Comparison

The Gap is 19%

Policy                   Mean Deaths Avoided   95% CI         Deaths per $B   vs L1 One-shot
L1 One-shot (Phase 1)    651                   [531, 2133]    163             +0.0%
L2 Two-stage             702                   [572, 2302]    175             +7.8%
L3 Rolling horizon       743                   [621, 2254]    186             +14.1%
L4 POMDP belief-state    735                   [587, 2560]    184             +12.9%
L5 BO-optimal            777                   [621, 2677]    194             +19.2%

Mean and 95% CI across 200 MC trajectories; each trajectory samples a different true CRF. Deaths per $B is computed from the mean lifetime discounted spend ($4B).
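The derived columns can be reproduced from the means; a quick check below (the table displays rounded means, so the recomputed ratios can differ in the last digit from the published columns, e.g. the headline 19.2%):

```python
BUDGET_B = 4.0   # lifetime discounted spend, $B
VSL_M = 11.6     # EPA value of a statistical life, $M

mean_deaths = {"L1": 651, "L2": 702, "L3": 743, "L4": 735, "L5": 777}

for policy, d in mean_deaths.items():
    per_billion = d / BUDGET_B                     # deaths avoided per $B
    vs_l1 = 100.0 * (d / mean_deaths["L1"] - 1.0)  # % vs one-shot baseline
    print(f"{policy}: {per_billion:.0f} per $B, {vs_l1:+.1f}% vs L1")

# Value of adapting: extra lives times VSL, in $B.
extra = mean_deaths["L5"] - mean_deaths["L1"]      # 126 from rounded means
print(f"value of adapting ≈ ${extra * VSL_M / 1000:.2f}B")  # ≈ $1.46B ≈ $1.5B
```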

Finding
The best sequential policy (L5_bo) delivers 777 deaths avoided per trajectory vs 651 under Phase 1's one-shot approach — a 19.2% improvement, or about 125 extra lives saved over a $4B 10-year program. At the EPA $11.6M VSL, that is $1.5B of value left on the table by commit-and-forget policies.
Where the Value Comes From

Learning Beats Committing

L1 locks in a single allocation that splits the difference between Di and Krewski regimes. If the true CRF ends up Krewski-like, it under-allocates to transport (T2); if Di-like, it over-allocates. Either way, the fixed schedule is sub-optimal in hindsight.

L2 (two-stage) captures the simplest version of the gain: commit 50% early, wait 5 years for mortality data to accumulate, and redirect the remaining budget. That alone adds 7.8% more deaths avoided. L3's rolling horizon refits annually, which adds another 6.3 pp because the posterior collapses onto the true regime faster under annual updates than under 5-year batching.
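As a sketch of where L2's 7.8% comes from, here is the shape of its decision rule. The option mixes are illustrative placeholders, not the study's actual allocations:

```python
import numpy as np

BETA_DI, BETA_KR = np.log(1.073), np.log(1.056)

def two_stage_policy(year, belief_mu, total_budget=4.0, switch_year=5):
    """L2 sketch: commit a hedged 50% for years 0-4, then redirect the
    remaining budget toward whichever regime the posterior mean favors."""
    hedged = {"T2": 0.4, "B2": 0.4, "W1": 0.2}   # placeholder first-stage mix
    di_mix = {"T2": 0.6, "B2": 0.3, "W1": 0.1}   # placeholder Di-regime mix
    kr_mix = {"T2": 0.2, "B2": 0.5, "W1": 0.3}   # placeholder Krewski-regime mix

    annual = total_budget / 10.0                  # even spend over the horizon
    if year < switch_year:
        mix = hedged
    else:
        mix = di_mix if abs(belief_mu - BETA_DI) < abs(belief_mu - BETA_KR) else kr_mix
    return {option: share * annual for option, share in mix.items()}
```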

L4 makes the belief state explicit (three discrete regime hypotheses + soft Bayesian weighting) and value-iterates the best response per belief. That nearly matches L3. L5 then wraps the policy in a 2-parameter family (aggressiveness, risk-aversion) and runs multi-fidelity Bayesian optimization to find the corner of the Pareto frontier — dominating all lower levels on expected deaths avoided.
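L4's belief update can be sketched as a soft Bayesian reweighting of the three regime hypotheses; the Gaussian likelihood form below is our assumption, and the numbers in the usage line are toy values:

```python
import numpy as np

def update_regime_belief(belief, y_obs, y_pred, sigma):
    """Soft Bayesian reweighting of the {Di, Krewski, Inv 21} regime hypotheses.

    belief -- current probabilities over the three regimes
    y_obs  -- observed (lagged, noisy) deaths avoided this year
    y_pred -- deaths avoided each regime predicts for the deployed portfolio
    sigma  -- observation noise standard deviation
    """
    log_lik = -0.5 * ((y_obs - np.asarray(y_pred, dtype=float)) / sigma) ** 2
    w = np.asarray(belief, dtype=float) * np.exp(log_lik - log_lik.max())  # stabilized
    return w / w.sum()

belief = np.array([0.35, 0.35, 0.30])   # mixture prior from the simulation setup
belief = update_regime_belief(belief, y_obs=92.0, y_pred=[95.0, 70.0, 85.0], sigma=35.0)
```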

Method Detail

The Policy Space

To compare one-shot commitments against adaptive schedules on an equal footing, each fidelity level is expressed as a decision rule mapping the current evidence state to next-year spend — that decision rule is the object the sequential ladder grades.

Each policy is a function π(state) → budget allocation. For L1–L4, π is hand-written; for L5, π is parameterized by (aggressiveness, risk-aversion) and the two parameters are optimized via Kennedy–O’Hagan MFGP over 30 evaluations of a cheap surrogate (50 MC trajectories) and 5 evaluations of the true simulator (200 MC trajectories).
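The study runs this search with Emukit's Kennedy–O'Hagan multi-fidelity GP; reproducing that loop exactly is beyond a sketch, so the stand-in below uses the same 30-cheap/5-expensive evaluation split with a naive screen-then-promote search. `evaluate_policy` and all names are hypothetical placeholders for the real simulator:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_policy(aggressiveness, risk_aversion, n_traj):
    """Stand-in for the simulator: mean deaths avoided over n_traj Monte Carlo
    trajectories of the 10-year rollout. Wire up to the real simulator."""
    raise NotImplementedError

def optimize_policy(n_cheap=30, n_expensive=5):
    # Low-fidelity screen: n_cheap random parameter pairs at 50 trajectories each.
    candidates = rng.uniform(0.0, 1.0, size=(n_cheap, 2))
    cheap = [(a, r, evaluate_policy(a, r, n_traj=50)) for a, r in candidates]
    cheap.sort(key=lambda t: t[2], reverse=True)
    # High fidelity: promote the best n_expensive pairs to 200 trajectories.
    return max(((a, r, evaluate_policy(a, r, n_traj=200))
                for a, r, _ in cheap[:n_expensive]), key=lambda t: t[2])
```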

Observations are noisy, lagged mortality signals: given the fraction of option o deployed and a true CRF β, annual deaths avoided are linearly interpolated between the Di and Krewski anchors published in Phase 1. Observations arrive with a 2-year lag (epidemiological surveillance) and Gaussian noise that scales with the signal. Bayesian updates use a conjugate normal-normal model on the log-CRF.
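A sketch of that observation model, assuming the interpolation happens in log-CRF space and that per-option anchor effects are available from Phase 1 (both assumptions ours):

```python
import numpy as np

BETA_DI, BETA_KR = np.log(1.073), np.log(1.056)

def annual_deaths_avoided(frac_deployed, beta, anchor_di, anchor_kr):
    """Interpolate an option's effect between its Phase 1 Di and Krewski
    anchors according to where beta falls between the two log-CRFs."""
    t = (beta - BETA_KR) / (BETA_DI - BETA_KR)   # 0 at Krewski, 1 at Di
    return frac_deployed * ((1.0 - t) * anchor_kr + t * anchor_di)

def observe(true_series, year, lag=2, rng=None):
    """Lagged, noisy mortality signal: sigma = 35% of signal + 10 deaths."""
    if rng is None:
        rng = np.random.default_rng()
    if year < lag:
        return None                              # nothing observable yet
    signal = true_series[year - lag]
    return signal + rng.normal(0.0, 0.35 * signal + 10.0)
```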

Sources: Bellman 1957 (DP); Kaelbling et al. 1998 (POMDP); Kennedy & O'Hagan 2000 (MFGP); Tange 2018 (BO in public health). The Inv 21 hierarchical-Bayes posterior over CA-pooled CRF motivates the regime-belief formulation here.

Implication for the portfolio. This investigation says the Phase 1 deliverable should not be a single portfolio pick but a policy — a contingent plan that reacts to 3-year mortality observations. The 19% lives-avoided improvement on a $4B, 10-year budget is worth $1.5B at essentially zero incremental cost (the policy itself is just a decision rule).