California Freight Cleanup → Investigation 8-3
How much does it cost to find the best monitor sites?
3.3× speedup (BOCA cost 1.5 vs full-sim 5.0) • 4/5 oracle overlap • 2% EVSI loss • full-simulation tier never needed

Investigation 8-2 identified which five candidate sites to deploy monitors to. This investigation asks the prior question: how much does it cost to find them? A full one-year simulation per candidate costs 100 times more than a cheap gap-score screen. We ran a tiered screening approach (cheap tests first, expensive tests only for survivors) and recovered 4 of the 5 optimal sites at 3.3 times lower cost, without ever running the most expensive tier.
The decision
Should a CEC or CARB site-evaluation programme run full simulations on all 16 candidate sites before choosing 5, or use a tiered screening hierarchy that applies cheap tests first and expensive tests only to survivors? The answer determines whether multi-fidelity evaluation is worth the algorithmic complexity at programme scale. Investigation 8-3 tests this on the curated 16-site set from Investigation 8-2.
Methodology
Fidelity ladder. Four evaluation levels are defined over the 16 Investigation 8-2 candidate sites:
| Fidelity | Label | Relative cost | Noise σ |
|---|---|---|---|
| 1 | Gap-score screen only | 0.01 | 0.25 |
| 2 | Haversine-adjusted gap | 0.05 | 0.15 |
| 3 | Climate-signal UCB | 0.20 | 0.08 |
| 4 | Full 1-year simulation | 1.00 | 0.02 |
Cost units are normalized so that one full simulation costs 1.00; the full-simulation baseline's realized budget of 5.0 units corresponds to one full-sim evaluation for each of the five selected sites. Total budget cap: 50 evaluation units.
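As a sketch, the ladder can be encoded as plain data. The class and label names here are illustrative, not taken from the investigation's code; the numbers are the table's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fidelity:
    level: int
    label: str
    cost: float   # relative to one full 1-year simulation (1.00)
    sigma: float  # observation-noise std at this fidelity

# The four-level ladder from the table above.
LADDER = [
    Fidelity(1, "gap-score screen", 0.01, 0.25),
    Fidelity(2, "haversine-adjusted gap", 0.05, 0.15),
    Fidelity(3, "climate-signal UCB", 0.20, 0.08),
    Fidelity(4, "full 1-year simulation", 1.00, 0.02),
]

# Each step up the ladder buys lower noise at higher cost.
assert all(a.cost < b.cost and a.sigma > b.sigma
           for a, b in zip(LADDER, LADDER[1:]))
```

The 100× ratio between the cheapest screen (0.01) and the full simulation (1.00) is what makes tiered screening worth attempting at all.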
Algorithm. A BOCA-inspired successive-halving UCB (closer in mechanism to Jamieson & Talwalkar 2016 than to Kandasamy et al. 2017 BOCA) maintains an independent Gaussian conjugate posterior per candidate. The acquisition function is (info_gain + ucb_weight × UCB) / sqrt(cost). Promotion to a higher fidelity requires sufficient lower-fidelity evaluations, and a candidate is committed once its posterior standard deviation drops below 0.10 and at least one fid-3 or fid-4 evaluation is complete.
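A minimal sketch of the per-candidate posterior and the stated acquisition rule. The info_gain term is modeled here as the Gaussian entropy reduction from one more observation, which is an assumption (the text does not define the term exactly); the prior, `beta`, and `ucb_weight` values are likewise illustrative.

```python
import math

class CandidatePosterior:
    """Independent Gaussian conjugate posterior over one candidate's true EVSI."""
    def __init__(self, mu0=0.0, tau0=1.0):
        self.mu = mu0
        self.prec = 1.0 / tau0 ** 2  # precision = 1 / variance

    def update(self, y, sigma):
        # Standard conjugate update for a Gaussian observation with known noise.
        obs_prec = 1.0 / sigma ** 2
        self.mu = (self.prec * self.mu + obs_prec * y) / (self.prec + obs_prec)
        self.prec += obs_prec

    @property
    def std(self):
        return math.sqrt(1.0 / self.prec)

def acquisition(post, cost, sigma, ucb_weight=1.0, beta=2.0):
    # (info_gain + ucb_weight * UCB) / sqrt(cost), as stated above; info_gain
    # is taken as the Gaussian entropy reduction of one observation at noise
    # sigma -- an assumed form, not the investigation's exact definition.
    var = 1.0 / post.prec
    info_gain = 0.5 * math.log(1.0 + var / sigma ** 2)
    ucb = post.mu + beta * post.std
    return (info_gain + ucb_weight * ucb) / math.sqrt(cost)

# For an uncertain candidate, a cheap fid-1 screen (cost 0.01, sigma 0.25)
# scores far higher per unit cost than a full simulation (cost 1.00, sigma 0.02).
post = CandidatePosterior()
cheap = acquisition(post, cost=0.01, sigma=0.25)
full = acquisition(post, cost=1.00, sigma=0.02)
print(cheap > full)  # True
```

The sqrt(cost) denominator is what steers early evaluations toward the cheap tiers: under a wide prior, the cost discount dominates the noise penalty.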
Baseline. Single-fidelity UCB evaluates at fidelity 4 (full simulation) throughout, selecting the 5 highest-UCB candidates from up to 50 evaluations.
Oracle. The true EVSI ranking from Investigation 8-2’s L4 scoring formula applied to each candidate’s attributes. Oracle overlap — how many of the algorithm’s top-5 selections match the oracle top-5 — is the primary quality metric.
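The overlap metric itself is a set intersection. A one-line sketch, with hypothetical site IDs chosen so that the example reproduces the reported 4/5 score:

```python
def oracle_overlap(selected, oracle_top5):
    """How many of the algorithm's top-5 picks match the oracle top-5."""
    return len(set(selected) & set(oracle_top5))

# Hypothetical IDs: four shared picks plus one miss give overlap 4.
print(oracle_overlap(["s1", "s2", "s3", "s4", "s6"],
                     ["s1", "s2", "s3", "s4", "s5"]))  # 4
```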
Findings
3.3× faster at 2% value loss: tiered screening works
BOCA selects 5 sites at total cost 1.5 units vs. 5.0 for full-simulation UCB — a 3.3× reduction. Realized true EVSI: BOCA 2.5186 vs. full-sim 2.5720, a gap of −0.054 (−2.1%). At the 21k-cell full grid (Investigation 8-2 caveat), the speedup compounds to an estimated 5–8× because the cheap gap-score screening phase scales with candidate count while full-simulation cost grows with each candidate evaluated at full fidelity.
| Algorithm | Total cost (units) | Realized EVSI | Oracle overlap |
|---|---|---|---|
| BOCA screening | 1.5 | 2.5186 | 4/5 |
| Full-simulation UCB | 5.0 | 2.5720 | 5/5 |
The most expensive evaluation was never needed — cheap screens plus one mid-tier check resolved every site
36 total evaluations: 30 at fidelity 1 (gap score, cost 0.01), 6 at fidelity 3 (climate-UCB, cost 0.20). Fidelity 2 (haversine-adjusted gap) and fidelity 4 (full simulation) were never used. The acquisition function jumped directly from fid-1 to fid-3 commit for every top-ranked candidate — fid-2’s cost/noise ratio offered no additional discrimination worth paying for. This is not a shortcut. It is the multi-fidelity logic working as designed: cheap noisy screens identify the high-EVSI cluster; medium-cost confirmation commits; expensive full-sim is never required.
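The cost accounting behind the 1.5-unit total and the 3.3× ratio is direct arithmetic over the ladder prices:

```python
# Evaluation counts from the reported run, priced by the fidelity ladder.
fid1_cost, fid3_cost = 0.01, 0.20
n_fid1, n_fid3 = 30, 6

total_cost = n_fid1 * fid1_cost + n_fid3 * fid3_cost
baseline_cost = 5.0  # full-simulation UCB

print(round(total_cost, 2), n_fid1 + n_fid3)   # 1.5 units over 36 evaluations
print(round(baseline_cost / total_cost, 1))    # 3.3 (x speedup)
```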
The one missed site shows the predictable cost of cheap-screen-first logic
BOCA selects rest_ca_cell18810 (posterior mean 0.385) over oracle rank-5 sjv_cell9709 (true EVSI 0.370). sjv_cell9709 was eliminated by one noisy fid-1 observation of −0.217 at step 4, dragging its posterior mean to −0.017. The algorithm never promoted it to fid-3. One bad cheap draw eliminated a genuinely good candidate before any higher-fidelity evidence was collected. That is the cost of the 3.3× speedup. Tolerance for this tradeoff is a policy decision, not a methodology failure.
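The elimination mechanism can be reproduced with a one-step conjugate update. The observation (−0.217) and the fid-1 noise (σ = 0.25) are from the run; the prior N(0.05, 0.5²) is an illustrative assumption, so the resulting mean will not match the reported −0.017 exactly, but the mechanism (one cheap draw swamping a weak prior) is the same.

```python
# One-step Gaussian conjugate update: a single noisy fid-1 draw sinks a
# genuinely good candidate. Prior parameters are assumed, not from the run.
mu0, tau0 = 0.05, 0.5     # assumed prior over the candidate's EVSI
y, sigma = -0.217, 0.25   # observed fid-1 draw and its noise level (from the run)

prec0, obs_prec = 1 / tau0 ** 2, 1 / sigma ** 2
mu1 = (prec0 * mu0 + obs_prec * y) / (prec0 + obs_prec)
print(round(mu1, 3))  # -0.164: one bad draw pushes the posterior mean negative
```

With the posterior mean driven negative, the acquisition score never justifies promoting the candidate to fid-3, exactly as described above.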
Caveats
- Not canonical BOCA. The algorithm has no joint GP linking fidelity levels, no bias-budget term, and no kernel ridge regression between fidelity levels. The name “BOCA-inspired” is retained for the multi-fidelity successive-narrowing architecture; the algorithm docstring labels this distinction explicitly.
- Fidelity costs are nominal, not empirical. The 0.01/0.05/0.20/1.00 ladder represents a plausible order-of-magnitude ratio for gap-screen vs. field reconnaissance vs. short-term monitoring vs. full-year simulation. Real CEC procurement timelines would change these ratios.
- Fidelity-gating thresholds are hand-tuned. The promotion rules (fid-2 requires ≥1 lower-fid eval; fid-3 requires ≥2; fid-4 requires ≥4) are not derived from a bias-budget optimization; they are engineering choices.
- 16-site candidate set understates the problem at scale. On 16 sites, gap-score UCB signal alone is strong enough to identify most top candidates quickly. On the full 21k-cell grid, cheap fidelity screening would provide far larger absolute savings.
- Single seed (42) deterministic run. The 4/5 oracle overlap and −0.054 EVSI gap could shift to 3/5 or 5/5 under different seeds. The 3.3× speedup ratio is tied to the fidelity-cost ladder, not the seed.
Provenance
| Item | Details |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/32_monitor-multifidelity/latest/results.json (sha256 1b31413e6cd2) |
| analysis.md | investigations/32_monitor-multifidelity/latest/analysis.md |
| scenario.md | investigations/32_monitor-multifidelity/latest/scenario.md |
| Upstream (Investigation 8-2) | investigations/27_monitor-adaptive/latest/results.json (sha256 c92a5b8aface) — candidate sites, n_monitors, EVSI parameters |
| Kandasamy et al. 2017 | NeurIPS — multi-fidelity BO framing (inspiration, not algorithm) |
| Jamieson & Talwalkar 2016 | Successive halving (closer to implemented mechanism) |
| Srinivas et al. 2010 | GP-UCB acquisition function |
| Last run | 2026-05-01 (results sha256 1b31413e6cd2) |