
California Freight Cleanup → Investigation 8-3

How much does it cost to find the best monitor sites?

3.3× speedup (BOCA cost 1.5 vs full-sim 5.0) • 4/5 oracle overlap • 2% EVSI loss • full-simulation tier never needed

Investigation 8-2 identified which five candidate sites to deploy monitors to. This investigation asks the prior question: how much does it cost to find them? A full one-year simulation per candidate costs 100 times more than a cheap gap-score screen. We ran a tiered screening approach — cheap tests first, expensive tests only for survivors — and recovered 4 of 5 optimal sites at 3.3 times lower cost, without ever running the most expensive tier.

Should a CEC or CARB site-evaluation programme run full simulations on all 16 candidate sites before choosing 5 — or use a tiered screening hierarchy that applies cheap tests first and expensive tests only to survivors? The answer determines whether multi-fidelity evaluation is worth the algorithm complexity at programme scale. Investigation 8-3 tests this on the curated 16-site set from Investigation 8-2.

Fidelity ladder. Four evaluation levels are defined over the 16 Investigation 8-2 candidate sites:

| Fidelity | Label | Relative cost | Noise σ |
|---|---|---|---|
| 1 | Gap-score screen only | 0.01 | 0.25 |
| 2 | Haversine-adjusted gap | 0.05 | 0.15 |
| 3 | Climate-signal UCB | 0.20 | 0.08 |
| 4 | Full 1-year simulation | 1.00 | 0.02 |

Cost units are relative to a full-simulation budget of 5.0 (one full-sim evaluation per candidate). Total budget cap: 50 evaluation units.
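The ladder and budget cap can be captured as a small lookup table plus a ledger check; a minimal sketch (names are illustrative, not taken from run.py):

```python
# Fidelity ladder from the table above: relative cost and noise sigma per level.
FIDELITY_LADDER = {
    1: {"label": "gap-score screen only", "cost": 0.01, "sigma": 0.25},
    2: {"label": "haversine-adjusted gap", "cost": 0.05, "sigma": 0.15},
    3: {"label": "climate-signal UCB", "cost": 0.20, "sigma": 0.08},
    4: {"label": "full 1-year simulation", "cost": 1.00, "sigma": 0.02},
}

BUDGET_CAP = 50.0  # total evaluation units

def spend(ledger, fidelity):
    """Charge one evaluation at the given fidelity; refuse if it would bust the cap."""
    cost = FIDELITY_LADDER[fidelity]["cost"]
    if ledger + cost > BUDGET_CAP:
        raise RuntimeError("evaluation budget exhausted")
    return ledger + cost
```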

Algorithm. A BOCA-inspired successive-halving UCB — closer in mechanism to Jamieson & Talwalkar 2016 than to Kandasamy et al. 2017 BOCA — maintains an independent Gaussian conjugate posterior per candidate. The acquisition function is (info_gain + ucb_weight × UCB) / sqrt(cost). Promotion to a higher fidelity requires sufficient lower-fidelity evaluations; a candidate is committed once posterior std < 0.10 and at least one fid-3 or fid-4 evaluation is complete.
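A minimal sketch of the per-candidate posterior and the acquisition score, assuming an independent Gaussian conjugate update with known per-fidelity noise. The `info_gain` term here is modeled as expected posterior-variance reduction and `beta` is an assumed exploration constant; the exact forms in run.py may differ:

```python
import math

class CandidatePosterior:
    """Independent Gaussian conjugate posterior over one candidate's true EVSI."""
    def __init__(self, prior_mean=0.0, prior_var=1.0):
        self.mean, self.var = prior_mean, prior_var

    def update(self, y, noise_sigma):
        """Precision-weighted update with one noisy observation y."""
        noise_var = noise_sigma ** 2
        post_var = 1.0 / (1.0 / self.var + 1.0 / noise_var)
        self.mean = post_var * (self.mean / self.var + y / noise_var)
        self.var = post_var

    def acquisition(self, cost, noise_sigma, ucb_weight=1.0, beta=2.0):
        """(info_gain + ucb_weight * UCB) / sqrt(cost), per the text."""
        noise_var = noise_sigma ** 2
        # Expected variance reduction if we paid for one more evaluation here.
        info_gain = self.var - 1.0 / (1.0 / self.var + 1.0 / noise_var)
        ucb = self.mean + beta * math.sqrt(self.var)
        return (info_gain + ucb_weight * ucb) / math.sqrt(cost)
```

Dividing by sqrt(cost) is what makes cheap fidelities attractive early: a fid-1 evaluation at cost 0.01 gets a 10× denominator advantage over fid-4.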

Baseline. Single-fidelity UCB evaluates at fidelity 4 (full simulation) throughout, selecting the 5 highest-UCB candidates from up to 50 evaluations.

Oracle. The true EVSI ranking from Investigation 8-2’s L4 scoring formula applied to each candidate’s attributes. Oracle overlap — how many of the algorithm’s top-5 selections match the oracle top-5 — is the primary quality metric.
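Oracle overlap is simply the size of the intersection between the algorithm's selections and the oracle top-k; a minimal sketch:

```python
def oracle_overlap(selected, oracle_ranking, k=5):
    """Count how many of the algorithm's selections fall in the oracle top-k."""
    return len(set(selected) & set(oracle_ranking[:k]))

# Illustrative site IDs: 4 of 5 selections match the oracle top-5.
picked = ["a", "b", "c", "d", "x"]
oracle = ["a", "b", "c", "d", "e", "f"]
assert oracle_overlap(picked, oracle) == 4
```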

3.3× faster at 2% value loss: tiered screening works

BOCA selects 5 sites at total cost 1.5 units vs. 5.0 for full-simulation UCB — a 3.3× reduction. Realized true EVSI: BOCA 2.5186 vs. full-sim 2.5720, a gap of −0.054 (−2.1%). At the 21k-cell full grid (Investigation 8-2 caveat), the speedup compounds to an estimated 5–8× because the cheap gap-score screening phase scales with candidate count while full-simulation cost grows with each candidate evaluated at full fidelity.

| Algorithm | Total cost (units) | Realized EVSI | Oracle overlap |
|---|---|---|---|
| BOCA screening | 1.5 | 2.5186 | 4/5 |
| Full-simulation UCB | 5.0 | 2.5720 | 5/5 |
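The headline figures follow directly from the raw costs and EVSI values:

```python
boca_cost, full_cost = 1.5, 5.0
boca_evsi, full_evsi = 2.5186, 2.5720

speedup = full_cost / boca_cost            # 5.0 / 1.5 ≈ 3.33x
evsi_gap = boca_evsi - full_evsi           # ≈ -0.0534
evsi_gap_pct = 100 * evsi_gap / full_evsi  # ≈ -2.1%

print(f"{speedup:.1f}x speedup, EVSI gap {evsi_gap:+.4f} ({evsi_gap_pct:+.1f}%)")
```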

The most expensive evaluation was never needed — cheap screens plus one mid-tier check resolved every site

36 total evaluations: 30 at fidelity 1 (gap score, cost 0.01), 6 at fidelity 3 (climate-UCB, cost 0.20). Fidelity 2 (haversine-adjusted gap) and fidelity 4 (full simulation) were never used. The acquisition function jumped directly from fid-1 to fid-3 commit for every top-ranked candidate — fid-2’s cost/noise ratio offered no additional discrimination worth paying for. This is not a shortcut. It is the multi-fidelity logic working as designed: cheap noisy screens identify the high-EVSI cluster; medium-cost confirmation commits; expensive full-sim is never required.
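The 1.5-unit total decomposes exactly into the per-fidelity tallies above:

```python
# (count, unit cost) per fidelity actually used in the run;
# fid-2 and fid-4 saw zero evaluations.
evals = {1: (30, 0.01), 3: (6, 0.20)}

total = sum(n * c for n, c in evals.values())
print(round(total, 2))  # 30*0.01 + 6*0.20 = 0.3 + 1.2 = 1.5 units
```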

The one missed site shows the predictable cost of cheap-screen-first logic

BOCA selects rest_ca_cell18810 (posterior mean 0.385) over oracle rank-5 sjv_cell9709 (true EVSI 0.370). sjv_cell9709 was eliminated by one noisy fid-1 observation of −0.217 at step 4, dragging its posterior mean to −0.017. The algorithm never promoted it to fid-3. One bad cheap draw eliminated a genuinely good candidate before any higher-fidelity evidence was collected. That is the cost of the 3.3× speedup. Tolerance for this tradeoff is a policy decision, not a methodology failure.
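The elimination mechanism can be reproduced with the same conjugate update, under assumed prior hyperparameters (illustrative, not the run's actual settings): at fid-1 noise σ = 0.25, a single low draw carries enough precision weight to pull the posterior mean well below the promotion cutoff.

```python
# One noisy fid-1 draw against a standard-normal prior (assumed hyperparameters).
prior_mean, prior_var = 0.0, 1.0
y, sigma = -0.217, 0.25  # the bad gap-score observation; fid-1 noise sigma

noise_var = sigma ** 2
post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
post_mean = post_var * (prior_mean / prior_var + y / noise_var)
# The observation's precision weight 1/noise_var = 16 dominates the prior's
# weight of 1, so the posterior mean lands near the single draw -- the
# candidate drops out of the acquisition top ranks before any fid-3
# evidence can correct it.
```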

| Item | Location / details |
|---|---|
| run.py | [internal artifact] |
| results.json | investigations/32_monitor-multifidelity/latest/results.json (sha256 1b31413e6cd2) |
| analysis.md | investigations/32_monitor-multifidelity/latest/analysis.md |
| scenario.md | investigations/32_monitor-multifidelity/latest/scenario.md |
| Upstream (Investigation 8-2) | investigations/27_monitor-adaptive/latest/results.json (sha256 c92a5b8aface) — candidate sites, n_monitors, EVSI parameters |
| Kandasamy et al. 2017 | NeurIPS — multi-fidelity BO framing (inspiration, not algorithm) |
| Jamieson & Talwalkar 2016 | Successive halving (closer to implemented mechanism) |
| Srinivas et al. 2010 | GP-UCB acquisition function |
| Last run | 2026-05-01 (results sha256 1b31413e6cd2) |