California Freight Cleanup → Investigation 6-10

Does the mortality model survive the two most common reviewer challenges?

SE ratio 1.074× — cluster-robust; +4.7% HR shift — 1-yr vs. 5-yr window

Pre-cascade robustness checks for the central concentration-response function (CRF) parameter. Two targeted tests address the two most common Cox-model reviewer challenges before HR = 1.28 enters the portfolio net-benefit (NB) calculations.

The mortality risk estimate (HR = 1.28 per 10 μg/m³) feeds every health-cost and net-benefit calculation in the California Freight Cleanup portfolio. Before that number propagates into portfolio rankings, it needs to survive two challenges a knowledgeable reviewer would raise. First: people in the same region share similar air-quality exposure, which can make standard statistical confidence intervals look narrower than they really are. Second: the model uses a 5-year trailing PM2.5 average, following the Pope 2009 cumulative-exposure framework. Does shifting to a 1-year window move the estimate outside a defensible range?

Test A — Cluster-robust sandwich SE (Lin-Wei / Tsiatis, G = 5). The standard Cox approach (lifelines cluster_col=) is O(N²) on score residuals and was infeasible at N = 706K (process killed after 25 minutes). statsmodels PHReg groups= uses the Tsiatis score-vector aggregation (O(N log N) under Breslow ties), completing in ~60 s. The mathematical result is identical; only the computational path differs. The apples-to-apples local baseline (statsmodels, single-pollutant, 5-year window, no clustering: HR = 1.41) is run first so that the SE ratio and window-shift comparisons are not confounded by the multi-pollutant Phase 7c adjustment.
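The aggregation step that separates the cluster-robust estimate from the standard one can be sketched for a single covariate. This is a minimal illustration in plain Python, not the statsmodels implementation: the function name `cluster_robust_variance`, the toy score residuals, the cluster labels, and the model-based variance are all hypothetical. It assumes per-subject score residuals are already available and forms the scalar sandwich V = I⁻¹ B I⁻¹, where B is the sum of squared within-cluster residual totals.

```python
from collections import defaultdict
from math import sqrt


def cluster_robust_variance(score_residuals, clusters, naive_variance):
    """Scalar sandwich variance for one coefficient.

    score_residuals: per-subject score residuals at the fitted beta
    clusters:        cluster label for each subject (e.g. census region)
    naive_variance:  model-based variance, i.e. the inverse information
    """
    if len(score_residuals) != len(clusters):
        raise ValueError("one cluster label is required per subject")

    # Sum residuals within each cluster before squaring; this is what lets
    # within-cluster correlation show up in the variance.
    totals = defaultdict(float)
    for resid, label in zip(score_residuals, clusters):
        totals[label] += resid

    meat = sum(total ** 2 for total in totals.values())
    # Sandwich: bread * meat * bread, with bread = inverse information.
    return naive_variance * meat * naive_variance


# Toy inputs (hypothetical): residuals sum to zero and are positively
# correlated within each cluster.
residuals = [0.5, 0.3, -0.2, -0.6]
regions = ["a", "a", "b", "b"]
# Toy choice of information so that the unclustered SE ratio equals 1.
naive_var = 1.0 / sum(r ** 2 for r in residuals)

robust_var = cluster_robust_variance(residuals, regions, naive_var)
se_ratio = sqrt(robust_var / naive_var)
print(f"SE ratio (cluster / standard): {se_ratio:.3f}")
```

With every subject in its own cluster the same function returns the model-based variance for these toy inputs, which is one way to sanity-check the aggregation before applying it to real score residuals.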

Test B — 1-year PM2.5 exposure window. AQS state annual mean PM2.5 aggregated to census-region means (same mapping as Investigation 6-7), joined to the panel as pm25_1yr_mean. Same statsmodels PHReg specification (single-pollutant, no cluster) run on pm25_1yr_mean vs. pm25_5yr_mean.
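The aggregation and join described for Test B can be sketched as follows. This is an illustration under stated assumptions, not the investigation's pipeline: the state-to-region mapping, the PM2.5 values, the region labels, and the helper names are hypothetical placeholders, and the region mean is assumed to be an unweighted mean of state annual means. Only the column name `pm25_1yr_mean` comes from the text above.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL mapping and annual means (ug/m3); placeholders only.
STATE_TO_REGION = {"CA": "West", "OR": "West", "NY": "Northeast"}
state_annual_pm25 = [
    ("CA", 2010, 12.0),
    ("OR", 2010, 8.0),
    ("NY", 2010, 10.0),
]


def region_year_means(rows, state_to_region):
    """Unweighted mean of state annual means per (region, year)."""
    buckets = defaultdict(list)
    for state, year, value in rows:
        buckets[(state_to_region[state], year)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def attach_one_year_exposure(panel, region_means):
    """Copy each panel row and join pm25_1yr_mean on (region, year).

    Rows with no matching region-year get None rather than a guess.
    """
    return [
        {**row, "pm25_1yr_mean": region_means.get((row["region"], row["year"]))}
        for row in panel
    ]


region_means = region_year_means(state_annual_pm25, STATE_TO_REGION)
panel = [
    {"subject_id": 1, "region": "West", "year": 2010},
    {"subject_id": 2, "region": "South", "year": 2010},  # no exposure data
]
panel_with_exposure = attach_one_year_exposure(panel, region_means)
```

Leaving unmatched rows as `None` keeps missing exposure visible, so it can be counted before the model is fitted rather than being silently dropped.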

Test C (consciously omitted) — Gamma frailty. lifelines does not support shared gamma frailty for 706K subjects tractably. Investigation 6-3’s hybrid already captures between-group heterogeneity via DerSimonian-Laird τ = 0.129; per-group frailty requires NCHS RDC county FIPS (deferred to Phase 7).

Verdict rule. ROBUST: both tests within ±10% HR shift and CI overlaps baseline. MILDLY SENSITIVE: one test 5–10% shift or CI overlap <50%. SENSITIVE: >10% shift or no CI overlap.
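One way to make the verdict rule mechanical is sketched below. The function names are hypothetical, and the sketch takes one particular reading of the rule: conditions are checked from most to least severe, so where the written bands overlap (a 5–10% shift), the more cautious label is returned. CI overlap is assumed to be supplied as a fraction between 0 and 1.

```python
def classify_test(hr_shift_pct, ci_overlap_frac):
    """Classify one robustness test.

    hr_shift_pct:    signed percent change in HR relative to baseline
    ci_overlap_frac: CI overlap as a fraction in [0, 1]
    """
    shift = abs(hr_shift_pct)
    # Most severe condition first, so a large shift is never downgraded.
    if shift > 10.0 or ci_overlap_frac == 0.0:
        return "SENSITIVE"
    if shift >= 5.0 or ci_overlap_frac < 0.5:
        return "MILDLY SENSITIVE"
    return "ROBUST"


def overall_verdict(per_test_verdicts):
    """Overall verdict is the worst of the per-test classifications."""
    order = ["ROBUST", "MILDLY SENSITIVE", "SENSITIVE"]
    return max(per_test_verdicts, key=order.index)


# Test B as reported in this investigation: +4.7% shift, 59.8% CI overlap.
test_b = classify_test(4.7, 0.598)
```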

Important baseline note: the local baseline for these robustness tests is HR = 1.41 (single-pollutant, statsmodels PHReg, 5-yr window), not the Investigation 6-3 headline of HR = 1.28. The gap reflects two adjustments made in Investigation 6-3’s Phase 7c run: (a) the multi-pollutant joint Cox model (PM2.5 + NO2 + O3) attenuates the PM2.5 hazard ratio by ~9% (from ~1.41 to ~1.28; on the log-hazard scale, βPM2.5 falls from about 0.34 to about 0.25, a reduction of roughly 28%); (b) smoking and BMI covariates, available for ~50% of subjects, shift the point estimate within that range. The SE ratio and window-shift comparisons are apples-to-apples within this investigation; the level difference between 1.41 and 1.28 does not affect those tests.

| Test | Metric | Baseline | Test result | Verdict |
|---|---|---|---|---|
| A: Cluster-robust SE (G = 5) | SE ratio (cluster / standard) | 1.000 | 1.074× | ROBUST |
| A: Cluster-robust SE | HR shift | 1.41 | 1.41 (+0.00%) | ROBUST |
| A: Cluster-robust SE | CI overlap with local std | | YES | ROBUST |
| B: 1-year vs. 5-year window | HR per 10 μg/m³ | 1.41 | 1.48 (+4.7%) | ROBUST |
| B: 1-year vs. 5-year window | CI overlap with 5-year | | 59.8% | ROBUST |

Correcting for within-region correlation widens the confidence interval by only 7.4% — not material

The Lin-Wei/Tsiatis cluster-robust SE (G = 5 census regions) is 1.074× the standard non-robust SE, so the Investigation 6-3 CI is not materially understated by within-region exposure correlation. HR is unchanged, because clustering adjusts the standard error but not the point estimate. G = 5 is below the Lin-Wei G ≥ 10 reliability threshold, so the SE ratio is an informative lower bound rather than a precisely calibrated CI width; at 7.4% inflation, however, the qualitative verdict (ROBUST) is stable.
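To see how the SE ratio translates into interval width, the sketch below builds a Wald interval for the hazard ratio on the log scale with and without the 1.074× inflation. The standard SE used here (`se_standard = 0.02`) is a hypothetical placeholder, since this section does not report the SE itself; only HR = 1.41 and the ratio 1.074 come from the results above. The helper name is also illustrative.

```python
from math import exp, log

Z95 = 1.959964  # two-sided 95% normal quantile


def hr_confidence_interval(hr, se_log_hr, z=Z95):
    """Wald interval for a hazard ratio, built on the log scale."""
    beta = log(hr)
    return exp(beta - z * se_log_hr), exp(beta + z * se_log_hr)


HR_BASELINE = 1.41  # local baseline from this investigation
SE_RATIO = 1.074    # cluster-robust SE / standard SE, from Test A
se_standard = 0.02  # HYPOTHETICAL standard SE of log-HR; not a reported value

ci_standard = hr_confidence_interval(HR_BASELINE, se_standard)
ci_cluster = hr_confidence_interval(HR_BASELINE, se_standard * SE_RATIO)

# On the log scale the interval width scales exactly with the SE ratio.
log_width_ratio = (log(ci_cluster[1]) - log(ci_cluster[0])) / (
    log(ci_standard[1]) - log(ci_standard[0])
)
print(f"standard CI: {ci_standard[0]:.3f} to {ci_standard[1]:.3f}")
print(f"cluster CI:  {ci_cluster[0]:.3f} to {ci_cluster[1]:.3f}")
print(f"log-scale width ratio: {log_width_ratio:.3f}")
```

The point estimate is identical in both intervals; only the half-width changes, which is why the 7.4% figure describes the interval on the log-hazard scale rather than a shift in HR.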

The 1-year exposure window gives a 4.7% higher estimate — directionally consistent with what we expect, not a model instability

The 1-year window (HR = 1.48) gives a higher estimate than the 5-year window (HR = 1.41). This is directionally consistent with evidence that acute-year PM2.5 spikes (wildfire events, stagnation episodes) carry excess mortality effects beyond the long-run chronic exposure average (Wettstein 2018, Reid 2016). The +4.7% shift is within the ±10% ROBUST threshold, and the CI overlap fraction of 59.8% indicates that the two estimates are not clearly distinguishable statistically. The 5-year window remains the policy default; the 1-year result serves as the upward sensitivity case.