California Freight Cleanup → Investigation 6-10

Does the mortality model survive the two most common reviewer challenges?

SE ratio 1.074× — cluster-robust; +4.7% HR shift — 1-yr vs. 5-yr window

Pre-cascade robustness check on the central concentration-response function (CRF) parameter. Two targeted tests address the two most common Cox-model reviewer challenges before HR = 1.28 enters the portfolio net-benefit (NB) calculations.

Decision context

The mortality risk estimate (HR = 1.28 per 10 μg/m³) feeds every health-cost and net-benefit calculation in the California Freight Cleanup portfolio. Before that number propagates into portfolio rankings, it needs to survive two challenges a knowledgeable reviewer would raise. First: people in the same region share similar air-quality exposure, which can make standard statistical confidence intervals look narrower than they really are. Second: the model uses a 5-year trailing PM_2.5 average, following the Pope 2009 cumulative-exposure framework. Does shifting to a 1-year window move the estimate outside a defensible range?

Methodology

Test A — Cluster-robust sandwich SE (Lin-Wei / Tsiatis, G = 5). The standard Cox approach (lifelines cluster_col=) is O(N²) on score residuals and was infeasible at N = 706K (process killed after 25 minutes). statsmodels PHReg groups= uses the Tsiatis score-vector aggregation (O(N log N) under Breslow ties), completing in ~60 s. The mathematical result is identical; only the computational path differs. The apples-to-apples local baseline (statsmodels, single-pollutant, 5-year window, no clustering: HR = 1.41) is run first so that the SE ratio and window-shift comparisons are not confounded by the multi-pollutant Phase 7c adjustment.
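As a rough illustration of the Test A comparison, the sketch below fits the same single-covariate Cox model twice with statsmodels PHReg, once with default standard errors and once passing cluster labels through fit(groups=...), then forms the SE ratio. The data are synthetic and every variable name (pm25, region, duration, and so on) is a placeholder rather than the investigation's panel schema, so the printed numbers only show the mechanics.

```python
import numpy as np
from statsmodels.duration.hazard_regression import PHReg

rng = np.random.default_rng(0)
n, n_regions = 2000, 5

# Synthetic panel (assumption): exposure shared within region plus individual noise.
region = rng.integers(0, n_regions, size=n)
region_pm25 = rng.normal(10.0, 2.0, size=n_regions)
pm25 = region_pm25[region] + rng.normal(0.0, 1.0, size=n)

# Exponential survival times with log-hazard linear in PM2.5 per 10 ug/m3.
exog = (pm25 / 10.0).reshape(-1, 1)
beta_true = np.log(1.4)
event_time = rng.exponential(1.0 / np.exp(beta_true * exog[:, 0]))
censor_time = 1.0
status = (event_time <= censor_time).astype(int)  # 1 = death observed
duration = np.minimum(event_time, censor_time)

# Same specification, fitted without and with cluster labels.
naive = PHReg(duration, exog, status=status, ties="breslow").fit()
robust = PHReg(duration, exog, status=status, ties="breslow").fit(groups=region)

hr_naive = float(np.exp(naive.params[0]))
hr_robust = float(np.exp(robust.params[0]))
se_ratio = float(robust.bse[0] / naive.bse[0])

print(f"HR naive={hr_naive:.3f}  HR cluster-robust={hr_robust:.3f}")
print(f"SE ratio (cluster / standard) = {se_ratio:.3f}")
```

The two fits should report the same hazard ratio, since the groups argument changes only the covariance estimate.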

Test B — 1-year PM_2.5 exposure window. AQS state annual mean PM_2.5 aggregated to census-region means (same mapping as Investigation 6-7), joined to the panel as pm25_1yr_mean. Same statsmodels PHReg specification (single-pollutant, no cluster) run on pm25_1yr_mean vs. pm25_5yr_mean.
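The aggregation and join in Test B can be sketched as follows. This assumes an unweighted mean of state values within each census region, which the text does not specify (a population-weighted mean is equally plausible). The state values, the four-state mapping, and the panel rows are illustrative placeholders, not AQS data or the Investigation 6-7 mapping.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical state annual-mean PM2.5 records: (state, year, ug/m3).
state_annual = [
    ("CA", 2015, 12.0), ("WA", 2015, 8.0),
    ("NY", 2015, 9.0), ("PA", 2015, 13.0),
]
# Illustrative subset of a state -> census-region mapping.
state_to_region = {"CA": "West", "WA": "West", "NY": "Northeast", "PA": "Northeast"}

def region_year_means(records, mapping):
    """Average state annual means within each (region, year) cell."""
    cells = defaultdict(list)
    for state, year, value in records:
        cells[(mapping[state], year)].append(value)
    return {key: mean(values) for key, values in cells.items()}

def attach_exposure(panel, lookup, column="pm25_1yr_mean"):
    """Return copies of the panel rows with the region-year exposure added."""
    return [{**row, column: lookup[(row["region"], row["year"])]} for row in panel]

lookup = region_year_means(state_annual, state_to_region)
panel = [
    {"id": 1, "region": "West", "year": 2015},
    {"id": 2, "region": "Northeast", "year": 2015},
]
joined = attach_exposure(panel, lookup)
print(joined)
```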

Test C (consciously omitted) — Gamma frailty. lifelines does not support shared gamma frailty for 706K subjects tractably. Investigation 6-3’s hybrid already captures between-group heterogeneity via DerSimonian-Laird τ = 0.129; per-group frailty requires NCHS RDC county FIPS (deferred to Phase 7).

Verdict rule. ROBUST: both tests within ±10% HR shift and CI overlaps baseline. MILDLY SENSITIVE: one test 5–10% shift or CI overlap <50%. SENSITIVE: >10% shift or no CI overlap.
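One way to make the verdict rule mechanical is sketched below. The rule as written leaves the precedence between tiers open, so this sketch assumes the strictest matching tier wins for each test and that the overall verdict is the worst per-test verdict. Both choices, and the function names, are assumptions rather than the investigation's code.

```python
SEVERITY = {"ROBUST": 0, "MILDLY SENSITIVE": 1, "SENSITIVE": 2}

def classify_test(hr_shift_pct, ci_overlap_pct):
    """Classify one robustness test from its HR shift and CI overlap (both in %)."""
    shift = abs(hr_shift_pct)
    if shift > 10.0 or ci_overlap_pct <= 0.0:
        return "SENSITIVE"
    if shift >= 5.0 or ci_overlap_pct < 50.0:
        return "MILDLY SENSITIVE"
    return "ROBUST"

def overall_verdict(tests):
    """Worst per-test verdict across (hr_shift_pct, ci_overlap_pct) pairs."""
    return max((classify_test(s, o) for s, o in tests), key=SEVERITY.get)

# A +4.7% shift with 59.8% CI overlap falls in the ROBUST tier under these assumptions.
print(classify_test(4.7, 59.8))
```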

Headline results

Important baseline note: the local baseline for these robustness tests is HR = 1.41 (single-pollutant, statsmodels PHReg, 5-yr window), not the Investigation 6-3 headline of HR = 1.28. The gap reflects two adjustments made in Investigation 6-3’s Phase 7c run: (a) the multi-pollutant joint Cox model (PM_2.5 + NO₂ + O₃) lowers the PM_2.5 hazard ratio by ~9% (from ~1.41 to ~1.28); (b) smoking and BMI covariates, available for ~50% of subjects, shift the point estimate within that range. The SE ratio and window-shift comparisons are apples-to-apples within this investigation; the level difference between 1.41 and 1.28 does not affect those tests.

Test	Metric	Baseline	Test result	Verdict
A: Cluster-robust SE (G = 5)	SE ratio (cluster / standard)	1.000	1.074×	ROBUST
A: Cluster-robust SE	HR shift	1.41	1.41 (+0.00%)	ROBUST
A: Cluster-robust SE	CI overlap with local std	—	YES	ROBUST
B: 1-year vs. 5-year window	HR per 10 μg/m³	1.41	1.48 (+4.7%)	ROBUST
B: 1-year vs. 5-year window	CI overlap with 5-year	—	59.8%	ROBUST

Correcting for within-region correlation widens the confidence interval by only 7.4% — not material

The Lin-Wei/Tsiatis cluster-robust SE (G = 5: four census regions plus the NHANES national stratum) is 1.074× the standard non-robust SE, so within-region exposure correlation does not materially understate the Investigation 6-3 CI. HR is unchanged because clustering adjusts only the standard error, not the point estimate. G = 5 is below the Lin-Wei G ≥ 10 reliability threshold, so the SE ratio is an informative lower bound rather than a precisely calibrated CI width, but at 7.4% inflation the qualitative verdict (ROBUST) is stable.
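Because a Wald interval for a hazard ratio is symmetric on the log scale, an SE ratio translates directly into a proportionally wider log-HR interval. The sketch below rescales an interval by a given ratio. The input bounds (1.30 to 1.53) are made-up placeholders, since the report does not list the CI bounds, and the helper name is hypothetical.

```python
import math

def inflate_hr_ci(lower, upper, se_ratio):
    """Widen a Wald CI for a hazard ratio by se_ratio, holding the point estimate fixed.

    Assumes the interval is symmetric about log(HR), as for exp(beta +/- z * SE).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    center = 0.5 * (log_lo + log_hi)  # log of the point estimate
    half_width = 0.5 * (log_hi - log_lo) * se_ratio
    return math.exp(center - half_width), math.exp(center + half_width)

# Placeholder bounds, not the investigation's CI.
lo, hi = inflate_hr_ci(1.30, 1.53, 1.074)
widening = (math.log(hi) - math.log(lo)) / (math.log(1.53) - math.log(1.30)) - 1.0
print(f"adjusted CI = ({lo:.3f}, {hi:.3f}); log-scale widening = {widening:.1%}")
```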

The 1-year exposure window gives a 4.7% higher estimate — directionally consistent with what we expect, not a model instability

The 1-year window (HR = 1.48) is higher than the 5-year window (HR = 1.41), directionally consistent with evidence that acute-year PM_2.5 spikes (wildfire events, stagnation episodes) carry excess mortality effects beyond the long-run chronic exposure average (Wettstein 2018, Reid 2016). The 4.7% shift is within the ±10% ROBUST threshold, and the 59.8% CI overlap indicates that the two estimates are not clearly distinguishable statistically. The 5-year window remains the policy default; the 1-year result serves as the upward sensitivity case.
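The two Test B metrics can be computed along these lines. The report does not define its overlap fraction, so this sketch assumes the shared length divided by the width of the narrower interval, measured on the HR scale; the interval bounds in the example are placeholders. The rounded table values (1.41 and 1.48) give a shift of about 5.0%, so the reported +4.7% was presumably computed from unrounded estimates.

```python
def hr_shift_pct(hr_baseline, hr_test):
    """Relative change in the hazard ratio, in percent of the baseline."""
    return 100.0 * (hr_test - hr_baseline) / hr_baseline

def ci_overlap_pct(ci_a, ci_b):
    """Shared length of two intervals as a percent of the narrower one (assumed definition)."""
    shared = min(ci_a[1], ci_b[1]) - max(ci_a[0], ci_b[0])
    narrower = min(ci_a[1] - ci_a[0], ci_b[1] - ci_b[0])
    return 100.0 * max(shared, 0.0) / narrower

# Shift from the rounded table HRs.
print(f"{hr_shift_pct(1.41, 1.48):.2f}%")
# Overlap for two placeholder intervals, not the investigation's CIs.
print(f"{ci_overlap_pct((1.30, 1.50), (1.40, 1.60)):.1f}%")
```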

Caveats

G = 5 is below the Lin-Wei reliability threshold. Reliable Lin-Wei/Tsiatis inference requires G ≥ 10, but NHIS public-use data ship only 4 census regions, plus the NHANES national stratum. The direction of the SE ratio (1.074× inflation) is informative; treat the absolute CI width as a lower bound, since sandwich SEs tend to be understated with few clusters. NCHS RDC county FIPS would give G = 3,000+ counties.
Local baseline (HR = 1.41) differs from Investigation 6-3 headline (HR = 1.28). Two reasons: (a) Investigation 6-3 Phase 7c uses a multi-pollutant joint Cox model (PM_2.5 + NO₂ + O₃), lowering the PM_2.5 hazard ratio by ~9%; (b) Investigation 6-3 includes smoking + BMI on ~50% of subjects. Robustness tests here are apples-to-apples (same spec); the level difference does not invalidate the SE ratio or window-shift comparisons.
Gamma frailty omitted. Unmeasured within-region between-subject heterogeneity is not tested. Investigation 6-3’s DerSimonian-Laird τ already captures between-group heterogeneity; within-region frailty requires RDC county-level data.