California Freight Cleanup → Investigation 6-10
Does the mortality model survive the two most common reviewer challenges?
SE ratio 1.074× (cluster-robust); +4.7% HR shift (1-yr vs. 5-yr window)
Pre-cascade robustness discipline for the central CRF parameter. Two targeted tests answer the two most common Cox-model reviewer challenges before HR = 1.28 enters the portfolio NB calculations.
Decision context
The mortality risk estimate (HR = 1.28 per 10 μg/m³) feeds every health-cost and net-benefit calculation in the California Freight Cleanup portfolio. Before that number propagates into portfolio rankings, it needs to survive two challenges a knowledgeable reviewer would raise. First: people in the same region share similar air-quality exposure, which can make standard statistical confidence intervals look narrower than they really are. Second: the model uses a 5-year trailing PM2.5 average, following the Pope 2009 cumulative-exposure framework. Does shifting to a 1-year window move the estimate outside a defensible range?
Methodology
Test A — Cluster-robust sandwich SE (Lin-Wei / Tsiatis, G = 5).
The usual route (lifelines cluster_col=) is O(N²) on
score residuals and was infeasible at N = 706K (process killed after
25 minutes). statsmodels PHReg groups= instead uses the Tsiatis
score-vector aggregation (O(N log N) under Breslow ties) and completes in ~60 s.
Both compute the same cluster-robust sandwich variance; only the computational path differs.
The apples-to-apples local baseline (statsmodels, single-pollutant, 5-year window,
no clustering: HR = 1.41) is run first so that the SE ratio and window-shift
comparisons are not confounded by the multi-pollutant Phase 7c adjustment.
Test B — 1-year PM2.5 exposure window.
AQS state annual mean PM2.5 aggregated to census-region means (same
mapping as Investigation 6-7), joined to the panel as pm25_1yr_mean.
Same statsmodels PHReg specification (single-pollutant, no cluster) run on
pm25_1yr_mean vs. pm25_5yr_mean.
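The aggregation and join for Test B might look like the following pandas sketch. The state-to-region lookup, the PM2.5 values, and every column name other than pm25_1yr_mean are hypothetical placeholders; the actual mapping is the one from Investigation 6-7, and an unweighted mean across states is assumed here because the weighting is not stated.

```python
import pandas as pd

# Hypothetical lookup; the real mapping comes from Investigation 6-7.
STATE_TO_REGION = {"CA": "West", "OR": "West", "NY": "Northeast",
                   "TX": "South", "IL": "Midwest"}

# Made-up AQS-style state annual means (ug/m3), for illustration only.
aqs = pd.DataFrame({
    "state": ["CA", "OR", "NY", "TX", "IL"],
    "year": [2010] * 5,
    "pm25_annual_mean": [12.0, 8.0, 9.0, 10.5, 11.0],
})
aqs["region"] = aqs["state"].map(STATE_TO_REGION)

# Unweighted region-by-year mean of the state annual means (assumption).
region_means = (
    aqs.groupby(["region", "year"], as_index=False)["pm25_annual_mean"]
    .mean()
    .rename(columns={"pm25_annual_mean": "pm25_1yr_mean"})
)

# Toy subject panel keyed by region and year.
panel = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "region": ["West", "West", "South"],
    "year": [2010, 2010, 2010],
})

# Left join keeps every subject; validate guards against duplicate region-years.
panel = panel.merge(region_means, on=["region", "year"],
                    how="left", validate="many_to_one")
print(panel)
```

The validate="many_to_one" argument makes the merge fail loudly if a region-year appears more than once in the exposure table, which would otherwise silently duplicate subjects.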
Test C (consciously omitted) — Gamma frailty. lifelines does not support shared gamma frailty for 706K subjects tractably. Investigation 6-3’s hybrid already captures between-group heterogeneity via DerSimonian-Laird τ = 0.129; per-group frailty requires NCHS RDC county FIPS (deferred to Phase 7).
Verdict rule. ROBUST: both tests within ±10% HR shift and CI overlaps baseline. MILDLY SENSITIVE: one test 5–10% shift or CI overlap <50%. SENSITIVE: >10% shift or no CI overlap.
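One way to make the verdict rule mechanical is sketched below. Two points are assumptions rather than part of the stated rule: the most severe matching category wins, and the overlap fraction is measured as the share of the baseline CI covered by the test CI (the rule does not define it).

```python
def pct_shift(hr_test: float, hr_base: float) -> float:
    """Percent change of the test HR relative to the baseline HR."""
    return 100.0 * (hr_test / hr_base - 1.0)


def overlap_fraction(ci_test, ci_base) -> float:
    """Share of the baseline CI covered by the test CI (assumed definition)."""
    lo = max(ci_test[0], ci_base[0])
    hi = min(ci_test[1], ci_base[1])
    return max(0.0, hi - lo) / (ci_base[1] - ci_base[0])


def verdict(shift_pct: float, overlap: float) -> str:
    """Classify one test; the most severe matching category wins (assumption)."""
    if abs(shift_pct) > 10.0 or overlap == 0.0:
        return "SENSITIVE"
    if abs(shift_pct) >= 5.0 or overlap < 0.5:
        return "MILDLY SENSITIVE"
    return "ROBUST"


print(verdict(4.7, 0.598))              # reported Test B values -> ROBUST
print(verdict(0.0, 1.0))                # unchanged HR, full overlap -> ROBUST
print(round(pct_shift(1.48, 1.41), 2))  # -> 4.96 from the rounded HRs
```

The rounded HRs give a shift of about 5.0%, so the reported +4.7% presumably comes from unrounded estimates; with a 5% boundary, the classification should be computed from the unrounded values.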
Headline results
Important baseline note: the local baseline for these robustness tests is HR = 1.41 (single-pollutant, statsmodels PHReg, 5-yr window), not the Investigation 6-3 headline of HR = 1.28. The gap reflects two adjustments made in Investigation 6-3’s Phase 7c run: (a) the multi-pollutant joint Cox model (PM2.5 + NO2 + O3) lowers the PM2.5 HR by ~9% (from ~1.41 to ~1.28, which is ~28% on the log-hazard β scale); (b) smoking and BMI covariates on ~50% of subjects shift the point estimate within that range. The SE ratio and window-shift comparisons are apples-to-apples within this investigation; the level difference between 1.41 and 1.28 does not affect those tests.
| Test | Metric | Baseline | Test result | Verdict |
|---|---|---|---|---|
| A: Cluster-robust SE (G = 5) | SE ratio (cluster / standard) | 1.000 | 1.074× | ROBUST |
| A: Cluster-robust SE | HR shift | 1.41 | 1.41 (+0.00%) | ROBUST |
| A: Cluster-robust SE | CI overlap with local std | — | YES | ROBUST |
| B: 1-year vs. 5-year window | HR per 10 μg/m³ | 1.41 | 1.48 (+4.7%) | ROBUST |
| B: 1-year vs. 5-year window | CI overlap with 5-year | — | 59.8% | ROBUST |
Correcting for within-region correlation widens the confidence interval by only 7.4% — not material
The Lin-Wei/Tsiatis cluster-robust SE (G = 5 census regions) is 1.074× the standard non-robust SE, so the Investigation 6-3 CI is not materially understated by within-region exposure correlation. The HR is unchanged because clustering adjusts only the standard error, not the point estimate. G = 5 is below the Lin-Wei G ≥ 10 reliability threshold, so the SE ratio is an informative lower bound rather than a precisely calibrated CI width; at 7.4% inflation, however, the qualitative verdict (ROBUST) is stable.
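To illustrate what a 1.074× SE ratio does to the interval, the sketch below builds Wald CIs as exp(β ± z·SE) around HR = 1.41. The baseline SE of 0.050 on the log-hazard scale is a hypothetical placeholder, since the fitted SE is not quoted here; the width ratio on the log scale does not depend on that choice.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def hr_ci(hr: float, se_log: float):
    """Wald 95% CI for a hazard ratio, given the SE of the log-hazard ratio."""
    beta = math.log(hr)
    return math.exp(beta - Z95 * se_log), math.exp(beta + Z95 * se_log)


hr = 1.41
se_std = 0.050           # hypothetical model-based SE (not from the report)
se_cl = 1.074 * se_std   # cluster-robust SE implied by the reported ratio

lo_s, hi_s = hr_ci(hr, se_std)
lo_c, hi_c = hr_ci(hr, se_cl)

# On the log scale the CI width is 2 * z * SE, so widths scale exactly with SE.
log_width_ratio = math.log(hi_c / lo_c) / math.log(hi_s / lo_s)
print(f"standard CI: ({lo_s:.3f}, {hi_s:.3f})")
print(f"clustered CI: ({lo_c:.3f}, {hi_c:.3f})")
print(f"log-scale width ratio: {log_width_ratio:.3f}")
```

The 7.4% widening therefore applies to the interval on the log-hazard scale; on the HR scale the widening is close to, but not exactly, 7.4%.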
The 1-year exposure window gives a 4.7% higher estimate — directionally consistent with what we expect, not a model instability
The 1-year window (HR = 1.48) is higher than the 5-year window (HR = 1.41), directionally consistent with evidence that acute-year PM2.5 spikes (wildfire events, stagnation episodes) carry excess mortality effects beyond the long-run chronic exposure average (Wettstein 2018, Reid 2016). The +4.7% shift is within the ±10% ROBUST threshold, and the 59.8% CI overlap indicates that the two estimates are not clearly distinguishable statistically. The 5-year window remains the policy default; the 1-year result serves as the upward sensitivity case.
Caveats
- G = 5 is below the Lin-Wei reliability threshold. Reliable Lin-Wei/Tsiatis inference requires G ≥ 10. NHIS public-use data ship only 4 census regions, plus the NHANES national stratum. The direction of the SE ratio (1.074× inflation) is informative, but with so few clusters the absolute CI width should be treated as a lower bound. NCHS RDC county FIPS would give G > 3,000 counties.
- Local baseline (HR = 1.41) differs from Investigation 6-3 headline (HR = 1.28). Two reasons: (a) Investigation 6-3 Phase 7c uses a multi-pollutant joint Cox model (PM2.5 + NO2 + O3), which lowers the PM2.5 HR by ~9% (~28% on the log-hazard β scale); (b) Investigation 6-3 includes smoking + BMI on ~50% of subjects. Robustness tests here are apples-to-apples (same spec); the level difference does not invalidate the SE ratio or window-shift comparisons.
- Gamma frailty omitted. Unmeasured within-region between-subject heterogeneity is not tested. Investigation 6-3’s DerSimonian-Laird τ already captures between-group heterogeneity; within-region frailty requires RDC county-level data.