The Question
Investigation 13 showed that ML can predict 10-year mortality from health data alone. But people aren’t just their diagnoses — they’re also their bank accounts, their education, their social networks. The social determinants of health (SDOH) literature is clear: wealth predicts longevity. The question is whether adding wealth, income, education, and marital status to a model that already knows your chronic diseases, smoking status, and BMI actually improves prediction — or whether health status already captures most of what wealth does.
ADM Prediction (Made Before Running Models)
Predicted winner: Health+SDOH ML, but with only a modest gain. Chetty et al. (2016) reported a life expectancy gap of roughly 10–15 years between the top and bottom 1% of the income distribution (about 15 years for men and 10 for women). But most of this gap plausibly works through health behaviors and conditions already in our feature set. The interesting question isn’t whether wealth matters; it’s whether wealth adds information beyond what health status already captures.
Results
[Figure: ROC Curves]
[Figure: Feature Importance (Top 8, Health+SDOH)]
[Figure: Wealth Mortality Gradient]
[Figure: Multi-Model Comparison]
The wealth-health gradient may differ by age, sex, or baseline health. Does adding SDOH features improve prediction more for some groups than others?
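One way to approach the subgroup question is to compute the AUC of both models within each group and compare the gain. The sketch below assumes a table of out-of-fold predictions with hypothetical column names (`died`, `p_health`, `p_health_sdoh`); the helper `auc_gain_by_group` and the tiny demo table are illustrative only, not the investigation's actual code or data.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def auc_gain_by_group(df, group_col, y_col="died",
                      p_base="p_health", p_full="p_health_sdoh"):
    """AUC of the health-only and health+SDOH predictions within each subgroup."""
    rows = []
    for group, g in df.groupby(group_col, observed=True):
        if g[y_col].nunique() < 2:
            continue  # AUC is undefined when a subgroup contains only one outcome class
        auc_base = roc_auc_score(g[y_col], g[p_base])
        auc_full = roc_auc_score(g[y_col], g[p_full])
        rows.append({
            group_col: group,
            "n": len(g),
            "auc_health": auc_base,
            "auc_health_sdoh": auc_full,
            "gain": auc_full - auc_base,
        })
    return pd.DataFrame(rows)

# Toy predictions (made up for illustration, not HRS results).
demo = pd.DataFrame({
    "sex": ["F"] * 4 + ["M"] * 4,
    "died": [0, 0, 1, 1, 0, 0, 1, 1],
    "p_health": [0.2, 0.6, 0.5, 0.8, 0.1, 0.3, 0.6, 0.9],
    "p_health_sdoh": [0.2, 0.4, 0.5, 0.8, 0.1, 0.3, 0.6, 0.9],
})
by_sex = auc_gain_by_group(demo, "sex")
print(by_sex)
```

Subgroup AUCs rest on fewer deaths than the pooled estimate, so differences in gain between groups are best read alongside confidence intervals rather than as point estimates.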
A well-calibrated model should match predicted probabilities to observed frequencies. Does adding SDOH improve calibration as well as discrimination?
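To make the calibration question concrete, here is a minimal sketch of the two metrics named in the methods, the Brier score and expected calibration error (ECE). It assumes equal-width probability bins and a default of 10 bins; the source does not state the binning scheme, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

def expected_calibration_error(y_true, p_pred, n_bins=10):
    """ECE with equal-width bins: weighted mean |observed rate - mean prediction|."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Inner edges only, so bin ids run from 0 to n_bins - 1 (p == 1.0 lands in the last bin).
    bin_ids = np.clip(np.digitize(p_pred, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - p_pred[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of observations
    return float(ece)

# Toy example: confident predictions that are slightly off in both directions.
y = [0, 0, 1, 1]
p = [0.15, 0.15, 0.85, 0.85]
print(brier_score(y, p), expected_calibration_error(y, p))
```

Computing both metrics for the health-only and health+SDOH predictions on the same held-out folds would show whether SDOH features change calibration independently of any change in AUC.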
The ADM Insight
Wealth matters for longevity — but not as much as your doctor’s notes. Adding socioeconomic data improves prediction, confirming that the health-wealth gradient is real. But the marginal improvement suggests that most of wealth’s effect on mortality is mediated through the health conditions we already measure: diabetes, heart disease, obesity, depression. Wealth’s independent effect is real but smaller than the mediated pathway.
Cohort: RAND HRS (Health and Retirement Study) respondents aged 50–90, the same cohort as the Investigation 13 mortality analysis. Binary outcome: died within 10 years of baseline.
Three models compared: (1) Domain: Charlson-style published risk score using age, chronic conditions, smoking, BMI. (2) Health-only ML: GradientBoostingClassifier on demographics + chronic conditions + health behaviors + functional status. (3) Health+SDOH ML: same GradientBoosting architecture but adds wealth quintile, income quintile, education years, married/partnered status, and medication count.
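The comparison between models (2) and (3) can be sketched as follows. This is a simplified illustration on synthetic data, not the HRS pipeline: the variables, coefficients, and the data-generating process (wealth acting mostly through chronic conditions) are all assumptions chosen to mirror the hypothesis, and the Charlson-style domain score is omitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data (NOT HRS): wealth affects mortality mostly via health.
wealth = rng.normal(size=n)                            # hypothetical standardized wealth
age = rng.uniform(50, 90, size=n)
conditions = rng.poisson(np.exp(0.5 - 0.4 * wealth))   # more chronic conditions when poorer
logit = -7.0 + 0.08 * age + 0.5 * conditions - 0.2 * wealth
died = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_health = np.column_stack([age, conditions])               # health-only features
X_health_sdoh = np.column_stack([age, conditions, wealth])  # adds the SDOH feature

# Same folds for both models so the AUCs are directly comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

def cv_auc(X, y):
    """Mean cross-validated ROC AUC for a default GradientBoostingClassifier."""
    model = GradientBoostingClassifier(random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

auc_health = cv_auc(X_health, died)
auc_full = cv_auc(X_health_sdoh, died)
print(f"Health-only AUC: {auc_health:.3f}")
print(f"Health+SDOH AUC: {auc_full:.3f}")
```

Keeping the architecture, hyperparameters, and folds identical across the two feature sets means any difference in AUC can be attributed to the added features rather than to the model.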
SDOH variables: Wealth and income quintiles computed within-cohort using pd.qcut. Education measured in years of schooling. Marital status coded as married/partnered vs not. Medication count is total number of prescription medications.
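The variable construction described above might look like the following sketch. The column names, marital categories, and simulated values are hypothetical; only the use of `pd.qcut` for within-cohort quintiles comes from the source.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000

# Simulated stand-in columns (hypothetical names and distributions, not HRS data).
df = pd.DataFrame({
    "wealth": rng.lognormal(mean=11, sigma=1.5, size=n),
    "income": rng.lognormal(mean=10, sigma=1.0, size=n),
    "marital": rng.choice(
        ["married", "partnered", "widowed", "divorced", "never married"], size=n
    ),
})

# Within-cohort quintiles: 1 = lowest fifth, 5 = highest fifth.
df["wealth_q"] = pd.qcut(df["wealth"], q=5, labels=False) + 1
df["income_q"] = pd.qcut(df["income"], q=5, labels=False) + 1

# Married/partnered vs not, coded as 1/0.
df["partnered"] = df["marital"].isin(["married", "partnered"]).astype(int)

print(df["wealth_q"].value_counts().sort_index())
```

Real wealth data often contain many tied values (for example, zero net worth), which can make quantile edges coincide; `pd.qcut` raises an error in that case unless `duplicates="drop"` is passed, at the cost of fewer than five bins.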
Evaluation: 5-fold stratified cross-validation, with bootstrap 95% confidence intervals from 1,000 resamples. Calibration is assessed with the Brier score and expected calibration error (ECE). The multi-model comparison covers LogisticRegression, RandomForest, and GradientBoosting.
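A bootstrap confidence interval for the AUC could be computed along these lines. The source does not say what is resampled or which interval method is used, so this sketch assumes a percentile bootstrap over out-of-fold predictions; `bootstrap_auc_ci` is a hypothetical helper and the demo data are simulated.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, p_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for ROC AUC."""
    y_true = np.asarray(y_true)
    p_pred = np.asarray(p_pred)
    if np.unique(y_true).size < 2:
        raise ValueError("AUC needs both outcome classes in y_true")
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)      # resample respondents with replacement
        if np.unique(y_true[idx]).size < 2:
            continue                          # skip resamples with a single class
        aucs.append(roc_auc_score(y_true[idx], p_pred[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(roc_auc_score(y_true, p_pred)), float(lo), float(hi)

# Simulated outcomes and predictions (illustrative only).
rng = np.random.default_rng(42)
y_demo = rng.integers(0, 2, size=500)
p_demo = np.clip(0.3 * y_demo + rng.uniform(0, 0.7, size=500), 0, 1)
auc, lo, hi = bootstrap_auc_ci(y_demo, p_demo)
print(f"AUC {auc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

When comparing two models, bootstrapping the difference in AUC on the same resampled indices gives a tighter and more relevant interval than comparing two separately computed intervals.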
Wealth is measured at a single timepoint: Lifetime wealth trajectory may matter more than a snapshot. A recently bankrupt former millionaire and a lifelong low-income worker have very different histories, yet they look the same to the model because their current wealth is similar.
HRS oversamples Black and Hispanic respondents: Without survey weights, the cohort’s wealth distribution may not be nationally representative, so within-cohort quintiles may not match population quintiles.
Medication count is ambiguous: It’s partly an SDOH variable (access to healthcare) and partly a health variable (disease burden). Its placement in the “SDOH” feature set is debatable.
Reverse causation: Poor health may cause low wealth (medical bankruptcy, inability to work), not just the other way around. A single baseline snapshot of wealth and health cannot disentangle the direction.