Longevity Study → Investigation 9

Can We Predict BMI Trajectory?

7,682 people. Domain models work short-term. ML works long-term. The hybrid model crosses over at 12 years.

People tracked: 7,682
Hybrid improvement (12-year horizon): +10.3%
Winner at 12+ years: Hybrid

ADM Prediction (Made Before Running Models)

Predicted winner: Hybrid (crossover). Population age-BMI curves capture the average pattern: BMI rises through middle age, plateaus around age 65, and declines after 75. Individual deviation from that pattern is driven by lifestyle changes and disease onset. At short horizons, population curves may suffice; at longer horizons, individual variation compounds and ML adds value.

Expected: domain model competitive short-term; hybrid wins at 8–12 years. Actual: hybrid overtakes the domain model at the 8-year horizon and reaches +10.3% improvement at 12 years. Prediction confirmed.

The Crossover: When Hybrid Wins

At short horizons, the population BMI curve is competitive because individuals haven't had time to diverge from the mean. As the prediction horizon extends, the hybrid model, which combines the population trajectory with learned individual adjustments, overtakes both the domain-only and ML-only models.
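One way to make "crossover" precise is the earliest horizon from which the hybrid stays ahead of the domain model at every longer horizon. The sketch below assumes per-horizon "improvement over baseline" scores (higher is better) sorted by horizon; `crossover_horizon` is a hypothetical helper, and the numbers in the example are invented, not the study's results.

```python
def crossover_horizon(horizons, domain_scores, hybrid_scores):
    """Earliest horizon from which the hybrid beats the domain model at every later horizon.

    Assumes horizons are sorted ascending and scores are "higher is better".
    Returns None if the hybrid is not ahead at the final horizon.
    """
    crossover = None
    for h, d, hy in zip(horizons, domain_scores, hybrid_scores):
        if hy > d:
            if crossover is None:
                crossover = h  # hybrid just pulled ahead
        else:
            crossover = None  # hybrid fell behind again, so reset
    return crossover

# Invented illustrative scores (percent improvement), NOT the study's numbers.
horizons = [2, 4, 6, 8, 10, 12]
domain = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5]
hybrid = [3.0, 3.5, 2.9, 4.0, 6.0, 9.0]
print(crossover_horizon(horizons, domain, hybrid))  # 8
```

Requiring the hybrid to stay ahead, rather than taking the first horizon where it wins, avoids declaring a crossover from a single noisy horizon.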

[Figures: Improvement over baseline by prediction horizon; Population BMI curve (with IQR band); Individual BMI trajectories]
What Drives the Prediction?

Current BMI dominates at short horizons, but activity level, depression score, and metabolic markers become increasingly important at longer prediction windows — patterns the hybrid model captures.

[Figure: Top 8 feature importances (hybrid model)]

The domain model uses published population BMI curves stratified by age and sex. Given a person's current age, sex, and BMI, it projects forward using the population mean trajectory for their demographic group.
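The description doesn't say exactly how the population curve is anchored to an individual. One plausible reading, sketched below, shifts the person's current BMI by the population's mean change between their current age and the target age. The values in `POP_CURVE` are invented to mimic the shape described above (rise through midlife, plateau near 65, decline after 75); they are not published curves, and `population_mean_bmi` and `domain_predict` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical population mean BMI by sex at anchor ages (invented values, not published data).
POP_CURVE = {
    "F": [(50, 27.0), (65, 28.0), (75, 27.8), (90, 25.5)],
    "M": [(50, 27.5), (65, 28.2), (75, 28.0), (90, 26.0)],
}

def population_mean_bmi(sex, age):
    """Linearly interpolate the population mean BMI for one sex at a given age."""
    points = POP_CURVE[sex]
    ages = [a for a, _ in points]
    if age <= ages[0]:
        return points[0][1]  # clamp below the youngest anchor age
    if age >= ages[-1]:
        return points[-1][1]  # clamp above the oldest anchor age
    i = bisect_right(ages, age)  # ages[i - 1] <= age < ages[i]
    (a0, b0), (a1, b1) = points[i - 1], points[i]
    return b0 + (b1 - b0) * (age - a0) / (a1 - a0)

def domain_predict(sex, age, bmi, horizon_years):
    """Shift the person's current BMI by the population's mean change over the horizon."""
    delta = population_mean_bmi(sex, age + horizon_years) - population_mean_bmi(sex, age)
    return bmi + delta

# A 60-year-old woman with BMI 30, projected 12 years ahead.
print(round(domain_predict("F", 60, 30.0, 12), 2))  # 30.19
```

Anchoring on the person's own BMI, rather than predicting the group mean outright, is what makes this baseline hard to beat at short horizons: over two years it barely moves anyone.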

The ML model uses gradient-boosted trees trained on baseline biomarkers, lifestyle factors, and historical BMI changes to predict future BMI directly.
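To show what "gradient-boosted trees" means mechanically, here is a from-scratch sketch using depth-1 trees (stumps) and squared loss. It is a simplification for illustration, not the study's implementation, which the text does not specify. The feature layout (`[baseline BMI, activity level]`) and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must send at least one sample each way
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split (all rows identical): predict the mean residual
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (the negative gradient of squared loss)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbm(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Toy rows: [baseline BMI, activity level 0-3] -> BMI at follow-up (invented data).
X = [[24, 3], [26, 1], [28, 2], [30, 0], [32, 3], [34, 1]]
y = [24.5, 27.5, 28.5, 32.0, 31.5, 35.5]
model = fit_gbm(X, y)
print(round(predict_gbm(model, [30, 0]), 1))
```

Production libraries grow deeper trees and add regularization, but the loop is the same idea: each new tree corrects what the ensemble so far gets wrong.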

The hybrid model uses the population curve prediction as a backbone feature and trains on the residual — the difference between the population prediction and actual outcome. This preserves the population-level signal while learning individual corrections.
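The residual setup can be sketched independently of which learner fills the ML slot. In this sketch the domain forecast is prepended to each feature row as the backbone feature, the learner is trained on actual minus domain forecast, and prediction adds the two back together. `fit_hybrid` and `fit_group_mean` are hypothetical names, and the per-group mean learner is a deliberately trivial stand-in for the gradient-boosted trees described above.

```python
from collections import defaultdict

def fit_hybrid(domain_preds, X, y, fit_residual):
    """Fit a residual learner on top of population-curve (domain) predictions."""
    X_aug = [[d, *row] for d, row in zip(domain_preds, X)]      # backbone feature first
    residuals = [yi - d for yi, d in zip(y, domain_preds)]     # actual minus domain forecast
    residual_model = fit_residual(X_aug, residuals)

    def predict(domain_pred, row):
        # Final forecast = population backbone + learned individual correction.
        return domain_pred + residual_model([domain_pred, *row])

    return predict

def fit_group_mean(X_aug, residuals, col=-1):
    """Trivial stand-in learner: mean residual per value of one column.

    It ignores the backbone column; a real model (e.g. boosted trees) could split on it.
    """
    groups = defaultdict(list)
    for row, r in zip(X_aug, residuals):
        groups[row[col]].append(r)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    overall = sum(residuals) / len(residuals)
    return lambda row: means.get(row[col], overall)  # unseen values fall back to overall mean

# Invented data: domain forecasts, one hypothetical feature (activity: 0 = inactive,
# 1 = active), and observed follow-up BMI.
domain_preds = [27.0, 28.0, 29.0, 30.0]
X = [[0], [0], [1], [1]]
y = [28.0, 29.0, 28.5, 29.5]
predict = fit_hybrid(domain_preds, X, y, fit_group_mean)
print(predict(27.5, [0]))  # 28.5: backbone plus the inactive group's +1.0 correction
print(predict(27.5, [1]))  # 27.0: backbone minus the active group's 0.5 correction
```

If the learner finds no structure in the residuals, its corrections shrink toward the average residual and the hybrid falls back to roughly the population curve, which is the "preserves the population-level signal" property.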

Marginal hybrid advantage: The hybrid model improves over ML alone by only +0.4% at the 12-year horizon. This suggests the population age curve provides a useful but small prior; most of the hybrid's value comes from the ML component.
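The page does not define "improvement over baseline". Assuming it means the percent reduction in RMSE relative to a baseline forecast, the metric can be sketched as below; `rmse` and `improvement_over_baseline` are hypothetical helpers and the example values are invented. Under this reading, the hybrid-versus-ML gap would be the difference between two such percentages at the same horizon.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted BMI."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def improvement_over_baseline(y_true, model_pred, baseline_pred):
    """Percent reduction in RMSE relative to a baseline forecast (positive = better)."""
    base = rmse(y_true, baseline_pred)
    return 100.0 * (base - rmse(y_true, model_pred)) / base

# Invented example: baseline misses by 2 BMI units, the model by 1.
y_true = [28.0, 28.0, 28.0, 28.0]
baseline = [30.0, 26.0, 30.0, 26.0]
model = [29.0, 27.0, 29.0, 27.0]
print(improvement_over_baseline(y_true, model, baseline))  # 50.0
```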

Self-reported BMI: BMI in the HRS (Health and Retirement Study) is calculated from self-reported height and weight. Because respondents tend to overreport height and underreport weight, it systematically underestimates true BMI. Measured BMI data would provide cleaner signals.
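Because BMI is weight divided by height squared, both reporting biases push in the same direction. A quick check with a hypothetical respondent (invented numbers) who adds 2 cm and drops 2 kg:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Hypothetical respondent: measured vs. self-reported values (invented for illustration).
measured = bmi(80.0, 1.70)
self_reported = bmi(78.0, 1.72)  # taller and lighter on paper
print(f"{measured:.1f} vs {self_reported:.1f}")  # 27.7 vs 26.4
```

Small reporting shifts of this size move BMI by more than a full unit, which is comparable to the year-to-year changes the models are trying to predict.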

Biennial measurements: BMI is measured every 2 years. Rapid weight changes (illness, lifestyle shifts) between waves are invisible. The "crossover at 12 years" may partly reflect this measurement limitation.
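A toy simulation makes the blind spot concrete: an invented monthly BMI series with an illness-related dip that starts and resolves between two survey waves. `sample_waves` is a hypothetical helper that keeps only the readings a biennial survey would record.

```python
def sample_waves(series, interval):
    """Keep only the readings a survey taking one observation every `interval` steps would see."""
    return {t: series[t] for t in range(0, len(series), interval)}

# Invented monthly BMI over 4 years: stable at 27.0, with a drop to 25.0
# during months 30-39 that recovers before the next wave.
monthly = [27.0] * 49  # months 0..48
for m in range(30, 40):
    monthly[m] = 25.0

waves = sample_waves(monthly, 24)  # biennial: months 0, 24, 48
print(waves)         # {0: 27.0, 24: 27.0, 48: 27.0}
print(min(monthly))  # 25.0
```

The survey record shows a flat trajectory even though the person lost and regained 2 BMI units in between.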

Prediction vs intervention: Predicting BMI trajectory is different from changing it. The model identifies who will gain or lose weight but doesn't address whether intervention would alter the trajectory.

ADM Insight

BMI trajectories are deceptively predictable short-term — most people's weight doesn't change dramatically in 2 years. Domain knowledge (population regression to mean, age-related trends) handles this well. But over 12 years, individual life events — career changes, health shocks, behavioral shifts — create divergence that only data-driven models capture. The hybrid model wins at the timescale that matters for intervention planning.