The Biology Paradox

Three methods predict biological age from seven blood biomarkers: a published formula built on 40 years of geroscience, a gradient-boosted ML model, and a hybrid of both. Click through to see which one loses.

What You're About to See

Biological age measures how fast your body is aging relative to your calendar age. We tested three approaches on 1,907 NHANES participants, correlating predicted age-acceleration against actual health outcomes (hypertension, obesity, hyperglycemia) via split-half cross-validation.

KDM (Published Formula) — Klemera-Doubal method, published 2006. Encodes 40 years of geroscience into 7 regression coefficients.
ML Alone (GradientBoosting) — learns from the same 7 biomarkers with no domain guidance. Pure data-driven.
Hybrid (KDM + ML) — feeds the KDM delta as a feature into the ML model. Domain knowledge becomes training signal.
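
A minimal Python sketch of how a KDM delta might be computed and handed to the hybrid model as an extra feature. It uses the commonly cited form of the Klemera-Doubal estimate that includes chronological age. The `MarkerFit` type, the function names, and every coefficient below are hypothetical illustrations, not the values or code behind the results on this page.

```python
from dataclasses import dataclass


@dataclass
class MarkerFit:
    """Linear fit of one biomarker on chronological age: x = q + k * age."""
    q: float  # intercept
    k: float  # slope per year of age
    s: float  # residual standard deviation of the fit


def kdm_biological_age(values, fits, chrono_age, s_ba):
    """Klemera-Doubal estimate (commonly cited form, chronological age included)."""
    num = sum((x - f.q) * f.k / f.s ** 2 for x, f in zip(values, fits))
    den = sum((f.k / f.s) ** 2 for f in fits)
    # Chronological age enters as one more "marker" with spread s_ba.
    num += chrono_age / s_ba ** 2
    den += 1.0 / s_ba ** 2
    return num / den


def kdm_delta(values, fits, chrono_age, s_ba):
    """Age acceleration: biological age minus chronological age."""
    return kdm_biological_age(values, fits, chrono_age, s_ba) - chrono_age


def hybrid_features(values, fits, chrono_age, s_ba):
    """Raw biomarkers plus the KDM delta, as one feature row for an ML model."""
    return list(values) + [kdm_delta(values, fits, chrono_age, s_ba)]


# Hypothetical two-marker example (made-up coefficients).
fits = [MarkerFit(q=5.0, k=0.01, s=0.5), MarkerFit(q=1.0, k=0.02, s=1.0)]
on_trend = [5.5, 2.0]  # exactly what the fits predict at age 50
print(kdm_delta(on_trend, fits, chrono_age=50, s_ba=5.0))  # approximately 0
```

When every biomarker sits exactly on its age trend, the delta is zero up to floating-point error; values above trend on markers that rise with age push it positive. In this sketch the hybrid row is just the raw inputs with that delta appended.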
What The Hybrid Learned

Feature Importance

The hybrid model's top input was the KDM formula's own output — not any raw biomarker. Domain knowledge became the most informative training signal.

KDM Delta: 44.1%
HbA1c: 15.8%
CRP: 15.3%
Age: 10.3%
HDL: 7.5%
Cholesterol: 6.3%
Sex: 0.7%

The domain formula's output (KDM Delta) alone accounts for 44% of the hybrid model's feature importance, nearly three times the share of the next input, HbA1c.

The Lesson

What Just Happened?

You watched three methods attempt the same task: predict how fast a person is aging from seven blood biomarkers. The data didn't change. The method did. And more computation on its own made it worse.

The KDM formula encodes 40 years of geroscience research — which biomarkers track aging, how they covary, and how to weight them. It achieves r = 0.527 with just seven regression coefficients. The gradient-boosted ML model, given the same seven inputs but no domain guidance, found spurious correlations and dropped to r = 0.478.

The hybrid did best, reaching r = 0.631. Its top feature was KDM_Delta (44% importance), followed by HbA1c (16%) and CRP (15%). The ML model didn't replace domain knowledge; it amplified it. The domain formula became the most informative input, and the ML model learned the residual patterns the formula missed.

The Direction Matters

The same data. The same models. But flip the prediction direction, and the ranking of the domain formula and the standalone ML model reverses, while the hybrid leads both times.

Forward Prediction (Domain wins)

Biomarkers → biological age
KDM (Domain) r = 0.527
AI r = 0.478
Hybrid r = 0.631

Reverse Prediction (AI wins)

Age → biomarkers
KDM (Domain) r = 0.271
AI r = 0.564
Hybrid r = 0.575

Forward: domain knowledge correctly models how biomarkers encode aging. Reverse: the KDM formula was never designed to predict individual biomarker values from age, and AI learns these nonlinear mappings roughly 2× better (r = 0.564 vs. 0.271). The question determines the method.
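
The comparison can be checked with a few lines of Python. This is only an arithmetic sketch over the correlations reported above; the `RESULTS` dictionary and `best_method` helper are illustrative names, not part of the original analysis.

```python
# Correlations copied from the forward and reverse results above.
RESULTS = {
    "forward": {"KDM": 0.527, "AI": 0.478, "Hybrid": 0.631},
    "reverse": {"KDM": 0.271, "AI": 0.564, "Hybrid": 0.575},
}


def best_method(direction, candidates=("KDM", "AI")):
    """Return the candidate with the highest r for a prediction direction."""
    scores = RESULTS[direction]
    return max(candidates, key=lambda name: scores[name])


# How much better standalone ML does than KDM in the reverse direction.
reverse_ratio = RESULTS["reverse"]["AI"] / RESULTS["reverse"]["KDM"]

print(best_method("forward"), best_method("reverse"), round(reverse_ratio, 2))
```

Restricted to the two standalone methods, the pick flips from KDM (forward) to AI (reverse), and the reverse ratio comes out just above 2. Passing `candidates=("KDM", "AI", "Hybrid")` selects the hybrid in both directions.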

This is the opposite of the energy and PFAS studies, where increasing fidelity monotonically improved results. Biology is different: the published formula already encodes the right structure. An ML model that ignores that structure doesn't just start from scratch — it actively overfits to noise the formula correctly ignores.

This is Analysis Driven Modeling™ at its sharpest: sometimes the right answer is a formula published in 2006, sometimes it's ML, and sometimes it's both. The question determines the method, not the other way around.

Validation: All correlations from forward split-half on 1,907 NHANES participants. Features (CRP, cholesterol, HDL, HbA1c) and validation targets (hypertension, obesity, hyperglycemia) use different biomarkers, preventing circular prediction. Hybrid vs. ML improvement is statistically significant (p < 0.001).
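
The page does not spell out the split-half procedure, so the following Python sketch shows one plausible reading: shuffle participants into two halves, fit on one, score Pearson's r on the other, swap, and average. The function names, the `fit` callback convention, and the seed are assumptions made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_r(rows, fit, seed=0):
    """Average held-out r over two halves.

    rows: list of (features, outcome) pairs.
    fit:  callable taking training rows and returning predict(features).
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    folds = [idx[:half], idx[half:]]
    scores = []
    # First pass tests on half A after training on half B, then the reverse.
    for test_idx, train_idx in (folds, folds[::-1]):
        predict = fit([rows[i] for i in train_idx])
        preds = [predict(rows[i][0]) for i in test_idx]
        truth = [rows[i][1] for i in test_idx]
        scores.append(pearson_r(preds, truth))
    return sum(scores) / len(scores)
```

Any of the three methods could be plugged in through `fit`, so each is scored only on participants it did not see during fitting.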