The Biology Paradox — Right Fidelity AI

What You're About to See

Biological age measures how fast your body is aging relative to your calendar age. We tested three approaches on 1,907 NHANES participants, correlating predicted age-acceleration against actual health outcomes (hypertension, obesity, hyperglycemia) via split-half cross-validation.

KDM (Published Formula): the Klemera-Doubal method, published in 2006. It encodes 40 years of geroscience into 7 regression coefficients.

ML Alone (GradientBoosting): learns from the same 7 biomarkers with no domain guidance. Purely data-driven.

Hybrid (KDM + ML): feeds the KDM delta (KDM biological age minus calendar age) into the ML model as an additional feature. Domain knowledge becomes training signal.
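To make the hybrid idea concrete, here is a minimal sketch of the general Klemera-Doubal estimate (the variant without the chronological-age term) and of how its delta could be appended to a feature row. The class and function names, the two-biomarker setup, and every coefficient below are illustrative assumptions, not the fitted NHANES values or the exact parameterization used in this study.

```python
from dataclasses import dataclass


@dataclass
class BiomarkerFit:
    """Regression of one biomarker on calendar age: value = intercept + slope * age."""
    intercept: float    # q_j
    slope: float        # k_j
    residual_sd: float  # s_j, spread of the biomarker around its regression line


def kdm_biological_age(values, fits):
    """Klemera-Doubal estimate, variant without the chronological-age term:
    BA = sum_j (x_j - q_j) * k_j / s_j^2  divided by  sum_j (k_j / s_j)^2
    """
    numerator = sum(
        (values[name] - fit.intercept) * fit.slope / fit.residual_sd ** 2
        for name, fit in fits.items()
    )
    denominator = sum((fit.slope / fit.residual_sd) ** 2 for fit in fits.values())
    return numerator / denominator


def hybrid_features(values, age, fits, order):
    """Raw biomarkers, then age, then the KDM delta as the final column."""
    kdm_delta = kdm_biological_age(values, fits) - age
    return [values[name] for name in order] + [age, kdm_delta]


# Hypothetical coefficients for illustration only (not fitted to NHANES).
FITS = {
    "hba1c": BiomarkerFit(intercept=4.8, slope=0.012, residual_sd=0.5),
    "crp": BiomarkerFit(intercept=0.5, slope=0.03, residual_sd=1.5),
}

person = {"hba1c": 6.0, "crp": 2.0}  # made-up values for one participant
age = 50.0
row = hybrid_features(person, age, FITS, order=["hba1c", "crp"])
print(row)  # the last entry is the KDM delta
```

A row built this way could then be passed to any regressor, such as the gradient-boosted model described above. Someone whose biomarkers sit exactly on the regression lines gets a biological age equal to their calendar age, so their delta is zero.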

What The Hybrid Learned

Feature Importance

The hybrid model's top input was the KDM formula's own output — not any raw biomarker. Domain knowledge became the most informative training signal.

KDM Delta: 44.1%
HbA1c: 15.8%
CRP: 15.3%
Age: 10.3%
HDL: 7.5%
Cholesterol: 6.3%
Sex: 0.7%

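The domain formula's output (KDM Delta) alone accounts for 44% of the hybrid model's total feature importance, nearly three times the share of the next input (HbA1c, 15.8%). Feature importance measures how heavily the model relies on an input, not how much accuracy that input contributes on its own.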
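As a quick sanity check on the reported shares, the sketch below ranks the importances and confirms they sum to 100%. The percentages are the ones reported here; the variable names are illustrative.

```python
# Feature importances of the hybrid model, in percent, as reported above.
importances = {
    "KDM Delta": 44.1,
    "HbA1c": 15.8,
    "CRP": 15.3,
    "Age": 10.3,
    "HDL": 7.5,
    "Cholesterol": 6.3,
    "Sex": 0.7,
}

total = sum(importances.values())

# Rank inputs from most to least important.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

# Combined share of every input other than the KDM delta.
non_kdm_share = total - importances["KDM Delta"]

for name, share in ranked:
    print(f"{name:<12} {share:5.1f}%")
print(f"Total: {total:.1f}%  Non-KDM inputs: {non_kdm_share:.1f}%")
```

The six non-KDM inputs together still carry a little over half of the importance, which fits the picture of the ML model learning residual patterns on top of the formula.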

The Lesson

What Just Happened?

You watched three methods attempt the same task: predict how fast a person is aging from seven biomarkers. The data didn't change. The method did. And more computation on its own made it worse: the ML model scored below the published formula, and only the hybrid beat both.

The KDM formula encodes 40 years of geroscience research — which biomarkers track aging, how they covary, and how to weight them. It achieves r = 0.527 with just seven regression coefficients. The gradient-boosted ML model, given the same seven inputs but no domain guidance, found spurious correlations and dropped to r = 0.478.

The hybrid's top feature was KDM_Delta (44% importance), followed by HbA1c (16%) and CRP (15%). The ML model didn't replace domain knowledge — it amplified it. The domain formula became the most informative input, and the ML model learned the residual patterns the formula missed.

The Direction Matters

The same data. The same models. But flip the prediction direction, and the ranking of the domain formula and the AI model reverses. The hybrid scores highest in both directions.

Forward Prediction (biomarkers → biological age): domain wins

KDM (Domain): r = 0.527
AI: r = 0.478
Hybrid: r = 0.631

Reverse Prediction (age → biomarkers): AI wins

KDM (Domain): r = 0.271
AI: r = 0.564
Hybrid: r = 0.575

Forward: domain knowledge correctly models how biomarkers encode aging. Reverse: the KDM formula was never designed to predict individual biomarker values from age, so the AI model, which learns these nonlinear mappings, reaches roughly twice the correlation (r = 0.564 vs. 0.271). The question determines the method.
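The reversal can be checked directly from the reported correlations. The sketch below uses only the r values stated here; the dictionary layout and the helper name are illustrative assumptions.

```python
# Reported correlations for each prediction direction.
results = {
    "forward": {"KDM": 0.527, "ML": 0.478, "Hybrid": 0.631},
    "reverse": {"KDM": 0.271, "ML": 0.564, "Hybrid": 0.575},
}


def best_standalone(scores):
    """Return whichever of the two standalone methods (KDM or ML) scores higher."""
    return max(("KDM", "ML"), key=lambda method: scores[method])


for direction, scores in results.items():
    overall = max(scores, key=scores.get)
    print(f"{direction}: standalone winner = {best_standalone(scores)}, "
          f"overall winner = {overall}")

# How much higher the ML correlation is than KDM's in the reverse direction.
reverse_ratio = results["reverse"]["ML"] / results["reverse"]["KDM"]
print(f"Reverse ML / KDM ratio: {reverse_ratio:.2f}")
```

The standalone winner flips between directions, while the hybrid is the overall winner in both. Its margin over the best standalone method is much smaller in the reverse direction (0.575 vs. 0.564) than in the forward one (0.631 vs. 0.527).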

This is the opposite of the energy and PFAS studies, where increasing fidelity monotonically improved results. Biology is different: the published formula already encodes the right structure. An ML model that ignores that structure doesn't just start from scratch — it actively overfits to noise the formula correctly ignores.

This is Analysis Driven Modeling™ at its sharpest: sometimes the right answer is a 40-year-old formula, sometimes it's ML, and sometimes it's both. The question determines the method. Not the other way around.

Validation: All correlations come from forward split-half validation on 1,907 NHANES participants. Features (CRP, cholesterol, HDL, HbA1c) and validation targets (hypertension, obesity, hyperglycemia) use different biomarkers, preventing circular prediction. The hybrid's improvement over ML alone is statistically significant (p < 0.001).
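For readers unfamiliar with split-half validation, here is a minimal sketch under one common reading of the term: fit on a random half of the sample, then correlate predictions with outcomes on the other half. The one-variable least-squares model and the synthetic data are stand-ins chosen for brevity. They are assumptions for illustration, not the study's models or NHANES data.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def split_half_r(xs, ys, seed=0):
    """Fit on one random half, then report Pearson r on the held-out half."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    train, held_out = indices[:half], indices[half:]
    intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
    predictions = [intercept + slope * xs[i] for i in held_out]
    return pearson_r(predictions, [ys[i] for i in held_out])


# Synthetic data: a linear trend plus a small deterministic wobble.
xs = [float(x) for x in range(200)]
ys = [2.0 * x + ((x * 7) % 11 - 5) for x in range(200)]
r_holdout = split_half_r(xs, ys, seed=0)
print(f"Held-out r = {r_holdout:.3f}")
```

Because the model never sees the held-out half during fitting, the resulting r reflects how well the learned relationship generalizes rather than how closely it fits the training data.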
