Skip to content
Longevity Study → Investigation 4

Do Health Norms Transfer Across Populations?

9,254 Americans. 9,549 Chinese. Published adjustment factors are 20 years old. Fine-tuning does better — reducing transfer error by 57% vs raw transfer and 42% vs published adjustments. But domain knowledge still identifies which biomarkers need adjustment.

18,803
Total Subjects
NHANES + CHNS combined
2
Populations Compared
USA (NHANES) & China (CHNS)
NUANCED
Verdict
Domain provides the framework; data calibrates the numbers

ADM Prediction (Made Before Running Models)

Predicted winner: Domain frames the problem. Cross-population transfer is a calibration problem, not a learning problem. Published Asian-specific biomarker differences (HbA1c, CRP, BMI) are well-documented. The question is whether these adjustments are still accurate or outdated by westernization.

Expected: Domain knowledge provides the framework; data calibrates the numbers. Actual: Published adjustments confirmed for 4/7 biomarkers, but 3/7 reversed — outdated literature hurts. Fine-tuning on 30% local data outperforms pure transfer. Prediction confirmed.

The Challenge
Cross-Population Biomarker Transfer

Clinical reference ranges were developed primarily in Western populations. When applied to East Asian populations, systematic biases emerge — CRP levels differ by 85%, triglycerides are 15% higher in China, and HbA1c thresholds may misclassify diabetes risk. Published adjustment factors exist but date from early 2000s studies. Dietary patterns, urbanization rates, and metabolic syndrome prevalence have shifted dramatically in both countries since then.

US

NHANES 2017-2018

9,254 participants. Nationally representative US sample with full biomarker panels.

CN

CHNS 2009

9,549 participants. Multi-province Chinese longitudinal study with comparable biomarkers.

D

Domain Model

Published literature adjustment factors for cross-population biomarker conversion.

ML

Fine-Tuned Model

Model trained on NHANES, fine-tuned on a small CHNS subset to learn current population shifts.

The raw transfer baseline applies NHANES-trained norms directly to CHNS data with no adjustment. This measures the naive cross-population gap.

The adjusted transfer applies published adjustment factors from the medical literature — offsets for HbA1c, CRP, HDL, etc., derived from studies in the early 2000s. These capture known population differences but may be outdated.

The fine-tuned transfer trains on NHANES, then fine-tunes on a held-out CHNS subset. This lets the model learn current population-specific patterns rather than relying on 20-year-old literature values. The result: 57% error reduction vs raw transfer and 42% vs published adjustments.

Population Comparison
NHANES vs CHNS Biomarker Means

Side-by-side comparison of mean biomarker values across the two populations. Some differences are well-documented (CRP, triglycerides); others are less studied. The magnitude of difference varies substantially across biomarkers.

Mean values by biomarker. NHANES (blue) vs CHNS (red). Difference labels below each pair.
Transfer Methods
Transfer Error by Adjustment Strategy

Three strategies for applying NHANES norms to CHNS data: raw transfer (no adjustment), published literature adjustments, and fine-tuning on cross-population data. Fine-tuning outperforms because the world has changed since the adjustment factors were published.

Normalized transfer error. Lower is better. Fine-tuning reduces error by 57% vs raw and 42% vs published.
Published vs Discovered
Which Differences Were Already Known?

Domain knowledge correctly identifies which biomarkers need adjustment. The model discovers updated magnitudes and interaction patterns the literature doesn't capture.

Green = published adjustment exists. Gold = newly discovered by model. Dashed line = published adjustment magnitude.
Distribution Overlap
HbA1c Distribution: NHANES vs CHNS

The population distributions overlap substantially but are shifted. A fixed threshold (e.g., HbA1c ≥ 6.5% for diabetes) misclassifies more in one population than the other. Visualizing the overlap makes clear why simple offsets are insufficient.

HbA1c (%) distributions. Blue = NHANES, Red = CHNS. Dashed gold line = diabetes threshold.
ADM Insight
This investigation shows where deploying ML alone would waste money. Published population adjustments are outdated — dietary westernization in China has shifted metabolic profiles since the early 2000s. Fine-tuning on current data improves calibration by 42%. But notice what domain knowledge provides that no fine-tuning can: the framework. It correctly identifies WHICH biomarkers differ (CRP, HbA1c, triglycerides) and WHY (dietary patterns, urbanization, genetic factors). Without that framework, a purely data-driven approach would need orders of magnitude more data to rediscover what physiology already knows. The cheapest model is the one that starts with the right questions.

Two populations only: NHANES (USA) and CHNS (China). Transfer to other populations (South Asian, Sub-Saharan African, Latin American) would require separate validation.

Different survey years: NHANES 2017-2018 vs CHNS 2009. The 8-year gap introduces temporal confounding — some “population differences” may be secular trends.

Not all biomarkers matched: NHANES and CHNS measure overlapping but not identical biomarker panels. Comparison limited to 7 shared biomarkers.

Fine-tuning sample size: The fine-tuned model uses a held-out subset of CHNS. In practice, obtaining labeled data from a new population is the bottleneck — the method assumes some local data is available.

Key Takeaways
  1. Fine-tuning beats published adjustments by 42% — literature values from the early 2000s no longer reflect current population metabolic profiles. Urbanization, dietary shifts, and obesity trends have changed the numbers.
  2. Domain knowledge identifies WHAT to adjust; ML calibrates HOW MUCH — published literature correctly flags HbA1c, CRP, and BMI as requiring cross-population adjustment. But the magnitudes are wrong. The hybrid approach uses both.
  3. CRP shows the largest gap (85% difference) — inflammation markers are the most sensitive to population-specific dietary and lifestyle patterns, making fixed adjustment factors particularly unreliable.