The molecular track: what I built, found, and why I'm pivoting — Build Log

I’m a modeler by trade: simulation, calibration, being honest about uncertainty. This is me pointing those tools at Alzheimer’s and ALS — the two diseases that are mine. The full personal story is here →

What I set out to do: start with protein aggregation — the mechanism at the center of both diseases — using public deep-mutational-scanning data where scientists have already measured what mutations do. That measurement is an answer key. If a model is wrong, the data says so immediately. No lab required. That built-in check is why a solo effort can make a checkable contribution instead of just writing reports.

The build

Step 1: Reproduce the field’s model

Before building anything I reproduced CANYA — a leading published protein aggregation model — on its own held-out data. AUROC 0.82 on 7,626 random peptides (the paper reports 0.809). Good: my pipeline is faithful, and I have a concrete baseline to improve on.
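AUROC is the probability that a randomly chosen positive peptide scores above a randomly chosen negative one. The sketch below computes it with the Mann-Whitney rank-sum identity in plain Python. It is an illustration of that standard definition, not CANYA's evaluation code, and the `auroc` name and toy inputs are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney rank-sum identity.

    labels: 1 for positive (aggregator), 0 for negative.
    Tied scores get average ranks, so a positive/negative tie counts as half a win.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Run on held-out labels and model scores, this single number is what gets compared against a paper's reported value in a reproduction check.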

Step 2: Found and fixed its overconfidence

A model can rank things correctly and still lie about how sure it is. CANYA does. Its raw calibration error was 0.068: systematically overconfident. Isotonic recalibration cut that roughly 6x, to 0.011, with its ranking completely unchanged. This is a standard technique, not a breakthrough, but it makes the model’s uncertainty trustworthy. The thing I most want this mission to be known for is never handing someone a confidently wrong number about their disease.
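The post does not name its exact calibration metric, so the sketch below assumes expected calibration error (ECE) with ten equal-width probability bins, and implements isotonic recalibration with the pool-adjacent-violators algorithm. Function names are hypothetical. In practice a library implementation (for example scikit-learn's `IsotonicRegression`) is the usual choice, and the map should be fit on a calibration split and applied to held-out data.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Bin-size-weighted mean of |observed positive rate - mean predicted probability|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(p for p, _ in b) / len(b)
            pos_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / len(probs) * abs(pos_rate - mean_conf)
    return ece


def fit_isotonic(probs: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit of a non-decreasing map from raw score to probability.

    Returns (thresholds, values): the lowest raw score in each pooled block and
    the observed positive rate assigned to that block.
    """
    # Pool identical raw scores first so every score maps to exactly one value.
    grouped: Dict[float, Tuple[float, int]] = {}
    for p, y in zip(probs, labels):
        s, c = grouped.get(p, (0.0, 0))
        grouped[p] = (s + y, c + 1)
    blocks: List[List[float]] = []  # each block: [sum of labels, count, lowest score]
    for p in sorted(grouped):
        s, c = grouped[p]
        blocks.append([s, float(c), p])
        # Merge backwards while a block's rate exceeds the rate of the block after it.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s_last, c_last, _ = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += c_last
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def apply_isotonic(model: Tuple[List[float], List[float]], p: float) -> float:
    """Step-function lookup; scores below the first threshold use the first block."""
    thresholds, values = model
    i = bisect_right(thresholds, p) - 1
    return values[max(i, 0)]
```

Because the fitted map never decreases, the model's ordering of variants is preserved, apart from ties introduced where adjacent scores are pooled into one block.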

Step 3: Built and tested an Aβ-specific disease model

Alzheimer’s has a handful of dominantly inherited mutations, causing early-onset familial AD, that make amyloid-β aggregate. I trained a from-scratch Aβ-specific model on 14,015 double-mutants and tested it on 468 held-out single-mutants it had never seen, including all 8 familial mutations.
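The held-out design can be made explicit in code. The sketch below assumes a hypothetical variant notation in which substitutions are joined by colons (`"A2V"` for a single, `"A2V:E22G"` for a double) and partitions by mutation order, so no single-mutant measurement can land in the training set.

```python
from typing import Dict, Tuple


def mutation_order(variant: str) -> int:
    """Number of substitutions in a colon-joined variant string (hypothetical notation)."""
    return len(variant.split(":"))


def split_by_order(scores: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Train on double mutants, test on single mutants; other orders are dropped."""
    train = {v: s for v, s in scores.items() if mutation_order(v) == 2}
    test = {v: s for v, s in scores.items() if mutation_order(v) == 1}
    return train, test
```

Each single's substitution can still appear inside training doubles, which is what lets a model trained on doubles say anything about singles. What is held out is the single-mutant measurement itself.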

Against the lab measurements, my model’s Spearman correlation was 0.62; zero-shot CANYA managed 0.20. On telling the familial mutations apart from the rest, my model reached AUROC 0.92 — matching the measured assay’s own discrimination ceiling (~0.90) and well above zero-shot CANYA (0.76). To be clear about provenance: that the assay separates the known familial mutations from the rest is the source lab’s own published finding (Seuma et al., eLife 2021), reproduced here — not a discovery of this work. What I add is a from-scratch model that recovers that separation and the honestly calibrated uncertainty around it.
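Spearman correlation is Pearson correlation computed on ranks, so it rewards getting the order of variant effects right without assuming the model's score and the assay readout share a scale. A minimal pure-Python sketch, with average ranks for ties; the function names are illustrative, and `scipy.stats.spearmanr` is the usual library route.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman is undefined when one input is constant")
    return cov / (sx * sy)
```

Because it depends only on order, a model whose scores rise and fall with the assay gets full credit even if its scale is entirely different.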

What I will not claim — and this is as important as the result

The 0.92 familial AUROC has a training-advantage component: my model trained on Aβ data; CANYA was applied cold. The honest claim is “Aβ-specific modeling far outperforms a transferred general model,” not “my architecture beats theirs.”
The gap between 0.92 and the assay’s ~0.90 ceiling is within the noise: “matched the ceiling,” not “surpassed it.”
The familial AUROC rests on only 8 positive examples. The number to trust is the 0.62 vs 0.20 Spearman across 468 points.
The method is standard. The contribution is doing it carefully, honestly, and in the open.
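To see why 8 positives make the familial AUROC fragile, one can put a bootstrap interval on it. The sketch below uses a stratified percentile bootstrap (positives and negatives resampled separately, so every replicate keeps exactly 8 positives). The toy scores are synthetic, not the study's data; only the 8 / 460 split mirrors the 468 held-out singles. Function names are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def auroc_pairwise(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    neg_sorted = sorted(neg)
    wins = 0.0
    for p in pos:
        below = bisect_left(neg_sorted, p)
        ties = bisect_right(neg_sorted, p) - below
        wins += below + 0.5 * ties
    return wins / (len(pos) * len(neg_sorted))


def bootstrap_auroc_ci(pos: Sequence[float], neg: Sequence[float], n_boot: int = 2000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile interval from a stratified bootstrap of the AUROC."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        resampled_pos = rng.choices(pos, k=len(pos))
        resampled_neg = rng.choices(neg, k=len(neg))
        stats.append(auroc_pairwise(resampled_pos, resampled_neg))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Synthetic example: 8 "familial" positives shifted upward, 460 negatives.
rng = random.Random(1)
positives = [rng.gauss(1.5, 1.0) for _ in range(8)]
negatives = [rng.gauss(0.0, 1.0) for _ in range(460)]
print(auroc_pairwise(positives, negatives), bootstrap_auroc_ci(positives, negatives))
```

Each positive accounts for one eighth of all positive-negative pairs, so a single familial mutation moving from below every negative to above every negative would shift the AUROC by 0.125. That arithmetic is why the 468-point Spearman is the number to trust.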

Step 4: Tried to cross the protein boundary — and learned why it fails

I pointed the Aβ model at the ALS protein TDP-43. It didn’t just fail; it failed backwards (transfer Spearman -0.19). The reason: aggregation’s relationship to disease inverts sign between the two proteins. A hydrophobic substitution correlates with harm in Aβ (+0.30) but with protection in TDP-43 (-0.45). This inversion is consistent with known biology, and it emerged from the pipeline on 1,196 TDP-43 measurements rather than being put in by hand. The method is surfacing something real. And there are years of genuine open problems here.
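The sign check can be expressed compactly. The sketch below assumes a substitution's hydrophobicity change is measured on the Kyte-Doolittle hydropathy scale and that each protein comes with a per-variant harm score (higher meaning more disease-like). The scale, the `A2V`-style notation, the harm-score convention, and the use of Pearson correlation are all assumptions, since the post does not specify them. Running it separately per protein is what exposes an inversion: a positive coefficient for one protein and a negative one for the other.

```python
import re
from math import sqrt
from typing import Dict, Sequence

# Kyte-Doolittle hydropathy index; higher means more hydrophobic.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def delta_hydrophobicity(variant: str) -> float:
    """Mutant minus wild-type hydropathy for a single substitution such as 'A2V'."""
    match = SUBSTITUTION.match(variant)
    if match is None:
        raise ValueError(f"expected a substitution like 'A2V', got {variant!r}")
    wild_type, _, mutant = match.groups()
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hydrophobicity_effect(variants: Sequence[str], harm_scores: Sequence[float]) -> float:
    """Correlation of hydrophobicity change with harm; the sign is what matters here."""
    return pearson([delta_hydrophobicity(v) for v in variants], harm_scores)
```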

The reckoning

After the initial results, I kept going — twelve investigations in total. I tested whether bigger models (protein-language-model representations, scaling four-fold) or cleverer training objectives (asymmetric tail-aware loss) could close the remaining weakness: systematic under-prediction of the most pathogenic variants. They don’t. Not for lack of trying. The pathogenic tail is a signal limit of a shallow single-protein assay; a bigger model or a fancier objective doesn’t resolve a data limit.
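Tail under-prediction is easy to measure directly: compare the mean residual on the most extreme observed variants with the mean residual overall. The sketch below assumes higher observed scores mean more pathogenic and uses a top-10% tail; both are illustrative choices, not the study's definitions. A model fit by squared error to weakly informative data tends to shrink its predictions toward the mean, which produces exactly this signature.

```python
from typing import Sequence, Tuple


def tail_bias(observed: Sequence[float], predicted: Sequence[float],
              tail_fraction: float = 0.1) -> Tuple[float, float]:
    """Mean residual (predicted - observed) overall and within the top tail of observed scores.

    A near-zero overall value paired with a clearly negative tail value is the
    signature of systematic under-prediction of the most extreme variants.
    Assumes higher observed scores mean more pathogenic.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty inputs")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    residuals = [p - o for o, p in zip(observed, predicted)]
    overall = sum(residuals) / len(residuals)
    k = max(1, round(tail_fraction * len(observed)))
    top = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)[:k]
    tail = sum(residuals[i] for i in top) / k
    return overall, tail


# Toy example: predictions shrunk halfway toward the mean.
observed = [float(x) for x in range(10)]
predicted = [4.5 + 0.5 * (o - 4.5) for o in observed]
print(tail_bias(observed, predicted))  # (0.0, -2.25)
```

If this gap persists after swapping in larger representations or reweighted objectives, that is evidence the limit sits in the data rather than the model.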

I also tried to ask the boldest disease-framed question the data could answer: do second-site suppressors exist for the familial mutations? The answer is PARTIAL, and only barely: one fragile lead, resting on two discordant replicates, that loses significance when either replicate is analyzed alone. That is not a map, and not a therapeutic claim (you cannot add a second mutation to a person).
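The "significant only when pooled" pattern is worth seeing in numbers. The sketch below uses hypothetical effect estimates and standard errors for one candidate suppressor in two replicates, tests each with a two-sided z-test, then pools them by inverse-variance weighting. The numbers and the choice of test are illustrative assumptions, not the study's statistics.

```python
from math import erf, sqrt
from typing import Sequence, Tuple


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) = 1 - erf(|z| / sqrt(2))."""
    return 1.0 - erf(abs(z) / sqrt(2.0))


def inverse_variance_pool(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


# Hypothetical suppression effect of one candidate in two replicates.
estimates = [0.38, 0.22]
std_errors = [0.20, 0.20]
per_replicate_p = [two_sided_p(e / se) for e, se in zip(estimates, std_errors)]
pooled, pooled_se = inverse_variance_pool(estimates, std_errors)
pooled_p = two_sided_p(pooled / pooled_se)
print(per_replicate_p, pooled_p)  # neither replicate alone clears 0.05; the pooled test does
```

Pooling assumes both replicates estimate the same effect. When the replicates are discordant, that assumption is itself in doubt, which is why a lead that is significant only in the pooled test deserves the label "fragile."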

Here is the honest account: the Lehner / Bolognesi lab has covered this ground comprehensively. They published the energetic and thermodynamic model of Aβ aggregation, with hundreds of measured couplings and a structural model of the transition state (Seuma et al., Science Advances 2025), and, earlier, the deep mutational scan that established that the familial mutations are separable by their measured aggregation effect (Seuma et al., eLife 2021). That second result is the one I reproduce above, not one I discovered. My contribution on the molecular side is real: a rigorously reproduced, honestly calibrated model with trustworthy uncertainty intervals, and a controlled demonstration that the failure of cross-protein transfer is biological, not representational. But it is a contribution built on their foundation, not a competing discovery. I need to be clear about that.

The most defensible thing I built on the molecular track is not a new result about Aβ. It is a model that refuses to be confidently wrong about what it knows and doesn’t know — one whose prediction intervals are sharp where it is informed and honestly wide where it is blind. That’s the edge I came here to demonstrate. And it pointed toward what I should actually be spending time on.
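The post does not say how its intervals are built. One standard way to get intervals that are narrow where a model is informed and wide where it is blind is normalized split-conformal prediction: scale each calibration residual by a per-variant difficulty estimate, take a finite-sample-corrected quantile, and multiply back. The sketch below assumes the difficulty estimate is the spread of a small model ensemble; that choice, and every name in the code, is hypothetical. Under exchangeability, the resulting intervals cover the truth with probability at least 1 - alpha on average over variants.

```python
from math import ceil, inf
from statistics import mean, pstdev
from typing import Sequence, Tuple


def ensemble_summary(member_predictions: Sequence[float]) -> Tuple[float, float]:
    """Point prediction and spread from a hypothetical ensemble of models."""
    return mean(member_predictions), pstdev(member_predictions)


def conformal_scale(cal_observed: Sequence[float], cal_mean: Sequence[float],
                    cal_spread: Sequence[float], alpha: float = 0.1, eps: float = 1e-6) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of normalized calibration residuals."""
    scores = sorted(abs(y - m) / (s + eps) for y, m, s in zip(cal_observed, cal_mean, cal_spread))
    k = ceil((len(scores) + 1) * (1 - alpha))
    # Too few calibration points for this alpha: the honest interval is unbounded.
    return inf if k > len(scores) else scores[k - 1]


def prediction_interval(point: float, spread: float, scale: float, eps: float = 1e-6) -> Tuple[float, float]:
    """Interval whose half-width grows with the ensemble spread for this variant."""
    half_width = scale * (spread + eps)
    return point - half_width, point + half_width
```

The same calibrated `scale` yields a tight interval for a variant the ensemble agrees on and a wide one for a variant where its members disagree, which is the "sharp where informed, wide where blind" behavior described above.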

The pivot

I started this mission to build the thing I most wanted to exist — honest foresight about disease progression, with real uncertainty bands, while there’s still time for it to matter.

That is the problem I am turning toward next: calibrated individual prognosis for ALS, starting with speech. Bulbar-onset ALS takes voice first. Acoustic biomarkers from recorded speech may reveal disease progression earlier and more precisely than functional scales alone. The data exists, though it’s access-gated: recorded speech from the Speech Accessibility Project, and longitudinal functional-scale (ALSFRS) trajectories from PRO-ACT across thousands of ALS patients. When the applications clear, the study opens.

Access update · 2026-07-20

PRO-ACT access has now been granted. A separate clinical ALS companion study is in verifier-first planning, with participant-level data kept local and outside any AI context; no analysis has run yet. PRO-ACT contains longitudinal clinical trajectories, not speech audio, so the speech-prognosis study still depends on its own longitudinal audio access.

I’m not claiming I’ve solved anything on the speech side. I haven’t started yet. What I’m saying is that the same discipline I applied to the molecular track — calibrated uncertainty, honest intervals, clear statements of what the model knows and doesn’t — is exactly what a clinical prognosis tool needs. That is the work I came here to build. The molecular work was the proving ground. The speech work is the thing.

The full technical account of the molecular track: the study page → · More in the build log as the speech track opens. — Michael
