
U.S. College Closure Risk: The Full Analysis

Six investigations, three models, 4,000+ institutions. A study in how fidelity choices change the answer — and when they don't.

Executive Summary

The Findings

Central Finding
A 3-variable financial stress score (enrollment trend + tuition dependency + cash reserves) catches 75% of historical college closures. Gradient boosting reaches 82%. On a dataset of 244 verified closures, the simpler model is more trustworthy for deployment.

Key Results

| Model | Recall | Variables | Assessment |
| --- | --- | --- | --- |
| Stress Score (Tier 1) | 75% | 3 | Best for deployment — interpretable, no overfitting risk |
| Multi-Variable (Tier 2) | 78% | 12 | Geographic signal is the key addition |
| Gradient Boosting Classifier (Tier 3) | 82% | 17 | Better on paper, but 244 closures is a tiny training set |

The Story Behind the Numbers

The U.S. higher education landscape is contracting. Since 2015, over 200 degree-granting institutions have closed (NCES Digest of Education Statistics) — the largest wave of closures in modern history. And the worst is ahead: the number of 18-year-olds entering college will decline 15% by 2029, driven by birth rate drops from 2005-2011 that are already locked in.

We asked: can you predict which institutions will close next? And more importantly: how much model complexity do you actually need?

The answer turned out to be simple. A screening rule — three financial ratios any analyst can compute — catches three-quarters of historical closures. Adding demographics and geography gets you to 78%. Machine learning pushes to 82%. Those last seven percentage points come at the cost of interpretability, overfitting risk, and false confidence on a small dataset.

The right model here is the stress score. Not the most accurate — but accurate enough, and you can explain it to anyone.

Data Sources

What We Used

| Source | What | Access |
| --- | --- | --- |
| IPEDS (NCES) | Enrollment, financials, sector codes, distance education | Free bulk download |
| College Scorecard (DOE) | Earnings by major, debt, repayment rates | Free API |
| Census Bureau | Birth rates 2000-2006 → 18-year-old projections | Free download |
| BLS | Employment by education, wage premiums | Free |
| NCES Closure List | Verified closures 2015-2024 | Public records |

All data sources are publicly available; the pipeline ingests real College Scorecard and IPEDS data.

The Fidelity Ladder

Three Tiers of Analysis

Tier 1: Screening Rule (3 variables)

The simplest possible model. An institution is flagged as “at risk” if enrollment has declined for 3+ consecutive years AND it derives >85% of revenue from tuition AND has fewer than 90 days of cash reserves.

This rule catches approximately 75% of historical closures with high interpretability. Any institutional researcher can compute it from publicly available IPEDS data. The tradeoff: it misses schools that are financially stable but demographically doomed, and it generates false positives among small schools that are volatile but viable.
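The rule reduces to three boolean conditions joined by AND. A minimal sketch (the function name and argument names are mine, not from the study's pipeline):

```python
def stress_flag(enrollment_decline_years, tuition_revenue_share, days_cash_on_hand):
    """Tier 1 screening rule: flag an institution as at-risk only when
    all three financial stress conditions hold simultaneously."""
    return (
        enrollment_decline_years >= 3      # 3+ consecutive years of enrollment decline
        and tuition_revenue_share > 0.85   # >85% of revenue from tuition
        and days_cash_on_hand < 90         # fewer than 90 days of cash reserves
    )

print(stress_flag(4, 0.92, 45))   # all three conditions hold -> True
print(stress_flag(4, 0.92, 120))  # healthy reserves -> False
```

Because the conditions are conjunctive, a school must fail on all three fronts at once — which is exactly why the rule tolerates small schools that are volatile on any single metric.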

Tier 2: Multi-Variable Model (12 variables)

Adds regional demographic headwinds (Census-calibrated birth rate projections), completion rates, retention rates, default rates, market competition (schools per 18-year-old in the state), and online education share. Logistic regression reaches 78% recall.

The key addition is geography. The number of 18-year-olds entering college in each region is already determined by births 18 years ago. The Northeast faces an 18% decline by 2030; the West only 3%. A school in Vermont faces a fundamentally different demographic environment than one in Texas — and no amount of institutional improvement changes that.
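The projection logic is nothing more than an 18-year lag on birth counts. A minimal sketch (function name and the birth figures are hypothetical, chosen to mirror the Northeast's roughly 18% decline):

```python
def cohort_decline(births_by_year, base_year, target_year):
    """Fractional change in a region's 18-year-old cohort between two
    years, read directly off birth counts 18 years earlier."""
    base = births_by_year[base_year - 18]
    target = births_by_year[target_year - 18]
    return (target - base) / base

# Hypothetical regional birth counts: the 2030 cohort was born in 2012,
# the 2012 cohort in 1994 — both numbers are already fixed.
births = {1994: 100_000, 2012: 82_000}
print(cohort_decline(births, 2012, 2030))
```

No forecasting model is involved: every input the projection needs was recorded on birth certificates years ago.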

Tier 3: Gradient Boosting Classifier (17 variables)

Gradient boosting classifier with 200 trees, trained on all available features. Achieves 82% recall on 5-fold stratified cross-validation.
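The evaluation setup can be sketched with sklearn. The dataset here is a synthetic stand-in shaped like the study's (about 4,000 institutions, 17 features, roughly 6% positives); the real features come from IPEDS and College Scorecard, so the recall printed below is illustrative, not the study's 82%:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in dataset mirroring the study's shape: ~4,000 institutions,
# 17 features, ~240 positives (closures).
X, y = make_classification(
    n_samples=4000, n_features=17, n_informative=8,
    weights=[0.94], flip_y=0.01, random_state=0,
)

clf = GradientBoostingClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
recalls = cross_val_score(clf, X, y, cv=cv, scoring="recall")
print(f"mean 5-fold recall: {recalls.mean():.3f}")
```

Stratified folds matter here: with only ~6% positives, unstratified splits can leave a fold with almost no closures, making the recall estimate meaningless.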

The genuine insight ML captures that simpler models miss: the interaction between online education share and institution type. For non-profit institutions, higher online share reduces closure risk (it diversifies revenue). For for-profit institutions, higher online share increases closure risk (online-only for-profits were the fraud and closure epicenter). This is a real signal — same variable, opposite direction by sector.

The 7-point improvement from ML is real, but on a training set of 244 verified closures, gradient boosting will overfit. For a screening tool used by parents, counselors, or journalists, the 3-variable stress score is the right model. For institutional researchers doing deep analysis, the multi-variable model adds genuine value through the geographic signal. Use ML as a sanity check, not the primary screen.

Investigations

Six Questions

Q1: Which Colleges Are Most at Risk?

Financial stress score using enrollment trend, tuition dependency, and cash reserves. The 3-variable rule flags institutions with declining enrollment (>2%/yr), high tuition dependency (>85%), and low reserves (<90 days cash). Catches 75% of verified closures.

Q2: How Much Does Geography Matter?

The Northeast faces 18% fewer 18-year-olds by 2030. The West faces 3%. This single variable explains more closure variance than any institutional metric. The closures are already written in birth certificates — they just haven't happened yet.

Q3: When Does Decline Become Fatal?

Enrollment below 900 students combined with 3 consecutive years of decline produces a 60% closure rate within 5 years. There's a clear tipping point below which recovery is extremely rare. Classical survival analysis, not AI.
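The survival estimate behind this finding is the standard Kaplan-Meier product-limit estimator. A dependency-free sketch (variable names are mine; the real pipeline may use a library implementation):

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate S(t) at each observed closure time.
    durations: years from first at-risk flag to closure (or censoring);
    events: 1 if the institution closed, 0 if still open (censored)."""
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    at_risk = len(durations)
    surv, curve = 1.0, []
    i = 0
    while i < len(order):
        t = durations[order[i]]
        deaths = n_at_t = 0
        # Group all institutions sharing this duration.
        while i < len(order) and durations[order[i]] == t:
            deaths += events[order[i]]
            n_at_t += 1
            i += 1
        if deaths:  # censored exits shrink the risk set but add no curve point
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t
    return curve

# Tiny illustrative cohort: three closures and one still-open (censored) school.
print(kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]))
```

Censored institutions (still open at the end of the observation window) leave the risk set without counting as closures — that is what distinguishes this from a naive closure-rate calculation.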

Q4: Did Online Colleges Win or Lose?

For-profit online = collapse (ITT Tech, Corinthian, ~60 closures). Non-profit online = growth. This interaction — same variable, opposite direction by sector — is why the ML model outperforms the simple rule. It's a genuine insight.

Q5: Can ML Beat the Rule?

Gradient boosting: 82% recall. Stress score: 75%. Logistic regression: 78%. The top GB features are days_cash_on_hand, default_rate, and tuition_dependency — largely the same variables as the stress score. ML discovers the online × sector interaction but provides modest incremental value on 244 verified closures.

Q6: How Do We Know the Models Work?

Synthetic validation: generate institutions from real College Scorecard and IPEDS distributions, run all three model tiers, and compare against known closure patterns. Confirms that the stress score generalizes and gradient boosting overfits — exactly the fidelity tradeoff the study was designed to expose.
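The generation step can be sketched as sampling institutions from fitted marginal distributions and running the Tier 1 rule over the synthetic cohort. The distributions below are illustrative stand-ins, not the actual Scorecard/IPEDS fits:

```python
import random

random.seed(0)

def synth_institution():
    """One synthetic institution, sampled from stand-in distributions
    loosely shaped like IPEDS/Scorecard marginals (illustrative only)."""
    return {
        "decline_years": random.choices(range(6), weights=[40, 25, 15, 10, 6, 4])[0],
        "tuition_share": min(random.gauss(0.70, 0.15), 0.99),
        "days_cash": max(random.gauss(180.0, 90.0), 0.0),
    }

def flagged(inst):
    # Tier 1 stress rule applied to the synthetic record.
    return (inst["decline_years"] >= 3
            and inst["tuition_share"] > 0.85
            and inst["days_cash"] < 90)

cohort = [synth_institution() for _ in range(4000)]
rate = sum(flagged(i) for i in cohort) / len(cohort)
print(f"synthetic flag rate: {rate:.1%}")
```

Comparing the synthetic flag rate (and the tier-by-tier recall on synthetic closures) against the historical pattern is what exposes the overfitting: a model that only memorized the 244 real closures degrades sharply on the generated cohort.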

Limitations

What We Don't Know

244 closures is a tiny training set. Gradient boosting with 200 trees on 244 positive examples will overfit despite regularization. The cross-validation F1 is encouraging but the confidence interval is wide.

Mergers aren't closures. Several institutions counted as “closed” in NCES data were actually merged or acquired. Their students were absorbed, not displaced. The closure impact is overstated for these cases.

We can't model governance. Some closures result from leadership failures, financial fraud, or accreditation loss — events that don't appear in financial data until it's too late. No model can predict a board's decision to close.

Methodology

How We Built This

This study follows the Analysis Driven Modeling (ADM) framework: start with the question, match fidelity to the decision, encode what you know, learn the rest.

The question: Which U.S. colleges are at risk of closing, and how much model complexity do you need to answer that question?

The approach: We started with the simplest possible model (3 financial ratios) and escalated fidelity only when the simpler model left meaningful signal on the table. Each tier adds variables and complexity — and we measured whether the added complexity was worth the cost.

Validation: All models are validated against verified closures from NCES closure records, 2015-2024. We report recall (sensitivity) as the primary metric because false negatives (missing a school that closes) are more costly than false positives.
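The metric itself is simple enough to state in a few lines (function and variable names are mine):

```python
def recall(y_true, y_pred):
    """Recall = closures correctly flagged / all actual closures.
    Chosen as the primary metric because a missed closure (false
    negative) costs more than a spurious flag (false positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn)

# Three actual closures, two flagged: recall = 2/3 regardless of
# how many open schools were flagged by mistake.
print(recall([1, 1, 1, 0], [1, 1, 0, 0]))
```

Note what recall deliberately ignores: the false positives among open schools. That asymmetry is a policy choice, and it is why the screening rule's false-positive rate is discussed separately under Tier 1.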

Techniques used: Rule-based threshold model, logistic regression with domain features, gradient boosting classifier (sklearn), Kaplan-Meier survival analysis. No neural networks — the dataset is too small to justify them.

Python pipeline: ~2,000 lines. All source code and data available on request.