How Much Is Better Site Characterization Worth?
$500K in targeted measurements cuts the planning window from 32 to 22 years, saving $15M+ in contingency. And the 3D model reveals K matters more than Kd.
What Is Kd (Distribution Coefficient)?
Kd measures how much PFAS sticks to soil versus staying dissolved in water. A higher Kd means more contaminant sorbs to the solid phase — the plume moves slower, but cleanup takes longer because you have to flush more pore volumes to desorb the chemical.
For PFOS, published Kd values range from 0.5 to 20 L/kg depending on soil organic carbon content, mineralogy, and pH: a 40x range that dominates cleanup time predictions. The retardation factor R = 1 + (ρb/n) · Kd, where ρb is the soil bulk density and n is the porosity, converts Kd into plume speed. For typical aquifer sediments (ρb/n of roughly 4 to 6 kg/L), Kd = 0.5 gives R of about 3 to 4, so the plume still travels at a quarter to a third of groundwater speed. At Kd = 20, R is near 100 and the plume creeps. R controls everything downstream: arrival time, pump duration, remediation cost.
The problem: most site investigations measure Kd from 3–5 soil samples. That's not enough to constrain a parameter with a 40x range, and the resulting uncertainty in Kd propagates into millions of dollars of uncertainty in cleanup cost.
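To make the retardation formula concrete, here is a minimal sketch that evaluates R = 1 + (ρb/n) · Kd across the published PFOS range. The bulk density (1.6 kg/L) and porosity (0.30) are illustrative assumptions for a sandy aquifer, not values from this study, and the function name is hypothetical.

```python
def retardation_factor(kd_l_per_kg, bulk_density_kg_per_l=1.6, porosity=0.30):
    """Dimensionless retardation factor R = 1 + (rho_b / n) * Kd.

    The default bulk density and porosity are assumed, illustrative values.
    """
    if not 0.0 < porosity < 1.0:
        raise ValueError("porosity must be between 0 and 1")
    return 1.0 + (bulk_density_kg_per_l / porosity) * kd_l_per_kg


# Low end, literature default, and high end of the PFOS Kd range
for kd in (0.5, 1.5, 20.0):
    r = retardation_factor(kd)
    # The plume front moves at 1/R of the groundwater velocity
    print(f"Kd = {kd:5.1f} L/kg  R = {r:6.1f}  plume speed = {1.0 / r:.1%} of groundwater")
```

Because R scales almost linearly with Kd once (ρb/n) · Kd is much larger than 1, a 40x range in Kd translates into nearly a 40x range in travel time.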
Site Characterization Cuts the Spread in Half
We importance-weighted our 200-realization 3D Monte Carlo ensemble to compare two scenarios: wide parameter uncertainty (typical 3–5 sample investigation) versus targeted characterization of both hydraulic conductivity and sorption (20–30 pump tests + lab samples, ~$500K).
Without characterization, the P5 arrival time is 5 years and the P95 is 37 years — a 32-year spread. With targeted characterization, the P5 shifts to 8 years and the P95 drops to 30 years. The planning window shrinks from 32 years to 22 years. That's the difference between “it might already be here, or we have decades” and “we have 8–30 years to act.”
200-realization 3D Monte Carlo, importance-weighted. Wide: Kd log_std=0.8, K log_std=0.3. Narrow: Kd log_std=0.3, K log_std=0.15.
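The importance-weighting step can be sketched as follows. This is not the study's 3D model: a hypothetical 1D travel-time formula stands in for it, the medians and site constants are made up for illustration, and the log_std values are assumed to be in natural-log units. Only the log_std values themselves come from the note above.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight / total
        if running >= q:
            return value
    return pairs[-1][0]


# log_std values from the note above (assumed to be natural-log units)
WIDE = {"kd": 0.8, "k": 0.3}
NARROW = {"kd": 0.3, "k": 0.15}
# Hypothetical medians and site constants, for illustration only
LN_KD_MED, LN_K_MED = math.log(1.5), math.log(100.0)  # L/kg, m/day
DISTANCE_M, GRADIENT, POROSITY, BULK_DENSITY = 1000.0, 0.002, 0.3, 1.6

rng = random.Random(0)
arrival_years, weights = [], []
for _ in range(200):
    # Sample once from the WIDE distributions
    ln_kd = rng.gauss(LN_KD_MED, WIDE["kd"])
    ln_k = rng.gauss(LN_K_MED, WIDE["k"])
    retardation = 1.0 + (BULK_DENSITY / POROSITY) * math.exp(ln_kd)
    velocity = math.exp(ln_k) * GRADIENT / POROSITY  # m/day
    arrival_years.append(DISTANCE_M * retardation / velocity / 365.25)
    # Importance weight: narrow density divided by the wide density we sampled from
    weights.append(
        normal_pdf(ln_kd, LN_KD_MED, NARROW["kd"]) / normal_pdf(ln_kd, LN_KD_MED, WIDE["kd"])
        * normal_pdf(ln_k, LN_K_MED, NARROW["k"]) / normal_pdf(ln_k, LN_K_MED, WIDE["k"])
    )

uniform = [1.0] * len(arrival_years)
wide_spread = (weighted_quantile(arrival_years, uniform, 0.95)
               - weighted_quantile(arrival_years, uniform, 0.05))
narrow_spread = (weighted_quantile(arrival_years, weights, 0.95)
                 - weighted_quantile(arrival_years, weights, 0.05))
# Effective sample size shows how many realizations still carry real weight
effective_n = sum(weights) ** 2 / sum(w * w for w in weights)

print(f"P5-P95 spread, wide:   {wide_spread:.1f} years")
print(f"P5-P95 spread, narrow: {narrow_spread:.1f} years")
print(f"Effective sample size after reweighting: {effective_n:.0f} of 200")
```

Reweighting reuses the same 200 model runs for both scenarios, which is the appeal when each run is expensive. The trade-off is the effective sample size: the narrower the target distribution, the fewer realizations contribute, so the reweighted percentiles are noisier than the raw ones.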
Where Does the Uncertainty Come From?
Not all parameters contribute equally to cleanup time uncertainty. We decomposed the Monte Carlo variance to identify which parameters matter most — and therefore which measurements have the highest return on investment.
K (hydraulic conductivity) alone accounts for 51% of arrival time variance. Kd (sorption) adds another 30%. Together they explain 81% of the spread. Hydraulic gradient contributes 18%; porosity is negligible at 2%. (The rounded shares total 101%.)
This tells site managers exactly where to spend their characterization budget: pump tests and slug tests first (they target the dominant parameter), then soil sorption testing. The 3D model reveals that K matters almost twice as much as Kd — a finding that 2D models miss because they can't represent preferential flow paths.
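One common way to apportion Monte Carlo variance is the squared correlation between each log-input and the log-output, which approximates first-order variance shares when inputs are independent and the model is close to additive in log space. The sketch below applies that idea to a hypothetical 1D travel-time formula with made-up medians and spreads. It is not the decomposition method used in this study, and it will not reproduce the 51/30/18/2 split, which depends on the 3D flow field. In this 1D toy the Kd share comes out largest simply because its assumed spread is widest.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
BULK_DENSITY, DISTANCE_M = 1.6, 1000.0  # assumed constants (kg/L, m)
inputs = {"K": [], "Kd": [], "gradient": [], "porosity": []}
log_time = []
for _ in range(5000):
    # Hypothetical medians and log-scale spreads for each uncertain input
    ln_k = rng.gauss(math.log(100.0), 0.3)
    ln_kd = rng.gauss(math.log(1.5), 0.8)
    ln_i = rng.gauss(math.log(0.002), 0.2)
    ln_n = rng.gauss(math.log(0.3), 0.1)
    n = math.exp(ln_n)
    retardation = 1.0 + (BULK_DENSITY / n) * math.exp(ln_kd)
    velocity = math.exp(ln_k) * math.exp(ln_i) / n
    log_time.append(math.log(DISTANCE_M * retardation / velocity))
    for name, value in zip(inputs, (ln_k, ln_kd, ln_i, ln_n)):
        inputs[name].append(value)

# Squared correlation approximates each input's share of output variance
shares = {name: pearson(values, log_time) ** 2 for name, values in inputs.items()}
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:9s} {share:6.1%}")
```

For a model with strong interactions or heterogeneity, a variance-based method such as Sobol indices would be a more defensible choice than squared correlations.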
Spend Money on Information Before Concrete
ROI of characterization: 30x. $500K in targeted pump tests and sorption measurements narrows the planning window from 32 years to 22 years, saving $15M+ in avoided contingency. Only a probabilistic analysis such as Monte Carlo can make this calculation; deterministic models can't quantify the value of reducing uncertainty because they don't represent uncertainty in the first place.
This is a value-of-information analysis. The question isn't “what's the answer?” — it's “how much would a better answer be worth?” The Monte Carlo framework lets you put a dollar value on every additional measurement. When the answer is “30x return,” the characterization campaign pays for itself before the first well is drilled.
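The arithmetic behind the 30x figure is short enough to write out. This sketch uses only numbers quoted in this article. The per-year figure at the end is an implied quantity derived here, not a result reported by the study.

```python
characterization_cost = 500_000      # USD, targeted pump tests + sorption samples
avoided_contingency = 15_000_000     # USD, lower bound of the "$15M+" savings

window_wide = 37 - 5      # P95 minus P5 arrival time, years, wide uncertainty
window_narrow = 30 - 8    # same, after targeted characterization

roi_multiple = avoided_contingency / characterization_cost
years_removed = window_wide - window_narrow
# Implied by the figures above, not reported in the study
contingency_per_year = avoided_contingency / years_removed

print(f"Planning window: {window_wide} -> {window_narrow} years")
print(f"Return on characterization: {roi_multiple:.0f}x")
print(f"Implied contingency per year of window removed: ${contingency_per_year:,.0f}")
```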
For remediation contractors pricing long-term contracts: uncertainty in K and Kd can swing arrival time by 32 years. If your bid is based on single parameter values from the site assessment, you're gambling. Require the site owner to fund targeted characterization before you commit to a fixed-price contract.
Can Machine Learning Replace Measurement?
If Kd is a major source of uncertainty, and site characterization costs $500K, an obvious question arises: can we predict Kd from cheaper-to-measure soil properties instead? Soil organic carbon content, pH, clay percentage, and grain size are routinely measured during any environmental site assessment. If these properties could reliably predict Kd, you could skip the expensive sorption testing entirely.
We tested this idea using the largest available dataset of PFAS sorption measurements: 1,227 laboratory experiments compiled from 47 published studies, covering 47 PFAS compounds across 451 different soil types (Kühne et al. 2025, Environ. Sci. Technol.). Each experiment measured how much PFAS sorbed to a soil sample under controlled conditions, along with the soil’s organic carbon content, pH, clay/silt/sand fractions, and cation exchange capacity.
We trained three models of increasing sophistication, mirroring the fidelity progression used throughout this study.
Three Models, One Test
Organic Carbon Partitioning
The standard textbook model: Kd = Koc × foc, where Koc is how strongly the chemical partitions to organic carbon (known from its molecular structure) and foc is the fraction of organic carbon in the soil (measurable for ~$50/sample). We added published corrections for pH and clay content. No data fitting — pure chemistry.
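The data note at the end of this article gives the physics backbone as log Kd = log Koc + log foc − 0.08(pH − 6) + 0.005(clay%). Here is a minimal sketch of that formula, assuming base-10 logarithms. The log Koc of 3.10 is not stated in the article: it is back-inferred here so that the Cape Cod inputs reproduce the 4.00 L/kg reported for this model, and should be treated as an assumption.

```python
import math


def physics_kd(foc_fraction, ph, clay_pct, log_koc=3.10):
    """Kd in L/kg from organic-carbon partitioning with pH and clay corrections.

    log_koc=3.10 is an assumed value back-inferred from the article's
    Cape Cod result; it is not a published constant.
    """
    if foc_fraction <= 0:
        raise ValueError("foc must be positive")
    log_kd = (log_koc
              + math.log10(foc_fraction)
              - 0.08 * (ph - 6.0)
              + 0.005 * clay_pct)
    return 10.0 ** log_kd


# Cape Cod soil from the data note: Corg = 0.3%, pH = 6.0, clay = 5%
cape_cod_kd = physics_kd(foc_fraction=0.003, ph=6.0, clay_pct=5.0)
print(f"Physics-model Kd for Cape Cod soil: {cape_cod_kd:.2f} L/kg")

# Same chemistry on a median training soil (1.3% organic carbon, 24% clay)
median_soil_kd = physics_kd(foc_fraction=0.013, ph=6.0, clay_pct=24.0)
print(f"Physics-model Kd for a median training soil: {median_soil_kd:.2f} L/kg")
```

The formula has no fitted parameters beyond the published corrections, which is why it needs no training data and why its errors are systematic rather than random.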
Random Forest
A random forest is an ensemble of 200 decision trees, each trained on a random subset of the data. It learns patterns from ALL available features — molecular weight, fluorine count, chain length, pH, organic carbon, clay content, and cation exchange capacity — without any physics constraints. The model finds whatever statistical relationships maximize prediction accuracy.
Physics Backbone + Learned Correction
The physics model provides the baseline prediction. A second machine learning model (gradient boosting) is trained not on Kd directly, but on the error in the physics prediction — learning to correct the systematic biases that pure chemistry misses. The final prediction is: physics estimate + learned correction.
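The residual-learning idea can be sketched with scikit-learn's GradientBoostingRegressor, using the hyperparameters listed in the data note (200 trees, learning rate 0.05, max depth 4). Everything else here is an assumption for illustration: the data are synthetic, the "true" bias term is invented, and log Koc = 3.10 is a back-inferred placeholder. The sketch shows the mechanics, not the study's results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 400
# Synthetic soils: organic carbon fraction, pH, clay percent (all hypothetical)
foc = rng.uniform(0.002, 0.05, n)
ph = rng.uniform(4.5, 8.0, n)
clay = rng.uniform(2.0, 50.0, n)
X = np.column_stack([foc, ph, clay])


def physics_log_kd(features, log_koc=3.10):
    """Physics backbone in log10 units; log_koc is an assumed placeholder."""
    f, p, c = features[:, 0], features[:, 1], features[:, 2]
    return log_koc + np.log10(f) - 0.08 * (p - 6.0) + 0.005 * c


# Synthetic "truth": physics plus an invented systematic bias plus noise
y = physics_log_kd(X) + 0.3 * np.sin(ph) - 0.01 * clay + rng.normal(0.0, 0.1, n)

train, test = slice(0, 300), slice(300, None)

# Train on the physics model's error, not on Kd itself
residual_model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=4, random_state=42
)
residual_model.fit(X[train], y[train] - physics_log_kd(X[train]))

# Final prediction = physics estimate + learned correction
hybrid = physics_log_kd(X[test]) + residual_model.predict(X[test])

rmse_physics = float(np.sqrt(np.mean((y[test] - physics_log_kd(X[test])) ** 2)))
rmse_hybrid = float(np.sqrt(np.mean((y[test] - hybrid) ** 2)))
print(f"RMSE, physics only: {rmse_physics:.3f} log units")
print(f"RMSE, physics + correction: {rmse_hybrid:.3f} log units")
```

The correction helps here because the test soils come from the same ranges as the training soils. Nothing in this setup constrains what the correction does for inputs outside those ranges.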
Reading the cards: R² measures prediction accuracy on held-out data. A score of 1.0 is perfect; 0.0 is no better than always predicting the dataset mean. We used 10-fold cross-validation: train on 90% of the data, test on the remaining 10%, rotate 10 times, average the scores. The ML models score 0.83–0.84, which looks strong. But every held-out fold is drawn from the same distribution as the training data. The real question is what happens outside it.
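For readers who want the definition behind the score: R² compares a model's squared error against the baseline of always predicting the mean. A minimal sketch, with hypothetical log Kd values chosen only to show the three regimes:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [0.2, 0.8, 1.1, 1.9, 2.5]  # hypothetical log Kd values

perfect = list(observed)
mean_only = [sum(observed) / len(observed)] * len(observed)
reversed_guess = observed[::-1]

print(r_squared(observed, perfect))         # perfect predictions
print(r_squared(observed, mean_only))       # always predicting the mean
print(r_squared(observed, reversed_guess))  # worse than the mean: negative
```

Note that R² can go below zero. A model that extrapolates badly can be worse than predicting the average, which is exactly the failure mode cross-validation on in-distribution folds cannot detect.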
The Cape Cod Test
Joint Base Cape Cod is our validation site. USGS measured PFOS concentrations at 62 monitoring wells in 2020. From the observed plume extent (2,700 meters in 55 years), we back-calculated the effective field Kd: 0.39 L/kg — far below the literature default of 1.5 L/kg (Anderson et al. 2019). Cape Cod’s glacial outwash sand has very little organic carbon (0.3%) and almost no clay (5%) — an extreme soil that sits at the far edge of the training data.
We gave each model Cape Cod’s soil properties and asked: what is the PFOS Kd?
| Model | Predicted Kd | Error vs. Field |
|---|---|---|
| Literature default (Anderson 2019) | 1.50 L/kg | 285% |
| Pure physics (Koc × foc) | 4.00 L/kg | 926% |
| Physics-informed ML | 7.02 L/kg | 1,700% |
| Pure ML (Random Forest) | 12.67 L/kg | 3,148% |
| Actual (field, USGS 2020) | 0.39 L/kg | — |
Every model overshoots by at least an order of magnitude. The physics-informed model does beat pure ML, cutting the error roughly in half, but “half of terrible” is still terrible, and both trained models do worse than the untrained physics formula. The simple literature default of 1.5 L/kg, despite being “wrong,” outperforms all three models.
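The error column in the table is a relative error against the field value, (predicted − field) / field. A short sketch recomputes it from the rounded Kd values in the table and adds the fold-overshoot, which is often easier to read. Because the inputs are rounded to two decimals, a recomputed percentage can differ from the table by a point.

```python
FIELD_KD = 0.39  # L/kg, effective field value back-calculated at Cape Cod

predictions = {
    "Literature default (Anderson 2019)": 1.50,
    "Pure physics (Koc x foc)": 4.00,
    "Physics-informed ML": 7.02,
    "Pure ML (Random Forest)": 12.67,
}


def percent_error(predicted, actual=FIELD_KD):
    """Relative error in percent: (predicted - actual) / actual * 100."""
    return (predicted - actual) / actual * 100.0


for name, kd in predictions.items():
    fold = kd / FIELD_KD
    print(f"{name:36s} {kd:6.2f} L/kg  {percent_error(kd):7.0f}%  {fold:5.1f}x field")
```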
Why R² = 0.84 Can Be 1,700% Wrong
This is the most important lesson in the entire experiment. An R² of 0.84 means the model explains 84% of the variation in held-out samples drawn from the same dataset it was trained on. That data is overwhelmingly from moderate soils: the median organic carbon in the dataset is 1.3%, the median clay content is 24%. Within that range, the model interpolates well.
Cape Cod sits far outside that range: 0.3% organic carbon, 5% clay. The model has almost no training examples from soils this extreme. When it extrapolates — predicting outside the range of data it learned from — it fails catastrophically. This is the fundamental limitation of data-driven models: they learn patterns in the data they’ve seen, and those patterns don’t necessarily hold in new territory.
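A simple guard against this failure is to check where a new site falls within the training data before trusting a prediction. The sketch below flags any feature that sits in the outer 5% of the training values. The training set here is synthetic: lognormal samples centered on the dataset medians quoted above (1.3% organic carbon, 24% clay), with spreads that are pure assumptions. With the real dataset you would substitute its actual columns.

```python
import bisect
import math
import random


def percentile_rank(sorted_values, x):
    """Fraction of training values at or below x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


rng = random.Random(42)
# Synthetic stand-in for the training soils; the spreads (0.7, 0.6) are assumed
training = {
    "organic_carbon_pct": sorted(rng.lognormvariate(math.log(1.3), 0.7) for _ in range(451)),
    "clay_pct": sorted(rng.lognormvariate(math.log(24.0), 0.6) for _ in range(451)),
}

# Cape Cod soil properties from the article
site = {"organic_carbon_pct": 0.3, "clay_pct": 5.0}

flags = {}
for feature, values in training.items():
    rank = percentile_rank(values, site[feature])
    # Flag anything in the outer 5% on either side as extrapolation
    flags[feature] = rank < 0.05 or rank > 0.95
    status = "EXTRAPOLATING" if flags[feature] else "ok"
    print(f"{feature:20s} site = {site[feature]:5.1f}  rank = {rank:.1%}  {status}")
```

A per-feature check like this is only a first screen. A site can sit inside every individual range and still be an unusual combination of features, so a passing result does not guarantee the model is interpolating.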
The Lab-to-Field Gap
Even if the training data included more sandy, low-carbon soils, a deeper problem remains. The 1,227 measurements are laboratory batch sorption experiments — a researcher takes a soil sample, crushes and sieves it, shakes it with PFAS-contaminated water, and measures how much PFAS sticks to the soil particles. This is a controlled, small-scale measurement.
In the field, PFAS transport through intact geological structure involves processes that crushed-soil experiments cannot capture:
- Preferential flow — water (and contaminant) channels through high-permeability paths, bypassing most of the soil matrix
- Air-water interface sorption — in unsaturated sands, PFAS accumulates at air-water boundaries that don’t exist in a shaken flask (Brusseau 2018)
- Scale effects — a 10-gram lab sample cannot represent the heterogeneity of a 2,700-meter plume
- Non-equilibrium transport — batch experiments assume the PFAS reaches sorption equilibrium; in flowing groundwater, it may not
The field Kd of 0.39 L/kg at Cape Cod is an effective value that integrates all of these real-world mechanisms. It is not the same quantity that lab experiments measure, even though both are called “Kd.”
For practitioners: If a vendor offers an “AI-powered Kd prediction” trained on published sorption data, ask them how it performs on out-of-distribution soils. A model that scores R² = 0.84 on its own test set can be off by 30x on your site. The $500K you spend on field characterization is not just buying a better number — it’s buying a number that actually describes your aquifer.
Data: Kühne et al. (2025), “Modeling PFAS Sorption in Soils Using Machine Learning,” Environ. Sci. Technol., doi:10.1021/acs.est.4c13284. Dataset: 1,227 Kd entries, 47 PFAS, 451 soils, 47 source studies (supplementary file es4c13284_si_002.xlsx). Models: scikit-learn RandomForestRegressor (200 trees), GradientBoostingRegressor (200 trees, lr=0.05, max_depth=4). Physics backbone: log Kd = log Koc(CF2) + log foc − 0.08(pH − 6) + 0.005(clay%). 10-fold cross-validation, shuffle, seed=42. Cape Cod soil: Sand=85%, Silt=10%, Clay=5%, Corg=0.3%, pH=6.0, CEC=2.0 cmol+/kg (Walter et al. 2018, USGS SIR 2018-5139). Field Kd: 0.39 L/kg, back-calculated from observed plume extent at 62 USGS monitoring wells (Water Quality Portal, 2020 sampling campaign, 49 PFOS detections, 1.3–610 ng/L).