Pre-CV Screening Inflates Drug Response AUCs by ≥0.10

Today's Overview

Pre-CV Feature Screening Creates Widespread Leakage in Cancer Drug Response Models: leakage-free cross-validation raises MSE by 16.6% on average across 265 cancer drugs, so pre-CV feature screening makes models look more accurate than they are.

Featured

01 Pre-CV Feature Screening Creates Widespread Leakage in Cancer Drug Response Models

Accurate prediction of drug response in cancer cell lines is central to identifying genomic biomarkers that can guide patient stratification. Yet most published models rely on supervised feature selection applied to the entire dataset before cross-validation, a protocol that leaks label information and spuriously lowers prediction error.

A re-analysis of 265 drugs across 1,462 cell lines showed that leakage-free cross-validation increases mean squared error by 16.6% on average. The leaked and corrected pipelines share almost no selected features (mean Jaccard = 0.18), and 36% of drugs have zero overlap, even though the leaked version retains fivefold more genes. Both pipelines recover known drug targets at similar rates, indicating that the extra features capture noise, not biology. A survey of 32 recent methods found this leakage in 72% (23/32), collectively cited more than 3,000 times; the error reduction attributable to leakage is similar in size to the improvements such methods report over elastic-net baselines.
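For readers unfamiliar with the overlap statistic, the Jaccard index of two feature sets is the size of their intersection divided by the size of their union. The snippet below is a minimal sketch, not the authors' code: the function name, the gene lists, and the choice to return 0.0 for two empty sets are all illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of feature names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty selections count as zero overlap
    return len(a & b) / len(union)

# Hypothetical selections for one drug: the leaked pipeline keeps five genes,
# the leakage-free pipeline keeps one of them.
leaked_genes = ["EGFR", "BRAF", "KRAS", "TP53", "MYC"]
corrected_genes = ["EGFR"]

print(jaccard(leaked_genes, corrected_genes))  # 0.2
```

With a fivefold difference in set size, the index cannot exceed 0.2 even when the smaller set is fully contained in the larger one, which helps put the reported mean of 0.18 in context.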

All experiments remain in silico on public cell-line panels; no prospective wet-lab validation is reported. The audit focused only on one leakage mode—pre-CV feature screening—so other protocol flaws may remain. The authors supply code templates for leakage-free evaluation, but adoption will require re-running benchmarks across the field.

Pre-CV feature screening understates error: leakage-free cross-validation raises MSE by 16.6% on average across 265 cancer drugs.

Leaked models select five times more genes yet share <20% overlap with leakage-corrected models.

72% of surveyed methods (23/32) contain this leakage, and its effect is about as large as the typical gains reported over elastic-net baselines.

Source: Widespread data leakage inflates accuracy and corrupts biomarker discovery in cancer drug response prediction

Also Worth Noting

Benchmarking pKa prediction on 90,000 public data points (General AIDD)

Seven pKa prediction algorithms (three commercial, four open-source ML) were benchmarked on a curated public data set of 90,000 entries covering 31,000 molecules to quantify accuracy across charge states and polyprotic species. (Chem)

Today's Observation

Cancer drug response prediction is a canonical benchmark for multi-omics machine learning, yet a widespread data-leakage pitfall undercuts its utility. A survey of 32 recent studies shows that 72% perform feature selection before cross-validation. In a re-analysis of 265 compounds in GDSC and CCLE, removing this leakage raises mean squared error by 16.6% on average, so the leaked protocol reports optimistically low error. Leaked pipelines pick five times more genes than leakage-corrected ones, and the two gene sets overlap by less than 20%, indicating that the apparent accuracy reflects sample-specific noise rather than generalizable signal. The leakage effect is about as large as the gains typically reported over plain elastic-net baselines, implying that many “state-of-the-art” improvements may be illusory.

Practically, any project that screens thousands of molecular features must repeat the selection inside each CV fold, using only that fold's training samples, or evaluate on an external validation cohort. The same issue applies to other omics-assisted tasks where pre-filtering is tempting, such as predicting CRISPR essentiality or patient outcome. Until journals and competitions enforce stricter code inspection, practitioners should treat published metrics as optimistic (MSE as a lower bound on true error, Pearson r as an upper bound) and retrain models with scrupulous nested CV before deploying biomarkers or moving into expensive in-vitro confirmation.
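To make the difference between the two protocols concrete, here is a minimal, standard-library-only sketch. It is not the paper's pipeline: the data are pure synthetic noise (the response is independent of every feature), the "model" is a toy signed-sum regression rather than an elastic net, and all function names and parameters are hypothetical. The only thing that differs between the two runs is whether the supervised screening step sees the held-out samples.

```python
import random

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def screen(X, y, rows, k):
    """Supervised screening: top-k features by |r| with y, using only `rows`."""
    ys = [y[i] for i in rows]
    ranked = sorted(
        range(len(X[0])),
        key=lambda j: abs(pearson([X[i][j] for i in rows], ys)),
        reverse=True,
    )
    return ranked[:k]

def fit_predict(X, y, train, test, feats):
    """Toy model: regress y on a signed sum of the selected features (fit on train)."""
    ytr = [y[i] for i in train]
    signs = [1.0 if pearson([X[i][j] for i in train], ytr) >= 0 else -1.0
             for j in feats]

    def score(i):
        return sum(s * X[i][j] for s, j in zip(signs, feats))

    s_tr = [score(i) for i in train]
    ms, my = sum(s_tr) / len(s_tr), sum(ytr) / len(ytr)
    var = sum((s - ms) ** 2 for s in s_tr)
    cov = sum((s - ms) * (t - my) for s, t in zip(s_tr, ytr))
    slope = cov / var if var > 0 else 0.0
    return [my + slope * (score(i) - ms) for i in test]

def cv_mse(X, y, k_feats=10, n_folds=5, nested=True):
    """Cross-validated MSE with screening inside (nested) or before (leaky) CV."""
    n = len(y)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Leaky protocol: screen once on ALL samples, held-out labels included.
    leaked_feats = None if nested else screen(X, y, list(range(n)), k_feats)
    sq_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        # Leakage-free protocol: screen again inside the fold, training rows only.
        feats = screen(X, y, train, k_feats) if nested else leaked_feats
        preds = fit_predict(X, y, train, test, feats)
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, test))
    return sq_err / n

rng = random.Random(0)
n_samples, n_features = 40, 500
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [rng.gauss(0, 1) for _ in range(n_samples)]  # independent of X by construction

leaked_mse = cv_mse(X, y, nested=False)
nested_mse = cv_mse(X, y, nested=True)
print(f"leaked MSE: {leaked_mse:.3f}   nested MSE: {nested_mse:.3f}")
```

Because there is no real signal here, any apparent predictive skill in the leaky run comes from the screening step having seen the held-out labels. In a real project the same structure is usually obtained by placing the selector and the estimator in a single pipeline object that is refit on each training fold.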

The above is personal commentary for reference only. Refer to the original papers for authoritative content.