Inlay

Extremely good and important work. Thank you @bttyeo.bsky.social

Cross-validation is widely used to compare predictive performance — not just to benchmark algorithms, but for scientific applications like ranking biomarkers. But performance across folds isn't independent, so standard tests (e.g. paired t-test) inflate false positives. 2/N