For years, we've known that running a standard t-test on cross-validation folds violates the test's independence assumption, because the folds share training data. We wanted to see how widespread this issue actually is.
The result? 97% of the studies used an invalid statistical test. 🧵
Shaoshi Zhang
In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests.
Here's our new preprint (doi.org/10.64898/202...), led by @tianchu.bsky.social @hetuli.bsky.social @shaoshiz.bsky.social @nichols.bsky.social 1/N
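For anyone wondering what the independence problem looks like in practice, here is a minimal sketch. It compares the standard paired t statistic on per-fold score differences with the Nadeau-Bengio corrected resampled t statistic, a well-known adjustment that inflates the variance to account for overlapping training sets. This is an illustration only and not necessarily the test the preprint recommends. The fold differences, sample sizes, and function names are made up for the example.

```python
import math
import statistics


def naive_t(diffs):
    """Standard paired t statistic; treats per-fold differences as independent."""
    k = len(diffs)
    return statistics.mean(diffs) / math.sqrt(statistics.variance(diffs) / k)


def corrected_t(diffs, n_train, n_test):
    """Nadeau-Bengio corrected resampled t statistic.

    Replaces the 1/k variance factor with (1/k + n_test/n_train) to
    account for the overlap between training sets across folds.
    """
    k = len(diffs)
    var = statistics.variance(diffs)  # sample variance (ddof = 1)
    return statistics.mean(diffs) / math.sqrt((1 / k + n_test / n_train) * var)


# Hypothetical per-fold accuracy differences (model A minus model B), 10-fold CV
# on an assumed dataset of 1000 samples (900 train / 100 test per fold).
diffs = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.04, 0.02, 0.01, 0.03]

t_naive = naive_t(diffs)
t_corr = corrected_t(diffs, n_train=900, n_test=100)

print(f"naive t     = {t_naive:.3f}")
print(f"corrected t = {t_corr:.3f}")
```

Both statistics would be compared against a t distribution with k - 1 degrees of freedom. The corrected one is always smaller in magnitude, so a difference that looks significant under the naive test may not survive the correction.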