
For years, we've known that running a standard t-test on cross-validation folds violates the test's independence assumption, because the folds share training data. We wanted to see how widespread this issue actually is. The result? 97% of the studies we reviewed used an invalid statistical test. 🧵👇

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint: doi.org/10.64898/202..., led by @tianchu.bsky.social @hetuli.bsky.social @shaoshiz.bsky.social @nichols.bsky.social 1/N
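
To make the independence problem concrete, here is a minimal sketch comparing a naive paired t-statistic on per-fold score differences with the corrected resampled t-statistic of Nadeau and Bengio, one well-known adjustment for overlapping training sets. This is an illustration only and is not necessarily the approach the preprint recommends. The function names, the per-fold differences, and the fold sizes are all made up for the example.

```python
import math
import statistics


def naive_paired_t(diffs):
    """Standard paired t-statistic; treats the per-fold differences as independent."""
    k = len(diffs)
    mean = statistics.fmean(diffs)
    var = statistics.variance(diffs)  # sample variance (n - 1 denominator)
    return mean / math.sqrt(var / k)


def corrected_resampled_t(diffs, n_train, n_test):
    """Nadeau-Bengio corrected t-statistic.

    Inflates the variance term by n_test / n_train to account for the
    overlap between training sets across folds.
    """
    k = len(diffs)
    mean = statistics.fmean(diffs)
    var = statistics.variance(diffs)
    return mean / math.sqrt((1.0 / k + n_test / n_train) * var)


# Hypothetical per-fold accuracy differences (model A minus model B), 10-fold CV.
diffs = [0.021, 0.015, 0.030, 0.008, 0.019, 0.025, 0.012, 0.017, 0.022, 0.011]

# Assumed fold sizes: 1000 samples, so 900 train and 100 test per fold.
t_naive = naive_paired_t(diffs)
t_corrected = corrected_resampled_t(diffs, n_train=900, n_test=100)

print(f"naive t     = {t_naive:.2f}")
print(f"corrected t = {t_corrected:.2f}")
```

Both statistics are compared against a t distribution with k - 1 degrees of freedom. The corrected statistic is always smaller in magnitude, so the naive test overstates the evidence for a difference between models.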