Paper link here: arxiv.org/abs/2601.02906
Joint work with @juice500ml.bsky.social, @kalvinchang.bsky.social, Ming-Hao Hsu, @florian-eichin.com, Zhizheng Wu, Alane Suhr, @mhedderich.bsky.social, David Harwath, @davidrmortensen.bsky.social, and @barbaraplank.bsky.social!
Multilingual speech foundation models such as Whisper are trained on web-scale data, where data for each language consists of a myriad of regional varieties. However, different regional varieties ofte...