WiAIR is dedicated to celebrating the remarkable contributions of female AI researchers from around the globe. Our goal is to empower early career researchers, especially women, to pursue their passion for AI and make an impact in this exciting field.
Women in AI Research - WiAIR




🎧 Listen to the episode!
🎬 YouTube: www.youtube.com/watch?v=3QXH...
🎙️ Spotify: open.spotify.com/episode/1aWC...
🍎 Apple: podcasts.apple.com/ca/podcast/1...
📄 Paper: arxiv.org/pdf/2601.11778
#WiAIR #MultilingualAI #LLMs #MachineTranslation #NLProc
Neural MT metrics show the strongest alignment with downstream performance. But the proxy has limits: some specialized benchmarks, including MGSM and INCLUDE, show weaker or more variable correlations. Task-specific evaluation remains necessary. (4/5 🧵)
Translation quality is measured on 3 parallel corpora using 7 MT metrics, both lexical and neural, then systematically correlated with downstream benchmark scores across languages and tasks. (3/5 🧵)
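To make the correlation step concrete, here is a minimal sketch of how per-language translation-quality scores might be correlated with per-language benchmark scores. It is an illustration only, not the paper's code: the language labels and all numbers are made-up placeholders, the helper names (`pearson`, `ranks`, `spearman`) are hypothetical, and the choice of Pearson and Spearman coefficients is an assumption, since the thread does not say which correlation statistic the authors use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# PLACEHOLDER data, invented for illustration (not results from the paper):
# one MT-metric score and one benchmark accuracy per language.
mt_metric = {"lang_a": 0.82, "lang_b": 0.74, "lang_c": 0.61, "lang_d": 0.55}
benchmark_acc = {"lang_a": 0.71, "lang_b": 0.66, "lang_c": 0.52, "lang_d": 0.49}

languages = sorted(mt_metric)
x = [mt_metric[lang] for lang in languages]
y = [benchmark_acc[lang] for lang in languages]

print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

In a setup like the one described, this computation would be repeated for each combination of MT metric, parallel corpus, and benchmark. Comparing the resulting coefficients is what lets one say which metrics align best with downstream performance.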
The paper evaluates 14 LLMs across 5 model families on 9 multilingual benchmarks spanning knowledge, reading comprehension, NLI, commonsense & mathematical reasoning, truthfulness, and regional knowledge. (2/5 🧵)
✨ Can translation quality serve as a scalable proxy for multilingual LLM evaluation? In our latest #WiAIR episode, we host Dr. Saadia Gabriel (@skgabrie.bsky.social) to discuss "Translation as a Scalable Proxy for Multilingual Evaluation". (1/5 🧵)