- All systems will be human-evaluated (no downsampling using automatic metrics), and we are preparing a new contrastive human evaluation protocol.
- LLM benchmarking focused on open-weight models.
- Abstract submission has been replaced with a model card poll.

All details are at www2.statmt.org/wmt26/transl...
How well do LLMs handle multilinguality? 🌍🤖 🔬 We brought the rigor of Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task, spanning 30 languages and 5 subtasks.
You may participate in up to 20 language pairs, 9 of which are newly hosted:
- Czech to Vietnamese
- Chinese to Japanese (direction reversed)
- EN to Armenian
- EN to Belarusian
- EN to Indonesian
- EN to Kazakh
- EN to Ladin
- EN to Ligurian
- EN to Northern Sámi
Ready for our poster today at #COLM2025! 💭 This paper has had an interesting journey; come find out and discuss it with us! @swetaagrawal.bsky.social @kocmitom.bsky.social Side note: being a parent in research does have its perks. Poster transportation solved ✅
Instruction-following context in prompts: systems may disregard it, but failing to follow instructions is counted as a translation error. You can expect the following phenomena: formal/informal voice, glossaries, structured translation (JSON, HTML, ...), and style and expressions (e.g. "yuhuuu", "tbh").
Multimodal context: as last year, for the spoken domain we provide the original video, while for other domains an image (such as a screenshot or infographic) may be provided as additional context. Purely text-to-text systems can still participate, as in the past.
We'd like to officially announce the 21st iteration of the WMT General Machine Translation shared task and invite you to participate. Here is the list of main changes:
This project wouldn’t have been possible without the brilliant minds driving the work: Lorenzo Proietti, @sted19.bsky.social and @zouharvi.bsky.social
One way to raise the bar is to rethink the source selection process: instead of random samples, we built a model that selects the most difficult data for translation. And we've already put our work into practice: this year's WMT25 General MT test set uses our approach to make evaluation more challenging.
🚩Machine Translation is far from “solved” - the test sets just got too easy. 🚩 Yes, the systems are much stronger. But the other half of the story is that test sets haven’t kept up. It’s no longer enough to just take a random news article and expect systems to stumble.
Tom Kocmi · Cohere Labs · Julia Kreutzer