Accepted at ACL main! Come chat about dialectal MT at our poster today at 4 pm.
Also, check out this largely bug-free package for generating your own synthetic dialectal data:
pypi.org/project/dial...
Are you tired of getting meh results from LLMs in your native language and resorting to English instead? We are too!
Frustrated with how most of the world’s low-resource languages have NO evaluation resources?
📢 Check out ChiKhaPo, a massively multilingual lexical comprehension and generation benchmark covering 2700+ languages.
www.arxiv.org/abs/2510.16928
You have a budget to human-evaluate 100 inputs to your models, but your dataset has 10,000 inputs. Do not just pick 100 randomly!🙅
We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️
(random is still a devilishly good baseline)
Multimodal LLMs can read text in images, but why do they often perform worse than when the same text is given as tokens? Our work studies the modality gap in models that perceive text as pixels and shows how to close it.
📄 arxiv.org/abs/2603.09095
🧵👇 #NLProc #LLM #ComputerVision