⭐️ In Feb 2025, we're launching Grassroots Science, an ambitious year-long, massive-scale, fully open-source initiative to develop multilingual LLMs aligned with diverse and inclusive human preferences.
🌐 Check out our website: grassroots.science.
#NLProc #GrassrootsScience
📢 Calling all SEA-passionate individuals!
SEACrowd is excited to launch our contributor call for SEA-VL Phase 2: Building Visual Language Models for Southeast Asia! 🌏
After the success of Phase 1, we're now taking on a bigger mission (see thread)👇
First bsky post about tinlab at #EMNLP2024! A few highlights:
* Presentations from Aditya Yedetore and Hayley Ross on generalization in neural networks!
* I'm giving a keynote at GenBench & organizing BlackboxNLP
* Ask me about our faculty hiring & PhD/postdoc positions at Boston University!
📣 New paper!
We observe that reasoning language models finetuned only on English data are capable of zero-shot cross-lingual reasoning through a "quote-and-think" pattern.
However, this does not mean they reason the same way across all languages or in new domains.
[1/N]
🤗 Super excited to have this work out!
Turns out, by calculating the angles 📐 between representations, you can pick out difficult data samples! This can be very useful for assembling hard test sets or more efficient training sets.
See more cool results and visuals in the 🧵
SEA-VL: Building AI for Southeast Asian Research 🌏
We release SEA-VL, the largest vision-language dataset tailored to SEA’s diverse cultures.
📜 arXiv: arxiv.org/abs/2503.07920
🤗 Data: huggingface.co/collections/...
Check the thread 🧵
Grassroots Science
We don’t always know which problems are hard for LLMs, so devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on, all without evaluating specific input examples?
🧵NEW PAPER by @jenniferlumeng.bsky.social
SEACrowd
Naomi Saphra
Najoung Kim
Yong Zheng-Xin (Yong)
Ruochen
LMs need linguistics! New paper, with @futrell.bsky.social, on LMs and linguistics that conveys our excitement about what the present moment means for linguistics and what linguistics can do for LMs. Paper: arxiv.org/abs/2501.17047. 🧵below.