Did you know? Gestures used to express universal concepts—like wishing for luck—vary DRAMATICALLY across cultures? 🤞means luck in US but deeply offensive in Vietnam 🚨 📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior! 📜: arxiv.org/abs/2502.17710

Feb 26, 2025

Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

Apr 25, 2025

Our paper documenting the environmental impacts of creating OLMo language models is the most honest and comprehensive characterization I know of, including training, development (!) and inference costs. If you're at ICLR chat with @jacobcares.bsky.social & @clarana.bsky.social Sat morning 10-12:30!

Akhila Yerukola

Amanda Bertsch

We’re excited about Oolong as a challenging benchmark for information aggregation! Let us know which models we should benchmark next 👀 Paper: arxiv.org/abs/2511.02817 Dataset: huggingface.co/oolongbench Code: github.com/abertsch72/o... Leaderboard: oolongbench.github.io

I'm in Singapore for @iclr-conf.bsky.social! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, email works too

Come through! #492 in Hall 2, 10am-12:30pm

Yes! tbh this method is probably much more immediately useful for helping one understand subtle differences between [models trained on] subtly different data subsets, vs a loftier goal of helping one find "the" best data mixture -- to anyone considering this method, please feel free to reach out :)

Emma Strubell

We've received multiple notes that NOAA research services (Office of Oceanic and Atmospheric Research) may go offline at midnight. @safeguardingdata.bsky.social is working on web archiving, but if others want to nominate on this, that might be good: digital2.library.unt.edu/nomination/G...

Apr 23, 2025

Apr 26, 2025

How can we better think and talk about human-like qualities attributed to language technologies like LLMs? In our #CHI2025 paper, we taxonomize how text outputs from cases of user interactions with language technologies can contribute to anthropomorphism. arxiv.org/abs/2502.09870 1/n

May 6, 2025

Amanda Bertsch

Apr 3, 2025

Mar 6, 2025