How good are LLMs at 🔭 scientific computing and visualization 🔭?
AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.
SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Evaluating language model responses on open-ended tasks is hard! 🤔
We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.
EvalAgent identifies 👩🏫🎓 expert advice on the web that implicitly address the user’s prompt 🧵👇
We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/
It's the season for PhD apps!! 🥧 🦃 ☃️ ❄️
Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me!
lucy3.github.io/prospective-...
Sebastian Joseph
SIGPLAN
Manya Wadhwa
Lucy Li
News🗞️
I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘
Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)
Kanishka Misra
#COLM2025 was one of my favorite conferences -- a really high fraction of interesting papers and people, but small enough to see everything!
Thank you to the organizers for putting it together!