We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/
How good are LLMs at 🔭 scientific computing and visualization 🔭?
AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.
SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
News🗞️
I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘
Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)
#COLM2025 was one of my favorite conferences -- a really high fraction of interesting papers and people, but small enough to see everything!
Thank you to the organizers for putting it together!
Evaluating language model responses on open-ended tasks is hard! 🤔
We introduce EvalAgent, a framework that identifies nuanced and diverse evaluation criteria 📋✍️.
EvalAgent identifies 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇