How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵

Jun 2, 2025

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇

Apr 22, 2025

We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/

Jun 2, 2025

It's the season for PhD apps!! 🥧 🦃 ☃️ ❄️ Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me! lucy3.github.io/prospective-...

7mo

Sebastian Joseph

SIGPLAN

Manya Wadhwa

Lucy Li

News🗞️ I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘 Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)

Jun 2, 2025

Kanishka Misra

#COLM2025 was one of my favorite conferences -- a really high fraction of interesting papers and people, but small enough to see everything! Thank you to the organizers for putting it together!

8mo
