Congratulations Kanishka!

How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵

It's the season for PhD apps!! 🥧 🦃 ☃️ ❄️ Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me! lucy3.github.io/prospective-...

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇

We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/

📄 Read the full paper: CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation arxiv.org/abs/2504.15254 Dataset: github.com/anirudhkhatr... w/ @robertzhang.bsky.social, Jia Pan, @zetten.bsky.social, @jqchen.bsky.social, @gregdnlp.bsky.social, @idillig.bsky.social. 🧵[6/6]