How good are LLMs at scientific computing and visualization?
AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.
SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
It's the season for PhD apps!! 🥧
Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ↔ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me!
lucy3.github.io/prospective-...
We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/
Read the full paper:
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
arxiv.org/abs/2504.15254
Dataset: github.com/anirudhkhatr...
w/ @robertzhang.bsky.social, Jia Pan, @zetten.bsky.social, @jqchen.bsky.social, @gregdnlp.bsky.social, @idillig.bsky.social.
🧵[6/6]