Profile
by @danabra.mov
Profile
by @dansshadow.bsky.social
Profile
by @jimpick.com
AviHandle
by @danabra.mov
AviHandle
by @dansshadow.bsky.social
AviHandle
by @katherine.computer
EventsList
by @katherine.computer
ProfileHeader
by @dansshadow.bsky.social
ProfileHeader
by @danabra.mov
ProfileMedia
by @danabra.mov
ProfilePlays
by @danabra.mov
ProfilePosts
by @danabra.mov
ProfilePosts
by @dansshadow.bsky.social
ProfileReplies
by @danabra.mov
Record
by @atsui.org
Skircle
by @danabra.mov
StreamPlacePlaylist
by @katherine.computer
ProfileReplies

Congratulations Kanishka!
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
It's the season for PhD apps!! 🥧 🦃 ☃️ ❄️ Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me! lucy3.github.io/prospective-...
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/
📄 Read the full paper: CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation arxiv.org/abs/2504.15254 Dataset: github.com/anirudhkhatr... w/ @robertzhang.bsky.social, Jia Pan, @zetten.bsky.social, @jqchen.bsky.social, @gregdnlp.bsky.social, @idillig.bsky.social. 🧵[6/6]