🏆ONEBench accepted to ACL main! ✨ Stay tuned for the official leaderboard and real-time personalised benchmarking release! If you’re attending ACL or are generally interested in the future of foundation model benchmarking, happy to talk! #ACL2025NLP #ACL2025 @aclmeeting.bsky.social
🧠 Keeping LLMs factually up to date is a common motivation for knowledge editing. But what would it actually take to support this in practice at the scale and speed the real world demands? We explore this question and really push the limits of lifelong knowledge editing in the wild. 👇
Godsend
Excited to be in Vienna for #ACL2025 🇦🇹! You'll find @dziadzio.bsky.social and me at our ONEBench poster, so do drop by! 🗓️ Wed, July 30, 11:00-12:30 CET 📍 Hall 4/5. I'm also excited to talk about lifelong and personalised benchmarking, data curation and vision-language in general! Let's connect!
Check out our newest paper! As always, it was super fun working on this with @prasannamayil.bsky.social
🧵1/10 Excited to share our #SIGGRAPH paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟 We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching #MLLM
Why More Researchers Should be Content Creators
Just trying something new! I recorded one of my recent talks, sharing what I learned from starting as a small content creator. youtu.be/0W_7tJtGcMI We all benefit when there are more content creators!
🚨Great Models Think Alike and this Undermines AI Oversight🚨 New paper quantifies LM similarity: (1) LLM-as-a-judge favors more similar models🤥 (2) Complementary knowledge benefits Weak-to-Strong Generalization☯️ (3) More capable models have more correlated failures 📈🙀 🧵👇
May 17, 2025
Apr 8, 2025
Feb 7, 2025
10mo
Feb 18, 2025
May 27, 2025
11mo
Feb 7, 2025
I'm in Nashville this week attending #CVPR2025. Excited to discuss post-training VLMs and diffusion models!
Adhiraj Ghosh
Lukas Thede
Jun 11, 2025
Adhiraj Ghosh
Adhiraj Ghosh
Thaddäus Wiedemer
Joschka Strüber @Tuebingen AI Center🇩🇪
Jia-Bin Huang
Niladri Shekhar Dutt
Shyamgopal Karthik
🚨Looking to test your foundation model on an arbitrary and open-ended set of capabilities, not explicitly captured by static benchmarks? 🚨 Check out ✨ONEBench✨, where we show how sample-level evaluation is the solution. 🔎 arxiv.org/abs/2412.06745
Dec 10, 2024
Adhiraj Ghosh