🏆ONEBench accepted to ACL main! ✨ Stay tuned for the official leaderboard and real-time personalised benchmarking release! If you’re attending ACL or are generally interested in the future of foundation model benchmarking, happy to talk! #ACL2025NLP #ACL2025 @aclmeeting.bsky.social

🧠 Keeping LLMs factually up to date is a common motivation for knowledge editing. But what would it actually take to support this in practice at the scale and speed the real world demands? We explore this question and really push the limits of lifelong knowledge editing in the wild. 👇

Excited to be in Vienna for #ACL2025 🇦🇹! You'll find @dziadzio.bsky.social and me by our ONEBench poster, so do drop by! 🗓️ Wed, July 30, 11-12:30 CET 📍 Hall 4/5 I’m also excited to talk about lifelong and personalised benchmarking, data curation and vision-language in general! Let’s connect!

Check out our newest paper! As always, it was super fun working on this with @prasannamayil.bsky.social

🧵1/10 Excited to share our #SIGGRAPH paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟 We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching #MLLM

Why More Researchers Should be Content Creators Just trying something new! I recorded one of my recent talks, sharing what I learned from starting as a small content creator. youtu.be/0W_7tJtGcMI We all benefit when there are more content creators!

🚨Great Models Think Alike and this Undermines AI Oversight🚨 New paper quantifies LM similarity (1) LLM-as-a-judge favors more similar models🤥 (2) Complementary knowledge benefits Weak-to-Strong Generalization☯️ (3) More capable models have more correlated failures 📈🙀 🧵👇

May 17, 2025

Apr 8, 2025

Feb 7, 2025

10mo

Feb 18, 2025

May 27, 2025

11mo

Feb 7, 2025

I'm in Nashville this week attending #CVPR2025. Excited to discuss post-training VLMs and diffusion models!

Adhiraj Ghosh

Lukas Thede

Jun 11, 2025

Adhiraj Ghosh

Thaddäus Wiedemer

Joschka Strüber @Tuebingen AI Center🇩🇪

Jia-Bin Huang

Niladri Shekhar Dutt

Shyamgopal Karthik

🚨Looking to test your foundation model on an arbitrary and open-ended set of capabilities, not explicitly captured by static benchmarks? 🚨 Check out ✨ONEBench✨, where we show how sample-level evaluation is the solution. 🔎 arxiv.org/abs/2412.06745

Dec 10, 2024

Adhiraj Ghosh