//
sign in
Profile
by @danabra.mov
Profile
by @dansshadow.bsky.social
Profile
by @jimpick.com
AviHandle
by @danabra.mov
AviHandle
by @dansshadow.bsky.social
AviHandle
by @katherine.computer
EventsList
by @katherine.computer
ProfileHeader
by @dansshadow.bsky.social
ProfileHeader
by @danabra.mov
ProfileMedia
by @danabra.mov
ProfilePlays
by @danabra.mov
ProfilePosts
by @danabra.mov
ProfilePosts
by @dansshadow.bsky.social
ProfileReplies
by @danabra.mov
Record
by @atsui.org
Skircle
by @danabra.mov
StreamPlacePlaylist
by @katherine.computer
+ new component
Profile
Loading...


Loading...
Check out our new paper, investigating phenomena (hallucination, refusal, and sycophancy) both externally and internally! Showing a high correlation between the two!
Check out our new paper on evaluating LLM agents on their preference for achieving their goal and avoiding human harm, called ManagerBench👔
ManagerBench was accepted to #ICLR2026🎉 Check it out⬇️
3mo
8mo
4mo
Adi Simhi
Adi Simhi
Adi Simhi
ManagerBench was accepted to ICLR! @iclr-conf.bsky.social #ICLR2026 LLMs are still either unsafe, or completely harm avoidant - even when the harm affects furniture 🛋️ Check out our benchmark, online or in Rio 🇧🇷
4mo
🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic or prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
How does an LLM’s past influence its future?🤔 In new work, led by @adisimhi.bsky.social, together with @fbarez.bsky.social @boknilev.bsky.social and Shay Cohen, we find conversational history creates a latent "geometric trap" which makes old habits e.g. hallucinations hard to break!
8mo
🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic or prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
3mo
Martin Tutek
8mo
Martin Tutek
Martin Tutek
Martin Tutek