at://
/
app.bsky.feed.post
/
3ml5weyiies2j
Post
Anthropic finds that they can use a weaker model to supervise a more capable model and stop it from sandbagging. Paper: Removing Sandbagging in LLMs by Training with Weak Supervision (arxiv.org/abs/2604.22082)
1mo
Sung Kim