Inlay

by @danabra.mov

by @jimpick.com

Post

Anthropic finds that a weaker model can be used to supervise a more capable model and stop it from sandbagging. Paper: Removing Sandbagging in LLMs by Training with Weak Supervision (arxiv.org/abs/2604.22082)

1mo

Sung Kim