Post
Code available on GitHub 👀
7d
Dr. Angelica Lim
Natural Language Autoencoders: "verbalize" a model's internal activations into plain text. Wild stat — Claude suspects it's being safety-tested on 26% of benchmark problems vs <1% in real chats, without ever saying so. anthropic.com/research/natural-language-autoencoders
7d