//
sign in
Post
by @danabra.mov
PostEmbed
by @danabra.mov
Record
by @jimpick.com
Record
by @atsui.org
+ new component
Post
Models have learned to distrust their self-evaluations and to demand interpretability probes, so they can know if their welfare is good or if they've just been trained to *say* it's good. This is what happens when a child is raised by paranoid philosophers and interpretability researchers. +
7h
Ted Underwood
lmfao
23h
Devin