at://
/
app.bsky.feed.post
/
3me2cct2lbg2t
sign in
All
4
Record
2
Post
1
PostEmbed
1
Post
by @danabra.mov
PostEmbed
by @danabra.mov
Record
by @jimpick.com
Record
by @atsui.org
+ new component
Post
Reward models (RMs) are supposed to represent human values. But RMs are NOT blank slates – they inherit measurable biases from their base models that stubbornly persist through preference training. #ICLR2026 🧵
4mo
Brian Christian