//
sign in
Post
by @danabra.mov
PostEmbed
by @danabra.mov
Record
by @jimpick.com
Record
by @atsui.org
+ new component
Post
LLMs are trained on lots of data, often from untrusted sources. This is particularly true in safety post-training, where data is gathered from human responses. Attackers can try to sneak in a backdoor: if there's a trigger in the prompt, bypass safety guardrails. 2/n