𝗡𝗘𝗪 𝗣𝗔𝗣𝗘𝗥: Language models recognize dropout and Gaussian noise applied to their activations.
The team introduced an a-semantic perturbation into a language model. See what happened ➡️
lawzero.org/en/publicati...
#AISafety #MLSky #LLM #LawZero
@yoshuabengio.bsky.social
We provide evidence that language models can detect, localize and, to a certain degree, verbalize the difference between perturbations applied to their activations. More precisely, we either (a) mask ...
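For readers new to the two perturbation types, here is a minimal NumPy sketch of what "dropout" and "Gaussian noise" on activations usually mean. It is not the paper's code. The hidden-state shape, the drop probability `p`, the noise scale `sigma`, the helper names, and the choice to rescale surviving units by 1/(1-p) (standard inverted dropout) are all illustrative assumptions. The paper may simply zero units without rescaling, or apply perturbations at specific layers and positions.

```python
import numpy as np

def dropout_perturb(h, p, rng, rescale=True):
    """Hypothetical helper: zero each activation with probability p.

    With rescale=True the surviving units are scaled by 1/(1-p)
    (inverted dropout, an assumption here); returns the perturbed
    array and the boolean keep-mask.
    """
    keep = rng.random(h.shape) >= p
    scale = 1.0 / (1.0 - p) if rescale else 1.0
    return np.where(keep, h * scale, 0.0), keep

def gaussian_perturb(h, sigma, rng):
    """Hypothetical helper: add i.i.d. N(0, sigma^2) noise to every activation."""
    return h + rng.normal(0.0, sigma, size=h.shape)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 16))  # assumed toy hidden states: 4 tokens x 16 dims
h_drop, keep = dropout_perturb(h, p=0.5, rng=rng)
h_noise = gaussian_perturb(h, sigma=0.1, rng=rng)
```

Dropout changes *which* units carry signal (a sparse, structured pattern), while Gaussian noise slightly shifts *every* unit. That structural difference is one plausible cue a model could pick up on when telling the two apart.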