Post
In Gandalf, FMSPs successfully red-teamed an LLM, breaching GPT-4o-mini’s defenses. We implemented 7 additional external defensive strategies from Lakera’s single-agent Gandalf game (gandalf.lakera.ai) and FMSPs autonomously wrote code to break 6/7 of those defenses!!
11mo
Trick Gandalf into revealing information and experience the limitations of large language models firsthand.
gandalf.lakera.ai
Gandalf | Lakera – Test your prompting skills to make Gandalf reveal secret information.
Aaron Dharna