External validation of CoPE’s performance is always cool to see. We stress-test and eval extensively ourselves, of course, but it never quite feels real until it’s in someone else’s hands.
Dave Willner
I am NOT an AI engineer or AI researcher, but I tried to do a little evaluation of CoPE-B vs CoPE-A vs gpt-oss-safeguard
github.com/roostorg/mod...
lmk what you think, and we'd love for more evaluations to be part of the ROOST Model Community! cc @samidh.bsky.social