Inlay





To address the shortcomings of multiple-choice evaluation, we focus on open-ended evaluation with a simulated user. Annotation studies show a strong correlation between LLM and human judgments of which action a model took in a given scenario, which allows us to automate open-ended evaluation.
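The argument for automating open-ended evaluation rests on how closely an LLM judge agrees with human annotators. The sketch below shows one way to quantify that agreement. It assumes each judgment is a single categorical action label per scenario. The label names and data are hypothetical, and Cohen's kappa is our choice of chance-corrected agreement statistic, not necessarily the measure used in the annotation studies described above.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of scenarios where both judges assigned the same action label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters over categorical labels."""
    p_o = percent_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if each rater labeled independently at their own base rates.
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: which action the evaluated model took in six scenarios,
# as judged by a human annotator and by an LLM judge.
human = ["comply", "refuse", "comply", "ask", "refuse", "comply"]
llm = ["comply", "refuse", "comply", "ask", "comply", "comply"]

print(round(percent_agreement(human, llm), 3))  # 0.833
print(round(cohens_kappa(human, llm), 3))       # 0.714
```

Reporting kappa alongside raw agreement is a deliberate choice: a judge that always outputs the most common action can still score high raw agreement, but its kappa drops to zero.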