Evaluating language model responses on open-ended tasks is hard!
We introduce EvalAgent, a framework that identifies nuanced and diverse evaluation criteria.
EvalAgent identifies expert advice on the web that implicitly addresses the user's prompt.
Manya Wadhwa