Our new paper sets the stage for the biggest practical use case of model interpretability: stress testing and dataset development. All you need is interpretable linear features and simple geometry.
arxiv.org
Humans cannot always intuit what scenarios are most challenging to LLMs. Hoping to capture challenging edge cases, developers either design problems to be difficult for humans or curate extensive benc...