Mete · 2mo
LLMs can retrieve knowledge, but can they connect it in *creative* ways to solve problems? Introducing CresOWLve 🦉, a new benchmark that evaluates creative problem-solving over real-world knowledge, using puzzles that require multiple creative thinking strategies. 👇

Mete · 2mo
These aren't your typical trivia questions. CresOWLve spans 34 knowledge domains, from Literature to Astronomy to Art, covering 2,061 carefully curated puzzles across more than 26 cultures. Solving them demands *connecting facts across domains in non-obvious ways* 🌍📚

Mete · 2mo
And it's not just *what* you know, it's *how* you think. 72% of puzzles require lateral thinking. Many involve analogy-making, abstraction, metaphors, jokes, and puns. Most questions combine 2+ creative reasoning strategies.

Mete · 2mo
LLM performance?
📉 Non-thinking models under 30% (with CoT), most thinking models under 60%.
📉 Models perform up to 17% worse on creative vs. factual questions.
Crucially, models *can* retrieve the relevant facts; they just fail to form the creative connection between them.

Mete · 2mo
Huge thanks to all my amazing collaborators: Renqing Cuomao, Daniil Yurshevich, Anna Sotnikova, Lonneke van der Plas, @abosselut.bsky.social
📄 Paper: arxiv.org/abs/2604.03374
🤗 Benchmark: huggingface.co/datasets/mis...
🌐 Project: mete.is/cresowlve
#NLP #LLM #AIResearch #Benchmark #Creativity

CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
arxiv.org
Creative problem-solving requires combining multiple cognitive abilities, including logical reasoning, lateral thinking, analogy-making, and commonsense knowledge, to discover insights that connect se...