Introducing Hero’s Journey, a fun text game meticulously designed to test inductive generalization.
⚖️Verdict: all LLMs we tested trail far behind humans when induction involves generalization across procedures!
Spotting a rule from past experience is one thing; acting on it correctly is another. To find out whether LLMs can do both, we introduce HERO's JOURNEY🦸‍♀️, which tests LLMs’ inductive reasoning ability in multi-step setups.
We found that models show signs of rule induction, but they only scratch the surface.😮