(8/n) On real mouse learning data (IBL dataset), models that incorporate history (RNNGLM) predict held-out data substantially better than DNNGLM and classic RL baselines. The inferred learning rule reveals reward-history–dependent updates (larger after rewarded sequences).
(9/n) Taken together, we infer nonparametric, non-Markovian learning rules directly from behavior during de novo learning.
The inferred rule's dependence on reward history suggests that animals integrate experience over multiple trials when updating their policy.