
How much does a language model forget when finetuned on new tasks? We show that both model size and optimization matter, and that forgetting can be nearly eliminated with self-generated replay! arxiv.org/abs/2605.26097 w/ Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, and Pavel Izmailov. 1/8