Inlay

🧵[7/n] 🔍 Potential Reasons 💡 We hypothesize that the in-distribution nature of training data is a key driver behind this sparsity 🧠 The model already "knows" a lot — RL just fine-tunes a small, relevant subnetwork rather than overhauling everything