🧵[7/n]
Potential Reasons
💡 We hypothesize that the in-distribution nature of training data is a key driver behind this sparsity
🧠 The model already "knows" a lot, so RL just fine-tunes a small, relevant subnetwork rather than overhauling everything