Inlay

🧵[5/n]📊 🧪 Training the Subnetwork Reproduces Full Model 1️⃣ When trained in isolation, the sparse subnetwork recovers almost the exact same weights as the full model 2️⃣ achieves comparable (or better) end-task performance 3️⃣ 🧮 Even the training loss converges more smoothly