How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU!
Our latest work with NVIDIA introduces new CUDA kernels & data formats for faster inference and training of sparse transformer language models:
Blog: pub.sakana.ai/sparser-fast...
Sakana AI
For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead.
Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388
Thread:
Excited to share Sakana AI’s new #ICML2026 paper in collaboration with NVIDIA: "Sparser, Faster, Lighter Transformer Language Models" arxiv.org/abs/2603.23198
This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer LLMs:
🧵 Thread 👇
hardmaru
Introducing our new work, “Learning to Orchestrate Agents in Natural Language with the Conductor”, accepted at #ICLR2026:
arxiv.org/abs/2512.04388
What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs?
Thread: