How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! Our latest work with NVIDIA introduces new CUDA kernels & data formats for faster inference and training of sparse transformer language models: Blog: pub.sakana.ai/sparser-fast...

1mo

Sakana AI

For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead. Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388 Thread:

Excited to share Sakana AI’s new #ICML2026 paper in collaboration with NVIDIA: "Sparser, Faster, Lighter Transformer Language Models" arxiv.org/abs/2603.23198 This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer LLMs: 🧵 Thread 👇

1mo

hardmaru

Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 arxiv.org/abs/2512.04388 What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs? Thread:

1mo

Sakana AI