Edoardo Ponti

Assistant professor in Natural Language Processing at the University of Edinburgh and visiting professor at NVIDIA | A Kleene star shines on the hour of our meeting.

Up next on stage, Dr. @edoardo-ponti.bsky.social (@edinburgh-uni.bsky.social / NVIDIA) 🎤 "Adaptive Units of Computation: Towards Sublinear-Memory and Tokenizer-Free Foundation Models". A fascinating glimpse into the next generation of foundation models. #FoundationModels #NLP #TokenizerFree #ADSAI2025

Thanks to my amazing collaborators Adrian Łańcucki, Konrad Staniszewski, and Piotr Nawrot! Spending a year at NVIDIA as a visiting professor was a wonderful experience.
arXiv: arxiv.org/pdf/2506.05345
Code and models coming soon!

🏆 We evaluate inference-time hyper-scaling on DeepSeek R1-distilled models of different sizes, increasing accuracy on maths, science, and coding by up to 15 points for a given budget.
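Why does compression help "for a given budget"? A back-of-the-envelope sketch, assuming the budget is the memory reserved for the KV cache: if only about 1 in r generated tokens stays in the cache (compression ratio r), the same memory covers roughly r times as many generated tokens. The model shape, the 8 GiB budget, and the ratio of 8 below are hypothetical values chosen for illustration, not figures from the paper.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # One key and one value vector (factor 2) per layer and per KV head,
    # stored at bytes_per_elem bytes each (2 for fp16/bf16).
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def tokens_in_budget(budget_bytes: int, per_token: int,
                     compression: int = 1) -> int:
    # With compression ratio r, roughly 1 in r generated tokens is kept,
    # so the same budget covers about r times as many generated tokens.
    return budget_bytes * compression // per_token


# Hypothetical model shape and budget, for illustration only.
per_token = kv_bytes_per_token(layers=28, kv_heads=4, head_dim=128)
budget = 8 * 2**30  # 8 GiB reserved for the KV cache

print(per_token)                                           # 57344
print(tokens_in_budget(budget, per_token))                 # 149796
print(tokens_in_budget(budget, per_token, compression=8))  # 1198372
```

In this simplified accounting, the extra token capacity can be spent on longer reasoning chains or on more parallel samples at the same memory cost.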

💡 The idea behind DMS (Dynamic Memory Sparsification) is to *train* existing LLMs to evict tokens from the KV cache, while delaying each eviction until some time after the decision is made. This lets LLMs preserve information while reducing latency and memory footprint.
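To make "delayed eviction" concrete, here is a toy sketch of the cache bookkeeping only. It is not the DMS implementation: the evict/keep decision is passed in by the caller (in DMS the trained model predicts it), and the class name `DelayedEvictionCache` and its `delay` parameter are hypothetical. A token marked at step s stays readable for queries at steps s through s + delay - 1 and is dropped at step s + delay.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedEvictionCache:
    """Toy KV cache where an eviction takes effect `delay` steps after the decision.

    Hypothetical sketch: the caller supplies each evict/keep decision,
    whereas in DMS the trained model predicts it.
    """
    delay: int
    entries: dict = field(default_factory=dict)  # position -> (key, value)
    pending: dict = field(default_factory=dict)  # position -> step it was marked
    step: int = 0

    def append(self, key, value, evict: bool) -> list:
        pos = self.step
        self.entries[pos] = (key, value)
        if evict:
            # Mark only; the entry stays readable for `delay` more steps.
            self.pending[pos] = self.step
        # Drop every marked entry whose delay has elapsed.
        for p, marked in list(self.pending.items()):
            if self.step - marked >= self.delay:
                del self.entries[p]
                del self.pending[p]
        self.step += 1
        # Positions that a query at this step could attend to.
        return sorted(self.entries)


cache = DelayedEvictionCache(delay=2)
for t, evict in enumerate([True, False, False, True, False]):
    print(t, cache.append(f"k{t}", f"v{t}", evict))
# 0 [0]
# 1 [0, 1]
# 2 [1, 2]
# 3 [1, 2, 3]
# 4 [1, 2, 3, 4]
```

Token 0 is marked for eviction immediately but remains visible to the next step, so information it carries can still be read before it disappears; with `delay=0`, the same token would be dropped at once.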