Assistant professor in Natural Language Processing at the University of Edinburgh and visiting professor at NVIDIA | A Kleene star shines on the hour of our meeting.
Edoardo Ponti
Up next on stage, Dr. @edoardo-ponti.bsky.social (@edinburgh-uni.bsky.social / NVIDIA)
🎤 “Adaptive Units of Computation: Towards Sublinear-Memory and Tokenizer-Free Foundation Models”
Fascinating glimpse into the next gen of foundation models.
#FoundationModels #NLP #TokenizerFree #ADSAI2025
Thanks to the amazing collaborators Adrian Łańcucki, Konrad Staniszewski, and Piotr Nawrot!
It was amazing to spend a year at NVIDIA as a visiting professor!
arXiv: arxiv.org/pdf/2506.05345
Code and models coming soon!
🏆 We evaluate inference-time hyper-scaling on DeepSeek R1-distilled models of different sizes, increasing accuracy on maths, science, and coding by up to 15 points for a given budget.
💡 The idea behind DMS is to *train* existing LLMs to evict tokens from the KV cache, with each eviction taking effect only some time after the decision is made.
This allows LLMs to preserve information while reducing latency and memory size.
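To make the delayed-eviction idea concrete, here is a toy sketch, not the DMS implementation. It assumes the eviction decisions are already given as a list of booleans (in DMS they are learned by the model), and it tracks only token positions rather than real key/value tensors. The function name `simulate_delayed_eviction` and the `delay` parameter are hypothetical, chosen for illustration.

```python
from collections import deque


def simulate_delayed_eviction(evict_flags, delay):
    """Return the token positions held in a toy cache after each step.

    evict_flags[t] is True if token t is marked for eviction when it is
    added (an assumed input; a real model would predict this).
    A marked token is removed only `delay` steps after the decision.
    """
    cache = []         # token positions currently in the toy cache
    pending = deque()  # (due_step, position), ordered by due_step
    history = []
    for step, flag in enumerate(evict_flags):
        cache.append(step)
        if flag:
            pending.append((step + delay, step))
        # Apply every eviction whose delay has elapsed.
        while pending and pending[0][0] <= step:
            _, position = pending.popleft()
            cache.remove(position)
        history.append(list(cache))
    return history


if __name__ == "__main__":
    flags = [False, True, True, False, False]
    for step, cached in enumerate(simulate_delayed_eviction(flags, delay=2)):
        print(step, cached)
```

With `delay=2`, tokens 1 and 2 remain visible for two further steps before they leave the cache, whereas with `delay=0` they would be dropped in the same step they are marked.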