Our new paper reformulates tokenisation as a linear program (LP), which we solve to get SOTA tokenisers 😁 As a bonus, this LP tells us how close to optimal any tokeniser is! Check it out 👇
w/ J. Tempus, @philipwitti.bsky.social, @craigschmidt.com, D. Komm
Paper: arxiv.org/abs/2605.22821