This was joint work with @vkastreva.bsky.social, @philipwitti.bsky.social, and D. Komm! Violeta is a super smart student who is definitely gonna do lots more interesting work :) It's her first paper, and it's also her birthday today 🥳 so follow her if you like this!
Paper: arxiv.org/abs/2511.15709
Recent works have shown that tokenisation is NP-complete. However, these works assume tokenisation is applied to inputs with unboundedly large alphabets -- an unrealistic assumption, given that in pra...