Post
@craigschmidt.com has a second paper using LPs for tokenisation coming out today as well! Check it out: bsky.app/profile/crai...
1mo
Tiago Pimentel
arxiv.org/abs/2605.22705 arxiv.org/abs/2605.22821 Happy Linear Programming for Tokenization day! I was involved with two separate papers that hit arXiv yesterday, using LPs to find the vocabulary that maximizes compression, depending on the kind of inference you want to use.
1mo
We introduce Tokenization with Split Trees (ToaST), a subword tokenization method that directly optimizes compression under a new recursive inference procedure. ToaST greedily splits each pretoken int...
arxiv.org
Tokenization with Split Trees
Craig Schmidt