Unfortunately, we're withdrawing our paper "Tokenization with Split Trees" from arXiv. All our baseline tokenizers — BPE, WordPiece, and Unigram — were trained incorrectly because of a bug in the Hugging Face tokenizers library, so every comparison to ToaST in the paper is invalid.