There are two different ways that the Hugging Face WordPiece implementation can produce <UNK> tokens, even with ByteLevel pre-tokenization. A nice blog post from Stéphan Tulkens explains how to fix one of them, in response to a question of mine. stephantul.github.io/blog/better-...
9mo
Stéphan Tulkens' Blog
stephantul.github.io
Better Greedy Tokenizers: Handling WordPiece's [UNK] Problem
Craig Schmidt
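
To make the claim in the post concrete, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece. It is not the Hugging Face code, and it does not claim to reproduce the exact two cases the post refers to. The function name `wordpiece_greedy`, the toy vocabulary, the `##` continuation prefix, and the `max_chars` limit are all assumptions made for illustration. The sketch shows how an unknown token can appear even when every individual symbol is in the vocabulary as a word-initial piece: the greedy match commits to a prefix, and the remainder has no continuation piece.

```python
def wordpiece_greedy(word, vocab, unk="[UNK]", prefix="##", max_chars=100):
    """Greedy longest-match-first WordPiece over one pre-tokenized word.

    Illustrative only: `max_chars` is a hypothetical length cap, and the
    whole word collapses to `unk` if any position cannot be matched.
    """
    # Assumed behaviour: overly long words are mapped straight to unk.
    if len(word) > max_chars:
        return [unk]

    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits here, so the entire word becomes unk.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary: "a", "b", "c" all exist as word-initial pieces,
# but "##c" (c as a continuation) does not.
toy_vocab = {"a", "b", "c", "ab", "##b"}

print(wordpiece_greedy("abb", toy_vocab))  # ['ab', '##b']
print(wordpiece_greedy("abc", toy_vocab))  # ['[UNK]']
```

In the second call the greedy step takes `ab`, then looks for `##c`, finds nothing, and discards the whole word, even though `c` on its own is in the vocabulary. Covering every symbol as an initial piece is therefore not enough in this sketch; the continuation forms have to be covered too.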