The bug is Hugging Face tokenizers issue #2058 (github.com/huggingface/...): the library keeps merge-pair counts in a hash map with i32 values, so once any pair's count exceeds 2^31 − 1 = 2,147,483,647, the counter wraps around to a negative value. A negative count ranks below every pair with a positive count, so the overflowed pair never gets selected as a merge.
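The overflow described above can be reproduced in a few lines. This is a minimal sketch, not the library's actual code: the `Pair` alias and the `bump_i32`, `bump_i64`, and `best_pair` helpers are hypothetical names made up for illustration, and `wrapping_add` is used on the assumption that the counter wraps silently (Rust's default in release builds, whereas debug builds panic on overflow).

```rust
use std::collections::HashMap;

/// Hypothetical pair key: two adjacent token ids.
type Pair = (u32, u32);

/// Add `by` to a pair's 32-bit count. `wrapping_add` makes the
/// two's-complement wraparound explicit, independent of build profile.
fn bump_i32(counts: &mut HashMap<Pair, i32>, pair: Pair, by: i32) {
    let count = counts.entry(pair).or_insert(0);
    *count = count.wrapping_add(by);
}

/// The same update with a 64-bit counter, which has ample headroom
/// beyond 2^31 - 1.
fn bump_i64(counts: &mut HashMap<Pair, i64>, pair: Pair, by: i64) {
    let count = counts.entry(pair).or_insert(0);
    *count += by;
}

/// Pick the pair with the largest count, as a BPE trainer would when
/// choosing the next merge. Returns `None` for an empty map.
fn best_pair(counts: &HashMap<Pair, i32>) -> Option<Pair> {
    counts
        .iter()
        .max_by_key(|entry| *entry.1)
        .map(|(pair, _)| *pair)
}
```

Bumping a pair to `i32::MAX` and then adding 1 leaves it at `i32::MIN`, so `best_pair` prefers any other pair with a positive count, however small. With `bump_i64` the same sequence yields 2,147,483,648 and the ranking stays intact.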