For fans of Talkie-1930 and the methodological questions raised by historical models, here's a new entrant in the field: TypewriterLM, trained only on English text predating 1913. The corpus, instruction-tuning datasets, and event dataset have been released. arxiv.org/abs/2606.02991
We introduce TypewriterLM, a 7.24B History language model (LM) trained exclusively on English text predating 1913. Developing History LMs requires addressing challenges in data quality and availabilit...