For fans of Talkie-1930 and all the methodological questions raised by historical models, here's a new entrant into the field, TypewriterLM, trained up to 1913. Corpus, instruction-tuning datasets, and event dataset are released. arxiv.org/abs/2606.02991
arxiv.org
We introduce TypewriterLM, a 7.24B History language model (LM) trained exclusively on English text predating 1913. Developing History LMs requires addressing challenges in data quality and availabilit...
Pretraining Language Models on Historical Text
1d
Ted Underwood