Also—hoping to implement genuine word sense disambiguation soon by combining this functionality with the LatinCy token vectors... getting there...
Made some LatinCy-based suggestions here—what else are people working with?
From Whitaker's release notes... "Permission is hereby freely given for any and all use of program and data." Amazingly open license, allowing us to build cool stuff for Latin. And thanks also to Martin Keegan for hosting the maintenance of Words since 2015, cf. mk270.github.io/whitakers-wo...
Teaching a course this fall at @isawnyu.bsky.social on text analysis for historical languages—course description here: diyclassics.github.io/isaw-f2026-g.... If you are at NYU (or a consortium program) and are interested, let me know.
#TeachAncient #DigiClass
Announcing—LatinCy Lexicon v0.1, a refactored version of Whitaker's Words that uses LatinCy annotations to disambiguate words/meanings. Can be added as a custom component to any LatinCy pipeline. github.com/latincy/lati... #digiclass #nlproc
Not only can we start using LatinCy annotations to disambiguate words, we can also use the WW word formation logic to generate paradigms for spaCy tokens...
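As a rough illustration of the disambiguation idea in the post above, here is a minimal sketch that filters candidate dictionary entries for an ambiguous form by a part-of-speech tag. Everything in it is an assumption for illustration: the `Entry` class, the `TOY_LEXICON` data, and the `disambiguate` function are hypothetical and are not the latincy-lexicon API, and the two-word lexicon is hand-written rather than taken from Whitaker's Words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One candidate dictionary reading for a surface form (hypothetical)."""
    lemma: str
    pos: str    # coarse UD-style tag, e.g. "NOUN" or "VERB"
    gloss: str


# Toy, hand-written data for illustration only.
TOY_LEXICON = {
    "amor": [
        Entry("amor", "NOUN", "love"),
        Entry("amo", "VERB", "to love"),   # amor = "I am loved"
    ],
    "canis": [
        Entry("canis", "NOUN", "dog"),
        Entry("cano", "VERB", "to sing"),  # canis = "you sing"
    ],
}


def disambiguate(form, pos, lexicon=TOY_LEXICON):
    """Return the entries for `form` whose POS matches the tagger's `pos`.

    Falls back to every candidate when no entry matches, so a tagging
    error never hides all readings.
    """
    candidates = lexicon.get(form.lower(), [])
    matches = [entry for entry in candidates if entry.pos == pos]
    return matches or candidates


if __name__ == "__main__":
    for entry in disambiguate("canis", "VERB"):
        print(entry.lemma, "-", entry.gloss)
```

In a real pipeline the `pos` argument would come from the tagger's annotation on each token; here it is passed by hand to keep the sketch free of external dependencies.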
Graduate course at ISAW (Fall 2026) introducing computational methods for historical-language research: Python-based text analysis & NLP via word embeddings, transformer models, and large language mod...
Patrick J. Burns
Whitaker's Words lexical data as LatinCy pipeline components for Latin NLP - latincy/latincy-lexicon
With v3.9, the LatinCy pipelines introduce a new set of human-readable, Latin-specific XPOS tags designed to surface useful linguistic information not easily derived from other UD annotations.
exploratoryphilology.org
✨ LatinCy v3.9 sm/md/lg/trf pipelines for spaCy available ✨
- Improved tokenization and u/v norm
- New custom Latin-specific XPOS tags
- Better, more consistent lemma/morph coverage
huggingface.co/latincy/la_c...
#digiclass #nlproc
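For readers unfamiliar with "u/v norm": Latin editions differ on whether they print consonantal u as v, so text is often normalized to one spelling before processing. The sketch below shows one common convention, mapping every v to u in both cases. The function name `normalize_uv` is hypothetical, and the mapping is an assumption chosen for illustration; it is not claimed to be the exact rule the LatinCy pipelines apply.

```python
# Translation table for one possible convention: v -> u, V -> U.
_UV_TABLE = str.maketrans({"v": "u", "V": "U"})


def normalize_uv(text):
    """Collapse the u/v spelling distinction by rewriting v as u."""
    return text.translate(_UV_TABLE)


if __name__ == "__main__":
    print(normalize_uv("Arma virumque cano"))
```

Normalizing this way means that "virum" and "uirum" are treated as the same form downstream, at the cost of discarding the editor's original spelling.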
Question for the #digitalclassics or #digital #neolatin crowd (and @diyclassics in particular, I guess): What would be the best semantic embedding models for (Neo-)Latin currently available (not word embeddings but embeddings of text chunks)?
Recommended […]
[Original post on openbiblio.social]