Postdoc @ University of Neuchâtel, Laboratory of Evolutionary Genetics 🦠 Working on methods to study pathogen evolution and epidemiology using pangenomics 🧬
Sam Horsfield









Fast Set Operations for Compact k-mer Sets https://www.biorxiv.org/content/10.64898/2026.05.24.727514v1
We use a BART architecture, which is robust to extreme rearrangements such as contig reordering, to learn how the presence/absence and positions of genes impact the presence/absence and positions of other genes in a pangenome. So naturally, we came up with the original name "PanBART". (2/8)
I did some public outreach last year. It was super rewarding, and made me realise the importance of grassroots science engagement. Not only does it inspire the next generation of scientists, it also fosters public trust in science, something that is very important right now.
14d
Our new preprint is out! We train a transformer on gene order and gene content of bacterial pathogens, applying it to a range of epidemiological and evolutionary analyses (1/8) www.biorxiv.org/content/10.6...
1mo
We show that PanBART, like other deep-learning methods, is sensitive to "out-of-distribution" data. But this isn't necessarily a bad thing! We can leverage this sensitivity by using a measure of model confidence, "pseudolikelihoods", to identify newly emerging lineages! (5/8)
Finally, we explore gene-gene epistasis, identifying a theorised, but previously unobserved, association between an iron-regulated bacteriocin and siderophore in E. coli. This same association is not identified by Spydrpick. (7/8)
Overall, we lay the groundwork for using transformer models in a whole host of epi analyses. PanBART is available on GitHub, and we provide scripts and workflows to help you to train it on your species of interest! (8/8) github.com/samhorsfield...
We also show that PanBART can be used to assign query genomes to existing lineages with high accuracy. Although it's not quite as good as Sketchlib overall, PanBART performs much better in cases where the Sketchlib embedding space is highly unstructured. (4/8)
1mo
PanBART can also be used to predict whether a genome will "take up" a gene of interest. We are able to accurately identify E. coli lineages which are likely to gain an extended-spectrum antibiotic resistance gene, meaning we can predict which strains might become drug resistant! (6/8)
1mo
We show that PanBART can accurately represent a phylogeny, clustering genomes of the same lineage with high agreement with PopPUNK, and outperforming accessory-only Sketchlib, which represents population structure using gene presence/absence only and ignores gene order. (3/8)
1mo