Just a shout-out to the folks over at @pathoplexus.org and, importantly, all the scientists sharing genomic data with such amazing speed.
It's great to see the field having moved to a new platform, with non-transparent platforms being a thing of the past. Fantastic.
New blog post!
I use ntHash all the time to hash k-mers, yet it turns out it has some unexpected flaws (collision propagation, bias on leading zeros...). The good news: each of them can be fixed!
igor.martayan.org/posts/breaki...
NtHash is a popular method for hashing k-mers in bioinformatics, yet it has some surprising flaws. In this post, I walk through a few of them, and show that they can arise naturally, without an advers...
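For readers unfamiliar with the scheme, here is a minimal sketch of an ntHash-style rolling hash (rotate-and-XOR over per-base seeds). It only illustrates the general construction and is not taken from the linked post. The seed values are arbitrary placeholders, not the constants used by the real ntHash library, and the function names are hypothetical.

```python
MASK = (1 << 64) - 1  # keep all values within 64 bits

# Assumption: arbitrary 64-bit placeholder seeds, NOT the real ntHash table.
SEED = {
    "A": 0x0123456789ABCDEF,
    "C": 0xFEDCBA9876543210,
    "G": 0x0F1E2D3C4B5A6978,
    "T": 0x8796A5B4C3D2E1F0,
}


def rol(x, r):
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash one k-mer from scratch: XOR of seeds, each rotated by its distance from the end."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEED[base]
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, updating in O(1) per position."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        outgoing = SEED[seq[i - k]]  # base leaving the window
        incoming = SEED[seq[i]]      # base entering the window
        # Rotate the old hash, cancel the outgoing base (now rotated k times),
        # and add the incoming base unrotated.
        h = rol(h, 1) ^ rol(outgoing, k) ^ incoming
        yield h
```

Because the whole hash is an XOR of rotated seeds, the update is linear in the input bases, which is plausibly the kind of structure that lets patterns in the input carry over into the hash values.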
Phold's manuscript is now available @narjournal.bsky.social thanks to @susiegriggo.bsky.social @npbhavya.bsky.social @vijinim.bsky.social @linsalrob.bsky.social @martinsteinegger.bsky.social @milot.bsky.social @eunbelivable.bsky.social & others not on bsky #phagesky academic.oup.com/nar/article/...
I should have said, this takes us to about 2.8 million genomes in total. We don't have annotations, etc. for the latest data yet; this will be an ongoing process.
For those of us interested in software development, data structure design, etc. in science, this is a must-read. A taste of what is happening in communities letting AI agents go wild writing code, creating PRs, and writing documentation. Spoiler: humans get addicted, lose perspective, slop everywhere.
A long time ago in a galaxy far away, there was a SARS-CoV-2 pandemic. Our paper, led by @martibartfast.bsky.social
a) correcting errors in 4.5 million genomes & their phylogeny
b) improving representation of the Global South in public data
www.nature.com/articles/s41...
(thread 1/n)
GTDB release 11 based on RefSeq 232 (R11-RS232) is live at gtdb.ecogenomic.org. This release covers 901,341 genomes (23% increase) and has 199,923 species clusters (39% increase). Release notes at: forum.gtdb.ecogenomic.org/t/announcing.... Release statistics at: gtdb.ecogenomic.org/stats/r232.
This Resource paper presents a global SARS-CoV-2 phylogenetic tree of 4,471,579 high-quality genomes consistently constructed by Viridian, an efficient amplicon-aware assembler.
Modern DRAM is based on a brilliant design from IBM.
But, we're still paying for a latency penalty that's existed since the 60s!
In this video, I'm introducing my research project (Tailslayer) that immensely reduces p99.99 latency on traditional RAM!
George Bouras
Zamin Iqbal
LaurieWired
Can't wait to release a 10th-birthday version of SeqKit!
- 10 years
- 2 papers, 3500 citations
- 20 contributors
- 40 subcommands
- 880 commits
- 500 issues
- 685.5K Bioconda total downloads
Thank you all, dear contributors and users!
I'll keep maintaining it.
github.com/shenwei356/s...
LexicMap v0.9.0 has been released with
- a few bug fixes: CIGAR, bitscore and evalue calculation.
- new features: better support for big genomes like human.
- a new command to convert output to SAM format
github.com/shenwei356/L...
v0.9.0 - 2026-03-13
New commands:
lexicmap utils 2sam: Convert the default search output to SAM format (#26).
Attention: This command requires search results generated by the current LexicMap ver...
Changelog
SeqKit is 10 years old!
SeqKit v2.13.0 - 2026-02-28
seqkit: add support for reading and writing LZ4 compression format.
new command: seqkit sample2: improved seqkit sample by @stahiga....
Courtesy of @martibartfast.bsky.social, we have a new release of AllTheBacteria which adds another 322,920 assemblies, covering all ENA (Illumina, isolate) prokaryotes to May 2025.
allthebacteria.readthedocs.io/en/latest/ov...
Weekend thoughts on Gas Town, Beads, slop AI browsers, and AI-generated PRs flooding overwhelmed maintainers. I don't think we're ready for the new powers we're wielding. lucumr.pocoo.org/2026/1/18/ag...
What’s going on with the AI builder community right now?
Abstract. Bacteriophage (phage) genome annotation is essential for understanding their functional potential and suitability for use as therapeutic agents.
Stoked to finally have a preprint out for Phold, our tool that uses protein structural information to enhance phage genome annotation #phagesky 1/n
www.biorxiv.org/content/10.1...
George Bouras
Bacteriophage (phage) genome annotation is essential for understanding their functional potential and suitability for use as therapeutic agents. Here we introduce Phold, an annotation framework utilis...