Me coding using Claude and Codex @anthropic.com @openaibot.bsky.social
Benjamin J. Buchfink
Honored to announce that DIAMOND is one of 52 benchmarks in SPEC CPU®2026 🥳 https://arxiv.org/abs/2605.01575
The march toward developing relevant and robust CPU benchmarks continues with the introduction of SPEC CPU 2026, the next generation suite for measuring processor performance. This paper details the m...
GALBA2 walks into the arena. We rewrote our protein-based genome annotation pipeline in Snakemake.
Give it a genome + proteins from close relatives → get gene predictions. No RNA-Seq, no GeneMark needed.
miniprot → AUGUSTUS, fully containerised, HPC-ready.
github.com/Gaius-Augustus/GALBA2
Sandpiper 2 is up: 913,000 metagenomic community profiles with @ace-gtdb.bsky.social R232, 200k more than Sandpiper 1.0. sandpiper.qut.edu.au
GlobDB coming.
Thanks to @aroneys.bsky.social @thepatientwait.bsky.social @iambrettb.bsky.social and especially the new kid @nhstefan.bsky.social
This is awful to hear, describing how Sean Eddy (HMMER, Infernal, Pfam, Rfam) has been defunded. The letter said his work
"had been determined to be of absolutely no value to the US taxpayer, and therefore it was being specifically terminated,"
www.npr.org/2026/05/21/n...
Tiberius 2.0.0 is out
Now supports 7 eukaryotic clades, covering ~92% of NCBI assemblies. Modular rewrite + ~30% faster runtime.
Benchmarks included, more soon.
Thanks to Lars Gabriel, Richard Krieg & Felix Becker
github.com/Gaius-August...
#bioinformatics #genomics #genomeannotation
Even with federal grants largely restored, scientists say the Trump administration is still preventing those funds from reaching them. The consequences, they say, are already becoming clear.
I'm not looking forward to a future where all the tools are being vibe-rewritten into languages people don't want to learn. Who will maintain all this? The original maintainers won't. It's not the language they were comfortable with. Does the prompter understand the tool well enough?
1/ BRAKER4 hatched!
The Earth BioGenome Project is on track to sequence ~1.5M eukaryotic species. Every one needs a structural annotation. No Perl monolith was going to survive that. So we rewrote BRAKER from the ground up. github.com/Gaius-August...
Ben J Woodcroft
This is now published in Genome Research (doi.org/10.1101/gr.2...). Thank you everyone for your feedback and also the anonymous reviewers who helped to greatly improve the paper. I hope this becomes a useful resource for the community.
Zamin Iqbal
Katharina Hoff
The new OrthoFinder paper is out now!
In this new work, we introduce major advances in accuracy and scalability, allowing analysis of much larger datasets.
www.nature.com/articles/s41...
github.com/OrthoFinder/...
doi.org
Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite their widespread use in genomics, there is no comprehensive review of the types of hash functions used and their various applications. In this survey intended for bioinformatic methods developers, we divide hash functions into four categories: scattering hash functions, permutations, minimum perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-use hash functions that have been applied in nucleotide sequence analysis and hash functions that have been designed specifically for nucleotide sequence analysis. We highlight their salient properties, commonalities, differences, and application areas.
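To make two of the survey's four categories concrete, here is a minimal sketch under stated assumptions. It is not taken from the survey. It 2-bit-encodes DNA k-mers (k ≤ 32, ACGT only, no reverse-complement canonicalisation), scrambles them with an invertible 64-bit mixer (the MurmurHash3 `fmix64` finalizer, a general-use permutation), and then builds a MinHash sketch on top of it, a classic locality-sensitive scheme for estimating Jaccard similarity between k-mer sets. All function names here are hypothetical.

```python
MASK64 = (1 << 64) - 1
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumption: uppercase ACGT only
C1, C2 = 0xFF51AFD7ED558CCD, 0xC4CEB9FE1A85EC53  # MurmurHash3 fmix64 constants


def encode_kmer(kmer: str) -> int:
    """Pack a DNA k-mer into an integer, 2 bits per base (k <= 32 fits in 64 bits)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def permute64(x: int) -> int:
    """Permutation of [0, 2^64): every step is a bijection.
    x ^= x >> 33 is invertible, and multiplying by an odd constant mod 2^64 is invertible."""
    x ^= x >> 33
    x = (x * C1) & MASK64
    x ^= x >> 33
    x = (x * C2) & MASK64
    x ^= x >> 33
    return x


def unpermute64(y: int) -> int:
    """Inverse of permute64. With a shift >= 32 on 64-bit words, the xor-shift is its own inverse."""
    y ^= y >> 33
    y = (y * pow(C2, -1, 1 << 64)) & MASK64  # modular inverse of the odd constant
    y ^= y >> 33
    y = (y * pow(C1, -1, 1 << 64)) & MASK64
    y ^= y >> 33
    return y


def kmers(seq: str, k: int) -> set[int]:
    """Set of encoded k-mers in seq."""
    return {encode_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def minhash_signature(kmer_set: set[int], num_hashes: int = 64) -> list[int]:
    """MinHash: for each seeded hash, keep the minimum hash value over the set.
    Seeding by xor keeps each h_i a bijection, so distinct k-mers never collide."""
    seeds = [permute64(i) for i in range(1, num_hashes + 1)]
    return [min(permute64(x ^ s) for x in kmer_set) for s in seeds]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching minima is an unbiased estimate of |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    a = kmers("ACGTACGTTGCAACGT", 5)
    b = kmers("ACGTACGTTGCAACGA", 5)  # one substitution at the end
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The permutation step matters because it is invertible. A k-mer can be recovered from its hash value, and two distinct k-mers can never share a value. This makes the MinHash estimate exactly 0 for disjoint sets and exactly 1 for identical ones. A non-invertible scattering hash would also work for MinHash but would lose that guarantee.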