Me coding using Claude and Codex @anthropic.com @openaibot.bsky.social
Honored to announce that DIAMOND is one of 52 benchmarks in SPEC CPU®2026 🥳 https://arxiv.org/abs/2605.01575
Sandpiper 2 is up: 913,000 metagenomic community profiles with @ace-gtdb.bsky.social R232, 200k more than 1.0. sandpiper.qut.edu.au
GlobDB coming.
Thanks to @aroneys.bsky.social @thepatientwait.bsky.social @iambrettb.bsky.social and especially the new kid @nhstefan.bsky.social
1/ BRAKER4 hatched!
The Earth BioGenome Project is on track to sequence ~1.5M eukaryotic species. Every one needs a structural annotation. No Perl monolith was going to survive that. So we rewrote BRAKER from the ground up. github.com/Gaius-August...
This is now published in Genome Research (doi.org/10.1101/gr.2...). Thank you everyone for your feedback and also the anonymous reviewers who helped to greatly improve the paper. I hope this becomes a useful resource for the community.
The new OrthoFinder paper is out now!
In this new work, we introduce major advances in accuracy and scalability, allowing analysis of much larger datasets.
www.nature.com/articles/s41...
github.com/OrthoFinder/...
The updated OrthoFinder v3 software boosts accuracy and scalability in phylogenetic orthology inference with massive and diverse datasets.
Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite their widespread use in genomics, there is no comprehensive review of the types of hash functions used and their various applications. In this survey intended for bioinformatic methods developers, we divide hash functions into four categories: scattering hash functions, permutations, minimal perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-use hash functions that have been applied in nucleotide sequence analysis and hash functions that have been designed specifically for nucleotide sequence analysis. We highlight their salient properties, commonalities, differences, and application areas.
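To make two of those categories concrete, here is a minimal sketch, not taken from the survey. It assumes k-mers are packed 2 bits per base (A=0, C=1, G=2, T=3), ignores reverse complements, and uses a Thomas Wang-style integer mix (the kind popularised by minimap2) as the permutation. The MinHash part shows a locality-sensitive use; its per-function seeding scheme (XOR with a multiple of a constant) is a hypothetical choice made here for illustration.

```python
from typing import Iterator, List

# 2-bit nucleotide encoding (assumption: only A/C/G/T are valid; anything else breaks a k-mer)
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers(seq: str, k: int) -> Iterator[int]:
    """Yield every k-mer of seq as a 2k-bit integer, restarting after non-ACGT characters."""
    mask = (1 << (2 * k)) - 1
    x, length = 0, 0
    for ch in seq.upper():
        code = ENC.get(ch)
        if code is None:
            x, length = 0, 0
            continue
        x = ((x << 2) | code) & mask
        length += 1
        if length >= k:
            yield x


def invertible_hash(key: int, bits: int) -> int:
    """A permutation of [0, 2**bits): every step is an odd multiply or a xorshift mod 2**bits."""
    mask = (1 << bits) - 1
    key = (~key + (key << 21)) & mask        # key * (2**21 - 1) - 1
    key ^= key >> 24
    key = (key + (key << 3) + (key << 8)) & mask  # key * 265
    key ^= key >> 14
    key = (key + (key << 2) + (key << 4)) & mask  # key * 21
    key ^= key >> 28
    key = (key + (key << 31)) & mask          # key * (2**31 + 1)
    return key


def minhash_sketch(seq: str, k: int, num_hashes: int) -> List[int]:
    """MinHash sketch: the minimum hashed k-mer under each of num_hashes permutations."""
    bits = 2 * k
    mask = (1 << bits) - 1
    codes = set(kmers(seq, k))
    if not codes:
        raise ValueError("sequence has no valid k-mers")
    sketch = []
    for seed in range(1, num_hashes + 1):
        # Hypothetical seeding: XOR with a seed-dependent constant, then permute.
        salt = (seed * 0x9E3779B97F4A7C15) & mask
        sketch.append(min(invertible_hash(c ^ salt, bits) for c in codes))
    return sketch


def estimate_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of sketch positions that agree estimates Jaccard similarity of the k-mer sets."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    s1 = minhash_sketch("ACGTACGTTGCAACGT", k=5, num_hashes=64)
    s2 = minhash_sketch("ACGTACGTTGCAACGA", k=5, num_hashes=64)
    print(estimate_jaccard(s1, s2))
```

Because XOR with a fixed salt and `invertible_hash` are both bijections, each seed defines a distinct permutation of the k-mer universe. This is why the same mix can serve as a "permutation" hash (no collisions, so a k-mer can be recovered from its hash) and as the building block of a locality-sensitive sketch. A production tool would typically also canonicalise k-mers across strands.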
Tiberius 2.0.0 is out
Now supports 7 eukaryotic clades, covering ~92% of NCBI assemblies. Modular rewrite + ~30% faster runtime.
Benchmarks included, more soon.
Thanks to Lars Gabriel, Richard Krieg & Felix Becker
github.com/Gaius-August...
#bioinformatics #genomics #genomeannotation
GALBA2 walks into the arena. We rewrote our protein-based genome annotation pipeline in Snakemake.
Give it a genome + proteins from close relatives → get gene predictions. No RNA-Seq, no GeneMark needed.
miniprot → AUGUSTUS, fully containerised, HPC-ready.
github.com/Gaius-Augustus/GALBA2