This is now published in Genome Research (doi.org/10.1101/gr.2...). Thank you to everyone for your feedback, and to the anonymous reviewers who helped greatly improve the paper. I hope this becomes a useful resource for the community.

Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite their widespread use in genomics, there is no comprehensive review of the types of hash functions used and their various applications. In this survey, intended for bioinformatics methods developers, we divide hash functions into four categories: scattering hash functions, permutations, minimal perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-use hash functions that have been applied in nucleotide sequence analysis and hash functions that have been designed specifically for nucleotide sequence analysis. We highlight their salient properties, commonalities, differences, and application areas.

📢 We are thrilled to announce the keynote speakers for #RECOMB2026: 🧬 Sara Mostafavi 🧬 Manolis Kellis 🧬 Paul Medvedev 🧬 Alexandros Stamatakis Join us in Thessaloniki for inspiring talks from leaders in computational biology.

I've seen lots of AI rewrites in bioinformatics lately, and I’m concerned because LLMs can be confidently wrong. What do the best tool builders do to make sure their rewrites are correct? How can we tell if a rewrite is flawed? I interviewed 5 scientists to find out: youtu.be/0o2XnEBDxrI