Our trick: run OrthoFinder on a small subset of species first
Next, we sample representative sequences from each orthogroup to build profiles
Genes from new species are then matched to these profiles to assign them to orthogroups
We avoid the costly all-versus-all step that kills scalability
Most orthology inference tools rely on all-versus-all comparisons between species, which become painfully slow as datasets grow. Here, we introduce a new scalable method to circumvent this problem.
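
The workflow outlined above can be illustrated with a toy sketch. It assumes the orthogroups for the small species subset have already been computed (for example, by OrthoFinder) and are held in a plain dictionary. Everything else here is hypothetical and chosen only for illustration: the function names, the `n_reps` and `min_score` parameters, and above all the k-mer Jaccard score, which is a simple stand-in for whatever profile-matching step the method actually uses.

```python
import random


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_profiles(orthogroups, n_reps=2, seed=0):
    """Sample up to n_reps representative sequences per orthogroup.

    Each "profile" is just a list of k-mer sets (an illustrative assumption,
    not the representation used by the method described in the text).
    """
    rng = random.Random(seed)
    profiles = {}
    for og_id, seqs in orthogroups.items():
        reps = rng.sample(seqs, min(n_reps, len(seqs)))
        profiles[og_id] = [kmers(s) for s in reps]
    return profiles


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(gene_seq, profiles, min_score=0.2):
    """Return the best-matching orthogroup ID, or None if no score reaches min_score."""
    query = kmers(gene_seq)
    best_og, best_score = None, 0.0
    for og_id, reps in profiles.items():
        score = max(jaccard(query, rep) for rep in reps)
        if score > best_score:
            best_og, best_score = og_id, score
    return best_og if best_score >= min_score else None


# Made-up orthogroups standing in for the result of the initial run
# on a small subset of species.
core_orthogroups = {
    "OG1": ["MKTAYIAKQRQISFVK", "MKTAYIAKQRQLSFVK"],
    "OG2": ["GSHMLEDPVAGWWTR", "GSHMLEDPVAGWFTR"],
}
profiles = build_profiles(core_orthogroups)

# Genes from a "new" species are compared against the profiles only.
for gene in ["MKTAYIAKQRQISFIK", "GSHMLEDPIAGWWTR", "CCCCCCCCCC"]:
    print(gene, "->", assign(gene, profiles))
```

In this sketch each new gene is scored only against the sampled representatives, so the work per added species depends on the number of profiles rather than on the number of species already processed. That is the property the all-versus-all step lacks.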