Professor @hi.is in CS, head of RNA-seq data analysis at deCODE genetics (views are mine). Bioinformatician, epistemic trespasser, &c.
ps. I hate GTF files
Páll Melsted
like dealing with sparse arrays, variable sized lists and things we would solve with dynamic memory allocation on the CPU.
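One common way to handle variable-sized lists without dynamic allocation is a count, scan, write pattern. The sketch below is only an illustration of that idea on the CPU, assuming NumPy; `pack_variable_lists` is a hypothetical helper name, `np.cumsum` stands in for a parallel scan, and none of this is taken from the preprint's code.

```python
import numpy as np

def pack_variable_lists(lists):
    """Flatten variable-sized lists into one buffer plus offsets (CSR-style).

    Hypothetical helper for illustration; not from the kallisto code base.
    """
    # Pass 1: each item reports how many outputs it will produce.
    counts = np.array([len(x) for x in lists], dtype=np.int64)
    # A prefix sum over the counts gives every list its start offset.
    # offsets[i] is where list i starts, offsets[-1] is the total size.
    offsets = np.zeros(len(lists) + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    # Pass 2: a single allocation of the total size, then each list
    # writes into its own slice (on a GPU, one thread or block per list).
    flat = np.empty(offsets[-1], dtype=np.int64)
    for i, x in enumerate(lists):
        flat[offsets[i]:offsets[i + 1]] = x
    return flat, offsets

flat, offsets = pack_variable_lists([[3, 1], [], [4, 1, 5]])
print(flat.tolist(), offsets.tolist())
```

Because every writer knows its slice up front, no locks or per-item allocations are needed, which is what makes the pattern a reasonable fit for GPUs.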
For a large dataset (295M reads) the GPU version took 50 seconds. Just running zcat on the files takes 10 minutes! Decompressing and parsing FASTQ is a major bottleneck. Instead of using kseq, we moved the parsing work to the GPU, which delivers amazing throughput.
The key insight (aside from the speed) is that we need to reconsider the entire algorithmic framework if we want to use GPUs for large-scale processing of sequencing datasets. It's nontrivial, but I hope this paper gives some insight into how it's possible.
OK, that figure looked fine in the preview but was transparent; here's a better version.
The only downside to this work is that now I feel frustrated having to wait minutes for classical kallisto to finish
Contrary to regular scientific programming, where the state consists of a high-dimensional vector and most operations are matrix multiplications, string processing does not have obvious programming paradigms that port easily to GPUs. And yet in this work we always wind up using the same tool over ...
Figure 1 shows the main results: we can run 30M paired-end reads in under 10 seconds using an NVIDIA RTX 5090 GPU. The average speedup is 30x for smaller reads, and that includes startup time; on average we can process about 3.6M paired-end reads per second.
and over again. Prefix scan. If you want to learn more about it, read this paper www.cs.cmu.edu/~guyb/papers... by Guy Blelloch from 1990. It is one of the most clearly written papers I've read, and it gives you the algorithmic building blocks to solve problems on GPUs.
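For readers who have not met prefix scan before, here is a minimal sequential sketch of the work-efficient exclusive scan (up-sweep, then down-sweep) usually associated with Blelloch's paper. It is plain Python for readability, not GPU code and not the implementation from the preprint; the function name `blelloch_exclusive_scan` is made up for this example. Each inner loop touches independent positions, so on a GPU every iteration of it could be one thread.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep (illustrative only)."""
    n = len(values)
    if n == 0:
        return []
    # Pad to a power of two so the implicit binary tree is complete.
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)

    # Up-sweep (reduce): combine pairs so the last slot holds the total.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push partial sums back down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]   # left child inherits the parent's prefix
            a[i] += left           # right child adds the left subtree's sum
        stride //= 2
    return a[:n]

print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# prints [0, 3, 4, 11, 11, 15, 16, 22]
```

The two sweeps take about 2n additions in total and log2(n) parallel steps each, which is why the same primitive keeps showing up when turning counts into offsets.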
Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.
The title says it all: "RNA-seq analysis in seconds using GPUs", now on bioRxiv www.biorxiv.org/content/10.6... and GitHub github.com/pachterlab/k...
Figure 1 shows the key result.
Postdoc and PhD positions in Computational Genomics / Algorithmic Bioinformatics
I am currently recruiting for both:
🔹 Postdoc position
su.varbi.com/what:job/job...
🔹 PhD position
su.varbi.com/en/what:job/...
Please share with anyone who might be interested!