Professor @hi.is in CS, head of rna seq data analysis at decode genetics (views are mine). Bioinformatician, epistemic trespasser, &c. ps. I hate GTF files
Páll Melsted
like dealing with sparse arrays, variable sized lists and things we would solve with dynamic memory allocation on the CPU.
3mo
For a large dataset (295M reads) the GPU version took 50 seconds. Just running zcat on the files takes 10 minutes! Decompressing and parsing FASTQ is a major bottleneck. Instead of using kseq, we moved this parsing work to the GPU, which delivers amazing throughput.
The key insight (aside from the speed) is that we need to reconsider the entire algorithmic framework if we want to use GPUs for large-scale processing of sequencing datasets. It's nontrivial, but I hope this paper gives some insight into how it's possible.
ok, that figure looked fine in the preview but was transparent, here's a better version
The only downside to this work is that now I feel frustrated having to wait minutes for classical kallisto to finish
Contrary to regular scientific programming, where the state consists of a high-dimensional vector and most operations are matrix multiplications, string processing does not have obvious programming paradigms that port easily to GPUs. And yet in this work we always wind up using the same tool over ...
Figure 1 shows the main results: we can run 30M paired-end reads in under 10 seconds using an NVIDIA RTX 5090 GPU. The average speedup is 30x for smaller reads, and that includes startup time; on average we can process about 3.6M paired-end reads per second.
and over again. Prefix scan. If you are interested in learning more about this, read this paper www.cs.cmu.edu/~guyb/papers... by Guy Blelloch from 1990. It is one of the most clearly written papers I've read, and it gives you the algorithmic building blocks to solve problems on GPUs.
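To make the prefix scan idea concrete, here is a minimal sketch in Python. It is not code from the preprint or from the kallisto repository; the names `exclusive_scan` and `pack_variable_lists` are hypothetical and chosen only for illustration. The loops run sequentially here, but they follow the up-sweep/down-sweep structure of a work-efficient exclusive scan, where every iteration of an inner loop is independent and could be one parallel step on a GPU. The second function shows one common use: turning per-item output counts into write offsets, so variable-sized lists can be packed into one flat buffer without dynamic allocation.

```python
def exclusive_scan(values):
    """Exclusive prefix sum using an up-sweep and a down-sweep.

    Sequential simulation: on a GPU, each pass of an inner `for` loop
    would be executed by independent threads in a single parallel step.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)  # pad to a power of two

    # Up-sweep (reduce): build partial sums in place.
    step = 1
    while step < size:
        for i in range(2 * step - 1, size, 2 * step):
            tree[i] += tree[i - step]
        step *= 2

    # Down-sweep: clear the root, then push prefixes back down.
    tree[size - 1] = 0
    step = size // 2
    while step >= 1:
        for i in range(2 * step - 1, size, 2 * step):
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] += left
        step //= 2
    return tree[:n]


def pack_variable_lists(lists):
    """Pack variable-sized lists into one flat buffer (hypothetical helper).

    The scan of the list lengths gives each list its starting offset,
    so every list can be written independently of the others.
    """
    counts = [len(items) for items in lists]
    offsets = exclusive_scan(counts)
    total = offsets[-1] + counts[-1] if lists else 0
    flat = [None] * total
    for i, items in enumerate(lists):  # one GPU thread (or block) per list
        for j, item in enumerate(items):
            flat[offsets[i] + j] = item
    return offsets, flat


offsets, flat = pack_variable_lists([["a", "b"], [], ["c"], ["d", "e", "f"]])
print(offsets)  # [0, 2, 2, 3]
print(flat)     # ['a', 'b', 'c', 'd', 'e', 'f']
```

The same count, scan, then write pattern covers stream compaction and sparse outputs in general: the count pass and the write pass are trivially parallel, and the scan is the only step that needs coordination between items.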
Páll Melsted
Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data. The title says it all: "RNA-seq analysis in seconds using GPUs", now on biorxiv www.biorxiv.org/content/10.6... and github github.com/pachterlab/k... Figure 1 shows the key result.
𝗣𝗼𝘀𝘁𝗱𝗼𝗰 𝗮𝗻𝗱 𝗣𝗵𝗗 𝗽𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝘀 𝗶𝗻 𝗖𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗚𝗲𝗻𝗼𝗺𝗶𝗰𝘀 / 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝗶𝗰 𝗕𝗶𝗼𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗰𝘀 I am currently recruiting for both: 🔹 Postdoc position su.varbi.com/what:job/job... 🔹 PhD position su.varbi.com/en/what:job/... Please share with anyone who might be interested!
3mo