It is now the first analysis we run on new data. We use it for QC (verifying species IDs and testing for contamination). We then drop sPCR products into a phylogeny to see where our new samples fit in with previous data. Less than an hour after data delivery, we already know a lot about our samples.
Because it uses PCR primers, the regions it assembles correspond to the best-sampled gene regions in public archives. This provides a bridge between raw genomic data and decades of work sequencing specific PCR-amplified genes from a broad diversity of organisms.
Depending on the gene, genome, and primers, reliable assembly can take as few as 1M reads. High-copy genes, like mitochondrial genes and rRNA, are often robust with a small number of reads. Single-copy nuclear genes take more data and work best with specific (non-degenerate) primers.
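To make "degenerate" concrete, here is a minimal Python sketch that counts how many concrete sequences a primer with IUPAC ambiguity codes stands for. It is only an illustration of the concept, not how sharkmer handles primers internally, and the two example primers are made up for the demo.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def degeneracy(primer):
    """Number of concrete sequences the primer can represent."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical primers, invented for this example.
specific_primer = "ACGTACGTAC"
degenerate_primer = "ACRTAYGTNC"

print(degeneracy(specific_primer))    # a specific primer is a single sequence
print(degeneracy(degenerate_primer))  # R (2) x Y (2) x N (4) possibilities
```

Each ambiguity code multiplies the number of sequences that count as a primer match, which is one plausible reason low-coverage single-copy targets do better with specific primers.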
You can work with local raw data, or give it an SRA or ENA accession number and it downloads (and caches) the data directly. And it only downloads the number of reads you ask it to analyze, so no long waits for giant files that fill your disk.
You can specify primer sequences at the command line, or in a YAML primer panel file. It comes pre-loaded with panels for a few clades. Users can develop and optimize their own primer panels, and submit them for inclusion in sharkmer. Let us know what clades and genes you would like to have primer panels for.
Available at github.com/caseywdunn/s... . Can also be installed from bioconda.
Here, for example, is a one-liner that downloads 1M reads from SRA accession SRR23143286 (the siphonophore Nanomia bijuga), runs sPCR with the cnidarian primer panel, and dumps the amplified products to the terminal. It took 41 seconds to run on my laptop, including data download.
Ever wanted to assemble specific genes out of raw sequence reads? Try sharkmer, a tool for in silico PCR (sPCR) developed with @shchurch.bsky.social - academic.oup.com/bioinformati.... Feed it raw reads and primer sequences, it gives you amplicon sequences. Can work on a laptop in minutes.
Abstract summary: We introduce an in silico PCR (sPCR) method for the assembly of specific genomic regions spanned by PCR primers using raw sequence reads.
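For anyone new to the idea of "regions spanned by PCR primers", here is a toy Python sketch of in silico PCR on a single, already-assembled template. This is not sharkmer's method (sharkmer assembles the product from raw reads); it assumes exact primer matches, and the template and primers are invented for the example.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def toy_pcr(template, fwd, rev, max_len=2000):
    """Return the region from the forward primer site through the
    reverse primer site, or None if either site is missing or the
    product is longer than max_len. Exact matching only."""
    template = template.upper()
    fwd = fwd.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the
    # template strand we look for its reverse complement downstream.
    rev_site = revcomp(rev.upper())
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    amplicon = template[start:end + len(rev_site)]
    return amplicon if len(amplicon) <= max_len else None

# Hypothetical template: flank + forward site + insert + reverse site + flank.
template = "TTTT" + "ACGTAC" + "GGGGCCCC" + "TTGACA" + "AAAA"
fwd_primer = "ACGTAC"
rev_primer = revcomp("TTGACA")  # primers are written 5'->3' on their own strand

print(toy_pcr(template, fwd_primer, rev_primer))
```

The returned amplicon includes both primer sites, as a real PCR product would.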