Reindeer: Efficient indexing of k-mer presence and abundance in sequencing datasets

Citation Data: Bioinformatics, ISSN: 1460-2059, Vol: 36, Issue: Suppl_1, Page: i177-i185

Publication Year: 2020

Citations: 34
Usage: 0
Captures: 42
Mentions: 0
Social Media: 0

Metrics Details

Citations: 34
- Citation Indexes: 34
Captures: 42
- Readers: 42

Article Description

Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

Results: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ~4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances.
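
To make the idea of recording k-mer abundances across several datasets concrete, here is a minimal Python sketch. It is not REINDEER's implementation: it uses a plain hash table instead of compacted de Bruijn graphs, ignores reverse complements, and groups k-mers by identical count vectors as a rough stand-in for monotigs. The function names (`kmers`, `build_index`, `group_by_abundance`, `query`) and the toy reads are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(datasets, k):
    """Map each k-mer to a tuple of its counts, one entry per dataset."""
    per_dataset = [
        Counter(km for read in reads for km in kmers(read, k))
        for reads in datasets
    ]
    all_kmers = set().union(*per_dataset)
    return {km: tuple(c[km] for c in per_dataset) for km in all_kmers}


def group_by_abundance(index):
    """Group k-mers sharing the same count vector (a crude monotig analogue)."""
    groups = defaultdict(list)
    for km, counts in index.items():
        groups[counts].append(km)
    return {counts: sorted(kms) for counts, kms in groups.items()}


def query(index, seq, k):
    """Return the count vector of each k-mer of seq, or None if it is absent."""
    return [index.get(km) for km in kmers(seq, k)]


# Toy input: two "datasets", each a list of reads (assumed data).
datasets = [["ACGTAC", "ACGT"], ["CGTACG"]]
index = build_index(datasets, k=4)
groups = group_by_abundance(index)
hits = query(index, "ACGTT", k=4)
```

In this toy run, `index["ACGT"]` is `(2, 0)` because the k-mer occurs twice in the first dataset and never in the second, while `CGTA` and `GTAC` share the vector `(1, 1)` and therefore land in the same group. Storing one count vector per group rather than per k-mer is the intuition behind the space savings the abstract attributes to monotigs.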

Bibliographic Details

DOI: 10.1093/bioinformatics/btaa487

PMID: 32657392

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85087880309&origin=inward; http://dx.doi.org/10.1093/bioinformatics/btaa487; http://www.ncbi.nlm.nih.gov/pubmed/32657392; https://academic.oup.com/bioinformatics/article/36/Supplement_1/i177/5870500; https://dx.doi.org/10.1093/bioinformatics/btaa487

AUTHOR(S)

Marchet, Camille; Iqbal, Zamin; Gautheret, Daniel; Salson, Mikaël; Chikhi, Rayan

PUBLISHER(S)

Oxford University Press (OUP)

TAG(S)

Mathematics; Biochemistry, Genetics and Molecular Biology; Computer Science
