Cliffy: robust 16S rRNA classification based on a compressed LCA index

bioRxiv, ISSN: 2692-8205
2024
  • Citations: 0
  • Usage: 0
  • Captures: 0
  • Mentions: 1
  • Social Media: 0

Metrics Details

  • Mentions: 1
    • News Mentions: 1
      • News: 1

Most Recent News

Cliffy: robust 16S rRNA classification based on a compressed LCA index

2024 JUN 13 (NewsRx) -- By a News Reporter-Staff News Editor at NewsRx Life Science Daily -- According to news reporting based on a preprint

Article Description

Taxonomic sequence classification is a computational problem central to the study of metagenomics and evolution. Advances in compressed indexing with the r-index enable full-text pattern matching against large sequence collections. However, the data structures that link pattern sequences to their clades of origin still do not scale well to large collections. Previous work proposed document array profiles, which use O(rd) words of space, where r is the number of maximal equal-letter runs in the Burrows-Wheeler transform and d is the number of distinct genomes. The linear dependence on d is limiting, since real taxonomies can easily contain tens of thousands of leaves or more. We propose a method called cliff compression that reduces this size by a large factor: over 250x when indexing the SILVA 16S rRNA gene database. This method uses Θ(r log d) words of space in expectation under a random model we propose here. We implemented these ideas in an open-source tool called Cliffy that performs efficient taxonomic classification of sequencing reads with respect to a compressed taxonomic index. When applied to simulated 16S rRNA reads, Cliffy’s read-level accuracy is higher than Kraken2’s by 11-18%. Cliffy also predicts clade abundances more accurately than Kraken2 and Bracken. Overall, Cliffy is a fast and space-economical extension to compressed full-text indexes, enabling them to perform fast and accurate taxonomic classification queries.
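
To make the quantities in the description concrete, the following is a minimal sketch, not taken from Cliffy itself, of how r can be counted on a toy string and how the two space bounds compare. It builds the Burrows-Wheeler transform naively from sorted rotations, counts maximal equal-letter runs, and evaluates r·d against r·log2(d). The function names, the "$" sentinel, and the value of d are illustrative assumptions, and the constants hidden by the asymptotic notation are ignored. It does not implement cliff compression.

```python
import math


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    s = text + sentinel  # assumes the sentinel sorts before every other symbol
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Number of maximal equal-letter runs in s (the quantity called r)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def space_estimates(r, d):
    """Return (r * d, r * log2(d)), ignoring constant factors.

    The first value mirrors the O(rd) bound for document array profiles,
    the second the Theta(r log d) expected bound stated for cliff compression.
    """
    return r * d, r * math.log2(d)


if __name__ == "__main__":
    transformed = bwt("banana")
    r = count_runs(transformed)
    d = 1024  # hypothetical number of distinct genomes, chosen for illustration
    linear, logarithmic = space_estimates(r, d)
    print(transformed, r, linear, logarithmic)
```

For any fixed r, the gap between the two estimates widens as d grows, which is why the linear dependence on d becomes the limiting factor for taxonomies with many leaves.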

Bibliographic Details

Ahmed, Omar; Boucher, Christina; Langmead, Ben

Cold Spring Harbor Laboratory

Biochemistry, Genetics and Molecular Biology; Agricultural and Biological Sciences; Immunology and Microbiology; Neuroscience; Pharmacology, Toxicology and Pharmaceutics
