PlumX Metrics

Exploiting mid-range DNA patterns for sequence classification: Binary abstraction Markov models

Nucleic Acids Research, ISSN: 0305-1048, Vol: 40, Issue: 11, Page: 4765-4773
2012
  • Citations: 1
  • Usage: 0
  • Captures: 19
  • Mentions: 0
  • Social Media: 0

Article Description

Messenger RNA sequences possess specific nucleotide patterns that distinguish them from non-coding genomic sequences. In this study, we explore the use of modified Markov models to analyze sequences up to 44 bp long, far beyond the 8-bp limit of conventional Markov models, for exon-intron discrimination. To analyze nucleotide sequences of this length, their information content is first reduced by converting them into shorter binary patterns through the application of numerous abstraction schemes. After the genomic sequences have been converted to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon-intron discrimination are selected using optimization algorithms on supercomputers. The best Markov model (MM) classifiers are then combined into a single classifier using support vector machines. With this approach, over 95% classification accuracy is achieved without taking the reading frame into account. With further development, the BAMM approach can be applied to sequences to which the genetic code does not apply, such as ncRNAs and 5′-untranslated regions. © 2012 The Author(s).
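The abstract describes the BAMM pipeline only in outline, so the following is a minimal sketch of the general idea, not the authors' method. Everything concrete here is an assumption made for illustration: the GC-content block abstraction (`abstract_to_binary`), the class `BinaryMarkovModel`, the Laplace pseudocount, the log-likelihood-ratio rule in `exon_score`, and the toy sequences are all hypothetical. The paper's actual abstraction schemes, its optimization search, and its SVM combination step are not reproduced.

```python
import math
from collections import defaultdict

# HYPOTHETICAL abstraction scheme for illustration only; the paper selects
# among many schemes that this record does not describe.
STRONG_BASES = frozenset("GC")


def abstract_to_binary(seq, block=4, threshold=2):
    """Map each non-overlapping `block`-bp window to one bit:
    1 if it holds at least `threshold` G/C bases, else 0.
    A trailing partial window is dropped."""
    seq = seq.upper()
    bits = []
    for start in range(0, len(seq) - block + 1, block):
        gc = sum(base in STRONG_BASES for base in seq[start:start + block])
        bits.append("1" if gc >= threshold else "0")
    return "".join(bits)


class BinaryMarkovModel:
    """Homogeneous order-k Markov model over the binary alphabet {0, 1}."""

    def __init__(self, order, pseudocount=1.0):
        self.order = order
        self.pseudocount = pseudocount
        # counts[context][symbol]: how often `symbol` followed `context`
        self.counts = defaultdict(lambda: {"0": 0, "1": 0})

    def fit(self, binary_seqs):
        for s in binary_seqs:
            for i in range(self.order, len(s)):
                self.counts[s[i - self.order:i]][s[i]] += 1
        return self

    def log_likelihood(self, s):
        """Sum of log P(s[i] | previous `order` bits), Laplace-smoothed.
        The first `order` bits are treated as given (not scored)."""
        a = self.pseudocount
        total = 0.0
        for i in range(self.order, len(s)):
            # .get avoids inserting unseen contexts into the defaultdict
            c = self.counts.get(s[i - self.order:i], {"0": 0, "1": 0})
            total += math.log((c[s[i]] + a) / (c["0"] + c["1"] + 2 * a))
        return total


def exon_score(seq, exon_model, intron_model, block=4, threshold=2):
    """Log-likelihood ratio on the abstracted sequence; > 0 favours 'exon'."""
    b = abstract_to_binary(seq, block, threshold)
    return exon_model.log_likelihood(b) - intron_model.log_likelihood(b)


# Toy training data (invented): GC-rich "exons", AT-rich "introns".
exons = ["GCGCATGCGGCCGCTA" * 3, "CCGGATGCGCGGTTGC" * 3]
introns = ["ATATGAATTTAAATAT" * 3, "TTAAGTATAATTACAT" * 3]

exon_model = BinaryMarkovModel(order=3).fit(abstract_to_binary(s) for s in exons)
intron_model = BinaryMarkovModel(order=3).fit(abstract_to_binary(s) for s in introns)

gc_query = "GGCCGCGCATGCCCGG" * 2
at_query = "ATTAAATTTATAAATA" * 2
print(exon_score(gc_query, exon_model, intron_model))  # positive: exon-like
print(exon_score(at_query, exon_model, intron_model))  # negative: intron-like
```

The point of the abstraction is to trade detail for reach. Under this hypothetical 4-bp block scheme, an order-10 binary model conditions on 10 bits and predicts an 11th, so each prediction spans 11 × 4 = 44 bp while needing only 2^10 = 1024 contexts, whereas a nucleotide Markov model with a 43-base context would need 4^43. In a pipeline like the one described above, each candidate abstraction scheme would yield its own pair of models and score, and the strongest scores would then serve as features for a support vector machine. That combination step is not shown here.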
