Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce.
- Citation data:
BMC structural biology, ISSN: 1472-6807, Vol: 13 Suppl 1, Issue: SUPPL.1, Page: S3
- Biochemistry, Genetics and Molecular Biology; Prediction Program; MapReduce Framework; Flock House Virus; Chunk Method; Stripe Jack Nervous Necrosis Virus; Mathematics
Ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation. Their secondary structures are crucial to RNA function, and the prediction of these structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting the secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure from the RNA sequence as a whole. The chunking, prediction, and reconstruction steps can each use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs. We apply them to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as to a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment.
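The chunk, predict, and reconstruct workflow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. All function names are hypothetical, the fixed-length non-overlapping split is an assumed stand-in for the three chunking methods studied, `toy_predict` is a placeholder for a real thermodynamic prediction program, and a plain sequential loop stands in for the Hadoop map and reduce phases.

```python
from typing import Callable, List, Tuple

# Canonical and wobble base pairs used by the toy predictor (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def chunk_sequence(seq: str, chunk_len: int) -> List[Tuple[int, str]]:
    """Split seq into consecutive, non-overlapping chunks of at most chunk_len.

    Returns (start_offset, chunk) pairs so chunks can be reassembled later.
    This fixed-length split is an assumption, not one of the paper's methods.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [(i, seq[i:i + chunk_len]) for i in range(0, len(seq), chunk_len)]


def toy_predict(chunk: str) -> str:
    """Placeholder predictor returning a dot-bracket string for one chunk.

    Pairs bases from both ends inward while they are complementary and at
    least three unpaired bases would remain in the loop. A real pipeline
    would call an actual secondary structure prediction program here.
    """
    structure = ["."] * len(chunk)
    i, j = 0, len(chunk) - 1
    while j - i > 3 and (chunk[i], chunk[j]) in PAIRS:
        structure[i], structure[j] = "(", ")"
        i += 1
        j -= 1
    return "".join(structure)


def map_chunk(item: Tuple[int, str],
              predictor: Callable[[str], str]) -> Tuple[int, str]:
    """Map step: predict one chunk independently, keeping its offset as key."""
    offset, chunk = item
    return offset, predictor(chunk)


def reduce_structures(results: List[Tuple[int, str]], total_len: int) -> str:
    """Reduce step: place each chunk structure at its offset in the full string."""
    full = ["."] * total_len
    for offset, structure in results:
        full[offset:offset + len(structure)] = structure
    return "".join(full)


def predict_by_chunks(seq: str, chunk_len: int,
                      predictor: Callable[[str], str] = toy_predict) -> str:
    """Chunk the sequence, predict each chunk, and rebuild the whole structure."""
    mapped = [map_chunk(item, predictor) for item in chunk_sequence(seq, chunk_len)]
    return reduce_structures(mapped, len(seq))


if __name__ == "__main__":
    print(predict_by_chunks("GGGAAAACCCGGGAAAACCC", 10))  # (((....)))(((....)))
```

Because every chunk is predicted independently and carries its own offset, the map step can run on separate workers in any order; the reduce step only needs the offsets to reassemble the full-length structure.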