Comparative studies of differential gene callingusing RNA-Seq data

Publication Year:
2013
Usage 156
Downloads 139
Abstract Views 17
Repository URL:
http://digitalcommons.unl.edu/plantscifacpub/83
Author(s):
Zheng, Ximeng; Moriyama, Etsuko N.
Tags:
Life Sciences; Plant Biology; Plant Breeding and Genetics; Plant Pathology; Plant Sciences
article description
Background: With its massive amount of data, gene-expression profiling by RNA-Seq has many advantanges compared with microarray experiments. RNA-Seq analysis, however, is fundamentally different from microarray data analysis. Techniques developed for analyzing microarray data thus cannot be directly applicable for the digital gene expression data. Several statistical methods have been developed for identifying differentially expressed genes specifically from RNA-Seq data over the past few years.Results: In this study, we examined the performance of differential gene-calling methods using RNA-Seq data in practical situations. We focused on two representative methods: one parametric method, DESeq, and one nonparametric method, NOISeq. We examined their performance using both simulated and real datasets. Our simulation followed the RNA-Seq process and produced more realistic short read data. Both DESeq and NOISeq identified over-expressed genes more correctly than under-expressed genes. While DESeq was more likely to call longer genes as differentially expressed than shorter ones, NOISeq did not have such bias. When the underlying variation increased, both methods showed higher rates of false positives. When replicates were not available in the experiments, both methods showed lower rates of true positives and higher rates of false positives.Conclusions: The level of variation clearly affected the performance of both methods, showing the importance of understanding the variation in the data as well as having replications in RNA-Seq experiments. We showed that it is possible to obtain improved differential gene-calling results by combining the results obtained by the two methods. We suggested strategies to use these two methods individually or combined according to the characteristics of the data.