In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

Citation data:

Journal of proteomics, ISSN: 1876-7737, Vol: 150, Page: 170-182

Publication Year:
2017
Usage 210
Abstract Views 207
Link-outs 3
Captures 42
Readers 42
Social Media 87
Tweets 46
Shares, Likes & Comments 41
Citations 7
Citation Indexes 7
PMID:
27498275
DOI:
10.1016/j.jprot.2016.08.002
Author(s):
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
Publisher(s):
Elsevier BV
Tags:
Biochemistry, Genetics and Molecular Biology
Article description:
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most analytical methods are based on the identification of reliable peptides rather than the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Several protein inference algorithms and tools are currently available to the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All algorithms were evaluated with a highly customizable KNIME workflow on four public datasets of varying complexity (different sample preparation, species, and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engine, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only in the number of reported protein groups but also in the composition of those groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended.
