ISSN: 0974-276X
Tamanna Sultana, Rick Jordan, James Lyons-Weiler
Correct identification of peptides and proteins in complex biological samples from proteomic mass spectra is a challenging problem in bioinformatics. The sensitivity and specificity of identification algorithms depend on their underlying scoring methods, some being more sensitive and others more specific. For high-throughput, automated peptide identification, control over an algorithm's performance in terms of the trade-off between sensitivity and specificity is desirable. Combinations of algorithms, called 'consensus methods', have been shown to provide more accurate results than individual algorithms. However, due to the proliferation of algorithms and their varied internal settings, a systematic understanding of the relative performance of individual and consensus methods is lacking. We performed an in-depth analysis of various approaches to consensus scoring using known protein mixtures, and evaluated the performance of 2310 settings generated from the consensus of three different search algorithms: Mascot, Sequest, and X!Tandem. Our findings indicate that the union of Mascot, Sequest, and X!Tandem performed well in terms of overall accuracy. Among all consensus methods tested, those using 80-99.9% protein probability and/or a minimum of 2 peptides and/or 0-50% minimum peptide probability for protein identification performed better, on average, in overall accuracy. The results also suggest method selection strategies that provide direct control over sensitivity and specificity.
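To make the idea of a consensus setting concrete, the following is a minimal sketch, not the pipeline used in this study. It assumes each search algorithm reports, per protein, a protein probability and a list of peptide probabilities. A setting is then a triple of thresholds (minimum protein probability, minimum number of peptides, minimum peptide probability) plus a rule for how many algorithms must agree. All names (`ProteinHit`, `filter_hits`, `consensus`, `sensitivity_specificity`), the toy accessions, and the toy probabilities are hypothetical. Specificity is computed against database entries known to be absent from the mixture, which is also an assumption of this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class ProteinHit:
    """One protein reported by one search algorithm (hypothetical structure)."""
    protein_prob: float                # protein probability in [0, 1]
    peptide_probs: Tuple[float, ...]   # probabilities of its matched peptides


def filter_hits(hits: Dict[str, ProteinHit], min_protein_prob: float,
                min_peptides: int, min_peptide_prob: float) -> Set[str]:
    """Keep accessions that satisfy all three thresholds of a setting."""
    kept = set()
    for accession, hit in hits.items():
        # Count only peptides at or above the peptide probability cutoff.
        n_peptides = sum(p >= min_peptide_prob for p in hit.peptide_probs)
        if hit.protein_prob >= min_protein_prob and n_peptides >= min_peptides:
            kept.add(accession)
    return kept


def consensus(engine_sets: Iterable[Set[str]], min_engines: int) -> Set[str]:
    """Keep proteins reported by at least `min_engines` algorithms.

    min_engines=1 is the union; min_engines=len(engine_sets) is the intersection.
    """
    votes = Counter(acc for s in engine_sets for acc in s)
    return {acc for acc, n in votes.items() if n >= min_engines}


def sensitivity_specificity(identified: Set[str], known_present: Set[str],
                            database: Set[str]) -> Tuple[float, float]:
    """Score identifications against a known protein mixture."""
    tp = len(identified & known_present)
    fn = len(known_present - identified)
    fp = len(identified - known_present)
    tn = len(database - known_present - identified)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Toy data (invented for illustration only).
database = {"P1", "P2", "P3", "P4", "P5", "P6"}
known_present = {"P1", "P2", "P3"}
mascot = {"P1": ProteinHit(0.99, (0.9, 0.8)), "P4": ProteinHit(0.85, (0.6, 0.55))}
sequest = {"P2": ProteinHit(0.95, (0.7, 0.9)), "P4": ProteinHit(0.50, (0.9,))}
xtandem = {"P1": ProteinHit(0.90, (0.95, 0.6)), "P3": ProteinHit(0.70, (0.8, 0.9)),
           "P5": ProteinHit(0.99, (0.4,))}

# One setting: 80% protein probability, at least 2 peptides, 50% peptide probability.
per_engine = [filter_hits(h, 0.80, 2, 0.50) for h in (mascot, sequest, xtandem)]

union_ids = consensus(per_engine, min_engines=1)
majority_ids = consensus(per_engine, min_engines=2)

union_scores = sensitivity_specificity(union_ids, known_present, database)
majority_scores = sensitivity_specificity(majority_ids, known_present, database)

print("union:", sorted(union_ids), union_scores)
print("at least 2 engines:", sorted(majority_ids), majority_scores)
```

In this toy case the union recovers more of the known proteins at the cost of one false identification, while requiring agreement between two algorithms removes the false identification but misses more true proteins. Varying the thresholds and the agreement rule in this way is one route to trading sensitivity against specificity.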