ISSN: 2329-6674
Guang Wu and Shaomin Yan
Saccharomyces cerevisiae is the most widely used yeast in research and industry; however, the downstream processes for its protein production are costly. This study sought a simple way to predict the success rate of protein purification from amino acid features. Logistic regression and a neural network model were used to test each of 535 amino acid features, one at a time, against the purification state of 1294 expressed proteins from S. cerevisiae, of which 870 were purified. The results show that the neural network has greater predictive power than logistic regression. Some amino acid features are useful for predicting the purification tendency of proteins, and the varying amino acid features perform better, as demonstrated by very high sensitivity accompanied by low specificity. Moreover, S. cerevisiae proteins with a high predictable portion of amino acid pairs are predicted more accurately than those with a low predictable portion. Thus, the success rate of purification of S. cerevisiae proteins can be predicted with a neural network based on protein sequence information. This simple prediction process can give an estimate of the probability that a protein will be purified, which should help to avoid blind experiments and enhance the production of designed proteins.
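The one-feature-at-a-time screening described in the abstract can be illustrated with a minimal sketch. It covers only the logistic regression side, fitted by plain gradient descent, and is not the authors' software or procedure. The counts (1294 proteins, 870 purified) come from the abstract; the feature values are synthetic, and the function names (`sigmoid`, `fit_logistic_1d`, `sensitivity_specificity`) and the 0.5 decision threshold are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, lr=0.1, epochs=500):
    """Fit p(purified | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def sensitivity_specificity(xs, ys, w, b, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for x, y in zip(xs, ys):
        pred = 1 if sigmoid(w * x + b) >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Synthetic stand-in for ONE amino acid feature (assumption: not real data).
# Class sizes follow the abstract: 870 purified out of 1294 expressed proteins.
random.seed(0)
n_purified, n_total = 870, 1294
ys = [1] * n_purified + [0] * (n_total - n_purified)
xs = [random.gauss(0.3, 1.0) if y == 1 else random.gauss(-0.3, 1.0) for y in ys]

w, b = fit_logistic_1d(xs, ys)
sens, spec = sensitivity_specificity(xs, ys, w, b)
print(f"w={w:.3f} b={b:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a full screen, the same fit and evaluation would be repeated for each of the 535 features, and the features would then be ranked by their sensitivity and specificity. A neural network could be substituted for `fit_logistic_1d` without changing the evaluation step.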