Feature Selection and Comparison of Classifiers for Protein Function Prediction

Authors

  • Bruno César dos Santos, Pontifical Catholic University of Minas Gerais
  • Cora Silberschneider, Pontifical Catholic University of Minas Gerais
  • Marcos W. Rodrigues, Pontifical Catholic University of Minas Gerais
  • Cristiano L. N. Pinto, Pontifical Catholic University of Minas Gerais
  • Cristiane N. Nobre, Pontifical Catholic University of Minas Gerais
  • Luis E. Zárate, Pontifical Catholic University of Minas Gerais

Abstract

Knowing the function of proteins is essential in several areas, such as bioinformatics and agriculture. The laboratory processes used to determine protein function are costly and time-consuming. Therefore, efficient computational models for predicting protein function are needed. Several studies currently address the protein function prediction problem; however, each adopts a different methodology and employs different classifiers, which makes their results difficult to compare. To address this, we propose a methodology that uses a multi-objective genetic algorithm with the k-NN classifier to select the best features, and then applies several classifiers, namely an Artificial Neural Network (ANN), a Support Vector Machine (SVM), Random Forest, and k-NN, in order to compare their performance within the same methodology. Under our methodology, the Random Forest classifier achieved the best performance, with an F-Measure of 75.47%.
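
The wrapper-style feature selection described above can be illustrated with a small sketch. The abstract does not state the genetic operators, the value of k, or the exact objectives, so everything here is an assumption: each chromosome is a binary mask over features, the two objectives are to maximize the leave-one-out k-NN F-Measure and to minimize the number of selected features, offspring come from uniform crossover and bit-flip mutation, and survival uses a simplified elitist Pareto selection rather than a full algorithm such as NSGA-II. All function names (f_measure, knn_loo_predictions, objectives, dominates, pareto_front, select_features) are hypothetical, and the data is synthetic.

```python
import random
from collections import Counter

def f_measure(y_true, y_pred, positive=1):
    """Binary F-Measure: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def knn_loo_predictions(X, y, mask, k=3):
    """Leave-one-out k-NN predictions using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    preds = []
    for i, xi in enumerate(X):
        dists = sorted((sum((xi[j] - xm[j]) ** 2 for j in cols), y[m])
                       for m, xm in enumerate(X) if m != i)
        votes = Counter(label for _, label in dists[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds

def objectives(X, y, mask, k=3):
    """Assumed objectives: (F-Measure to maximize, feature count to minimize)."""
    if not any(mask):
        return (0.0, len(mask))  # empty subset: worst value on both objectives
    return (f_measure(y, knn_loo_predictions(X, y, mask, k)), sum(mask))

def dominates(a, b):
    """a dominates b: no worse on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(scored):
    """Keep the (mask, score) pairs not dominated by any other pair."""
    return [(m, s) for m, s in scored if not any(dominates(t, s) for _, t in scored)]

def select_features(X, y, pop_size=12, generations=10, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}
    def score(m):  # memoize objective evaluations per mask
        key = tuple(m)
        if key not in cache:
            cache[key] = objectives(X, y, m)
        return cache[key]
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            children.append([1 - g if rng.random() < p_mut else g for g in child])  # bit-flip mutation
        scored = [(m, score(m)) for m in pop + children]
        front = pareto_front(scored)
        # Elitist survival: Pareto front first, then the rest ranked by F-Measure.
        rest = sorted((p for p in scored if p not in front), key=lambda p: (-p[1][0], p[1][1]))
        pop = [m for m, _ in (front + rest)[:pop_size]]
    unique = {tuple(m): m for m in pop}
    return pareto_front([(m, score(m)) for m in unique.values()])

# Synthetic example: features 0 and 1 carry the class signal, features 2-5 are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([label + data_rng.gauss(0, 0.3), label + data_rng.gauss(0, 0.3)]
             + [data_rng.gauss(0, 1) for _ in range(4)])
    y.append(label)

front = select_features(X, y)
for mask, (f1, size) in front:
    print(mask, round(f1, 3), size)
```

The result is a set of trade-offs between F-Measure and subset size rather than a single answer; one subset from that front would then be passed to the classifiers being compared.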

Published

2019-12-30