Feature Selection and Comparison of Classifiers for Protein Function Prediction
Abstract
Knowing the function of proteins is essential in several areas, such as bioinformatics and agriculture. Laboratory procedures for determining protein function are costly and time-consuming. Efficient computational models for predicting the function of a protein are therefore needed. Several studies already address the protein function prediction problem, but each follows a different methodology and employs different classifiers, which makes their results difficult to compare. To address this, we propose a methodology that uses a multi-objective genetic algorithm with a k-NN classifier to select the best features, and then applies several classifiers, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest, and k-NN, in order to compare their performance under the same methodology. In our experiments, the Random Forest classifier achieved the best performance, with an F-measure of 75.47%.
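To make the feature selection step more concrete, the following is a minimal sketch of wrapper-style fitness evaluation with Pareto filtering, in which each candidate feature subset (a binary mask) is scored by a k-NN classifier. It is not the implementation used in this work. The two objectives (leave-one-out classification error and number of selected features), the toy data, and all function names are assumptions made for illustration, and exhaustive enumeration of masks stands in for the genetic search so that the example stays short.

```python
from collections import Counter
from itertools import product


def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only the features where mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected: treat as the worst case
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on the selected features.
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        # Majority vote among the k closest samples.
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def objectives(X, y, mask, k=1):
    """Two objectives to minimise (assumed): (classification error, feature count)."""
    return (1.0 - knn_loo_accuracy(X, y, mask, k), sum(mask))


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other candidate dominates."""
    return [
        (m, f) for m, f in scored
        if not any(dominates(g, f) for _, g in scored)
    ]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 5.1), (1.1, 0.9), (1.2, 9.1)]
y = [0, 0, 0, 1, 1, 1]

# Exhaustive enumeration of non-empty masks stands in for the genetic search.
candidates = [m for m in product((0, 1), repeat=2) if any(m)]
scored = [(m, objectives(X, y, m)) for m in candidates]
front = pareto_front(scored)
print(front)
```

On this toy data, only the mask that keeps the informative feature remains non-dominated. In a genetic algorithm, the same objective and dominance functions would be used to rank individuals in each generation, and the features selected by the final front would then be passed to the classifiers being compared.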