Hierarchical and Non-Hierarchical Classification of Transposable Elements with a Genetic Algorithm

Authors

  • Gean Trindade Pereira, Federal University of Sao Carlos
  • Ricardo Cerri, Federal University of Sao Carlos

Keywords:

Genetic Algorithms, Hierarchical Classification, Machine Learning, Rule Induction, Transposable Elements

Abstract

In traditional classification, an instance is assigned to one class from a small set of classes. However, there are more complex problems in which an instance is simultaneously related to many classes that are hierarchically structured. These problems are known as Hierarchical Classification (HC), which has become an interesting alternative for a range of tasks, such as Text Categorization, Music Genre Classification and, most commonly, Bioinformatics problems. In Bioinformatics, a topic that has gained attention is the classification of Transposable Elements (TEs), which are DNA sequences capable of moving within the genome. TEs are of great importance to the genetic variability of species, since they can modify the functionality of host genes. Despite their research relevance, only a few tools perform their automatic classification, and most of them do not use more elaborate strategies, such as using Machine Learning to generate models from data. Moreover, the interpretability of these methods is still an issue. In this work, TE classification is addressed as both a flat and an HC problem, using a new rule induction method based on Genetic Algorithms along with other classifiers. Thus, our main contributions are: (i) introducing a new interpretable HC method capable of classifying TEs at multiple levels of their hierarchy, and (ii) analyzing the power of non-hierarchical classifiers to correctly predict TE leaf node classes, comparing them with the proposed method. As the experiments showed, flat methods did not perform well on HC datasets, even when ignoring the hierarchical relationships among classes. We believe this occurred due to the high imbalance of these datasets, which flat methods do not handle well, unlike HC ones. HC-GA, the proposed method, outperformed the flat classifiers, presenting promising results for multiple class levels, including leaf node classes, even though it was not originally designed for this purpose and despite the difficulty of predicting lower-level classes in a hierarchy.
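The difference between the flat and hierarchical views of a label can be sketched in a few lines of Python. This is only an illustration and not part of the proposed method: the dotted labels such as "1.1.2" and the helper names `ancestors`, `flat_target` and `hierarchical_target` are assumptions made for the example, not identifiers taken from the paper or its datasets.

```python
# Illustrative sketch only: labels and helper names are hypothetical.

def ancestors(label, sep="."):
    """Return the path from the top-level class down to `label`, inclusive."""
    parts = label.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def flat_target(label):
    """Flat view: the instance belongs to exactly one class, the leaf itself."""
    return label


def hierarchical_target(label):
    """Hierarchical view: the instance belongs to the leaf and all its ancestors."""
    return set(ancestors(label))


if __name__ == "__main__":
    leaf = "1.1.2"  # hypothetical leaf class in a three-level hierarchy
    print(flat_target(leaf))
    print(sorted(hierarchical_target(leaf)))
```

Under this view, a flat classifier is scored only on the leaf label, whereas a hierarchical classifier can also be evaluated at each intermediate level of the path.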

Author Biographies

  • Gean Trindade Pereira, Federal University of Sao Carlos
    Master's Candidate, Department of Computer Science
  • Ricardo Cerri, Federal University of Sao Carlos
    Assistant Professor, Department of Computer Science

Published

2018-10-01