DAHC-tree: An Effective Index for Approximate Search in High-Dimensional Metric Spaces

Authors

  • Jurandy Almeida University of Campinas
  • Eduardo Valle University of Campinas
  • Ricardo da S. Torres University of Campinas
  • Neucimar J. Leite University of Campinas

Keywords:

clustering methods, database indexing, metric access methods, metric spaces, similarity search

Abstract

Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, object recognition, and others. The high dimensionality of the data requires special index structures to facilitate the search. A problem regarding the creation of suitable index structures for high-dimensional data is the relationship between the geometry of the data and the organization of an index structure. In this paper, we study the performance of a new index structure, called Divisive-Agglomerative Hierarchical Clustering tree (DAHC-tree), which reduces the effects imposed by the above liability. DAHC-tree is constructed by dividing and grouping the data set into compact clusters. We perform a rigorous experimental design and analyze the trade-offs involved in building such an index structure. Additionally, we present extensive experiments comparing our method against state-of-the-art of exact and approximate solutions. The conducted analysis and the reported comparative test results demonstrate that our technique significantly improves the performance of similarity queries.

Downloads

Published

2010-09-09

Issue

Section

Regular Articles