Embedding k-Nearest Neighbor Queries into Relational Database Management Systems
Keywords:
k-Nearest neighbors, Relational model, Similarity algebra, Similarity searchAbstract
The ability to use, in a single query, both similarity and traditional select operators inside a Relational Database Management System (RDBMS) has proven to be a valuable tool, which provides a strong capability for better contextualizing queries involving complex data. However, the integration of both query operator classes is still in its early stages, as most works on similarity queries focus on the data structures and operations required to execute just the similarity retrieval operation, but few of them target problems on the integration itself. Among them, the k-Nearest Neighbor (k-NN) comparison operator gives ground to conflicting interpretations when composed with others. In this article we present a novel approach to remove dubiety when similarity-selections based on k-NN are embedded into a RDBMS, formalize the possible interpretations, and show how to resolve implementation issues that occur when executing them on database relations that include attributes queried by similarity. We also implemented a prototype for a RDBMS that also supports queries employing k-NN predicates and evaluated the proposed interpretations, measuring the performance of their respective operators, as well as the semantic implications. The proposed solution is versatile, being able to handle queries combining similarity-based and traditional predicates.