How Does the Spatial Data Redundancy Affect Query Performance in Geographic Data Warehouses?
Keywords:
benchmark, geographic data warehouse, performance evaluation
Abstract
Geographic Data Warehouses (GDWs) are traditional data warehouses with spatial attributes that are used to define spatial dimension tables, spatial measures and spatial hierarchies. Non-redundant spatial data warehouse schemas have been recognized as an essential issue in GDW design.
Although the absence of spatial redundancy saves storage space, it requires expensive join operations to answer a query that may refer to one or more query windows. In this paper, we investigate to what extent the separate storage of spatial and conventional data is recommended in GDWs as the number of query windows increases.
We also investigate whether the complexity of the spatial data (i.e., points versus polygons) influences the choice of storing spatial and conventional data in the same or in different dimension tables. Our experimental results indicated that if non-redundant spatial data are represented as point objects, the point data and their descriptive data should be stored in a single table to avoid additional join costs. The results also showed that redundant GDW schemas introduce a severe drawback, as some spatial analytical queries cannot reuse previously fetched spatial data, impairing query performance.
Finally, based on the experimental results, we propose a set of guidelines for the design of logical GDW schemas, called "Logical GDW Design Guidelines".