Indexing Web Form Constraints
Abstract
Millions of online databases covering many different domains are available on the Web today. These databases are accessible through forms and support a wide range of useful services, from searches for rental cars and airfares to searches for used cars and genes. To leverage this information and locate online databases that are relevant to particular information needs, we have created a search engine specialized in the forms that serve as entry points to these databases. This search engine, however, provides only a keyword-based interface, which greatly limits the kinds of queries that can be posed. In this paper, we study the problem of supporting structured queries over Web form collections. We formalize the problem of querying Web forms as one of satisfying constraints that hold over form attributes and their values, over form metadata, and over dependencies across distinct attributes. We also propose an indexing method that leverages these constraints to support efficient query processing. Because the proposed index extends traditional inverted indexes, our method can be easily combined with existing text search tools. An experimental evaluation, in which we compare query performance under the proposed index against different storage configurations of a relational database, shows that with our index, structured queries can be evaluated in a fraction of the time required by the relational database. We also show that our structured queries produce result sets with higher precision than traditional keyword-based queries over Web forms.
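To make the idea of an inverted index extended with form constraints concrete, the following is a minimal illustrative sketch, not the index proposed in this paper. It assumes that each constraint can be encoded as a typed posting key (attribute label, attribute-value pair, or metadata pair) and that a structured query is a conjunction of such keys answered by intersecting posting sets. The class name `ConstraintIndex`, the key encoding, and the sample forms are all hypothetical, and dependencies across distinct attributes are not modeled.

```python
from collections import defaultdict


class ConstraintIndex:
    """Toy inverted index whose terms are typed constraint keys (hypothetical encoding)."""

    def __init__(self):
        # Maps a constraint key (a tuple) to the set of form ids that satisfy it.
        self.postings = defaultdict(set)

    def add_form(self, form_id, attributes, metadata=None):
        # attributes: {label: [allowed values]}; metadata: {key: value}
        for label, values in attributes.items():
            label = label.lower()
            self.postings[("attr", label)].add(form_id)
            for value in values:
                self.postings[("attr_value", label, value.lower())].add(form_id)
        for key, value in (metadata or {}).items():
            self.postings[("meta", key.lower(), value.lower())].add(form_id)

    def query(self, constraints):
        # Conjunctive query: a form must appear in every posting set.
        result = None
        for key in constraints:
            ids = self.postings.get(key, set())
            result = set(ids) if result is None else result & ids
            if not result:
                return set()
        return result if result is not None else set()


# Hypothetical sample forms, for illustration only.
index = ConstraintIndex()
index.add_form("f1", {"Make": ["Honda", "Toyota"], "Price": []}, {"domain": "used cars"})
index.add_form("f2", {"Make": ["Honda"], "Pickup city": []}, {"domain": "rental cars"})
index.add_form("f3", {"Gene": ["BRCA1"]}, {"domain": "genes"})

# A keyword-style lookup on "honda" alone matches both car forms.
honda_forms = index.query([("attr_value", "make", "honda")])

# Adding a structural constraint (the form must also have a "price" attribute)
# narrows the result to the used-car form.
honda_with_price = index.query([("attr_value", "make", "honda"), ("attr", "price")])

print(sorted(honda_forms))
print(sorted(honda_with_price))
```

In this sketch the posting lists have the same shape as those of a conventional inverted index, which is the property that would allow such keys to sit alongside ordinary text terms; the real index may encode constraints quite differently.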