Cost model for estimating the performance of spatial joins using R-trees
Abstract
The development of a cost model for predicting the performance of spatial joins has been identified in the literature as an important and difficult problem. In this paper, we present the first cost model that can predict the performance of spatial joins using R-trees. Based on two existing R-trees (join targets), our model first estimates the number of expected I/Os for the join process by assuming a zero buffer size. Our method for this estimation extends the cost model for R-tree window queries (developed by Kamel and Faloutsos and by Pagel et al.) to also handle spatial joins (which are more complex). In the context of spatial join processing, this number of zero-buffer expected I/Os is not practical for performance prediction in a buffered environment. To model the buffer impact, we use an (exponential) distribution function to measure the probability that a buffer-less I/O would cause a page fault in a buffered environment. Based on this probability and the zero-buffer expected I/O cost, the estimated number of I/Os for an R-tree join can then be computed. The comparisons between the predictions from our cost model and the actual results from our experiments based on real GIS maps show that the average relative error ratio is about 10% with a maximum of about 20% for a wide range of buffer sizes. Therefore, our model is a useful tool for the query optimization of spatial join queries.