Near-optimal algorithms for shared filter evaluation in data stream systems
Abstract
We consider the problem of evaluating multiple overlapping queries defined on data streams, where each query is a conjunction of multiple filters and each filter may be shared across multiple queries. Efficient support for overlapping queries is a critical issue in the emerging data stream systems, and this is particularly the case when filters are expensive in terms of their computational complexity and processing time. This problem generalizes other well-known problems such as pipelined filter ordering and set cover, and is not only NP-Hard but also hard to approximate within a factor of o(log n) from the optimum, where n is the number of queries. In this paper, we present two near-optimal approximation algorithms with provably-good performance guarantees for the evaluation of overlapping queries. We present an edge-coverage based Greedy algorithm which achieves an approximation ratio of (1 + log(n) + log(α)), where n is the number of queries and α is the average number of filters in a query. We also present a randomized, fast and easily par-allelizable Harmonic algorithm which achieves an approximation ratio of 2,β, where β is the maximum number of filters in a query. We have implemented these algorithms in a prototype system, and evaluated their performance using extensive experiments in the context of multimedia stream analysis. The results show that our Greedy algorithm consistently outperforms other known algorithms under various settings and scales well as the numbers of queries and filters increase. Copyright 2008 ACM.