Canonical abstraction for outerjoin optimization
Abstract
Outerjoins are an important class of joins and are widely used in many kinds of applications. Queries that contain outerjoins are challenging to optimize because outerjoins do not always commute with inner joins. Previous work has studied this problem and provided techniques that allow certain reorderings of the join sequence. However, the optimization of outerjoin queries is still not as powerful as that of inner join queries. An inner join query can always be canonically represented as a sequence of Cartesian products of all relations, followed by a sequence of selection operations, each applying one conjunct of the join predicates. This canonical abstraction is very powerful because it enables the optimizer to use any join sequence for plan generation. Unfortunately, no such canonical abstraction has been developed for outerjoin queries. As a result, existing techniques always exclude certain join sequences from planning, which can lead to a severe performance penalty. Given a query consisting of a sequence of inner and outer joins, we present, for the first time, a canonical abstraction based on three operations: outer Cartesian product, nullification, and best match. Like the inner join abstraction, our outerjoin abstraction permits all join sequences and preserves both commutativity and transitivity among predicates. This allows us to generate plans that are very desirable for performance reasons but that could not be generated before. We present an algorithm that produces such a canonical abstraction, and a method that extends an inner-join optimizer to generate plans in the expanded search space. We also describe an efficient implementation of the best match operation using the OLAP functionality in SQL:1999. Our experimental results show that our technique can significantly improve the performance of outerjoin queries.
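The contrast drawn above between inner joins and outerjoins can be illustrated with a small sketch. It is not the paper's algorithm and does not implement outer Cartesian product, nullification, or best match. It only shows, on toy data, that the Cartesian-product-plus-selection form of an inner join query agrees with every join order, while moving a left outerjoin past an inner join changes the result. The relations R, S, T, the helper functions, and the null-rejecting equality `eq` are all assumptions made for illustration.

```python
from itertools import product

def eq(x, y):
    # Null-rejecting equality (assumption): a comparison with None never holds.
    return x is not None and y is not None and x == y

def cross(*relations):
    # Cartesian product of relations given as lists of dicts with disjoint keys.
    rows = []
    for combo in product(*relations):
        row = {}
        for t in combo:
            row.update(t)
        rows.append(row)
    return rows

def select(rows, pred):
    # Keep only the rows that satisfy one predicate conjunct.
    return [r for r in rows if pred(r)]

def inner_join(left, right, pred):
    return select(cross(left, right), pred)

def left_outer_join(left, right, pred, right_cols):
    # Keep every left row; pad with None when no right row matches.
    out = []
    for l in left:
        matches = [dict(l, **r) for r in right if pred(dict(l, **r))]
        out.extend(matches if matches else [dict(l, **{c: None for c in right_cols})])
    return out

def as_set(rows):
    # Order-insensitive view of a relation, sorted by column name only.
    return {tuple(sorted(r.items(), key=lambda kv: kv[0])) for r in rows}

# Hypothetical toy relations.
R = [{"a": 1}, {"a": 2}]
S = [{"b": 1}, {"b": 2}]
T = [{"c": 1}]

p_rs = lambda r: eq(r["a"], r["b"])
p_st = lambda r: eq(r["b"], r["c"])

# Inner joins: products first, then one selection per conjunct.
canonical = select(select(cross(R, S, T), p_rs), p_st)
order1 = inner_join(inner_join(R, S, p_rs), T, p_st)
order2 = inner_join(R, inner_join(S, T, p_st), p_rs)

# Outerjoin: the two groupings below are not equivalent.
plan_a = left_outer_join(R, inner_join(S, T, p_st), p_rs, ["b", "c"])
plan_b = inner_join(left_outer_join(R, S, p_rs, ["b"]), T, p_st)

print(as_set(canonical) == as_set(order1) == as_set(order2))
print(as_set(plan_a) == as_set(plan_b))
```

In this sketch `plan_a` keeps the unmatched row of R padded with nulls, whereas `plan_b` loses it when the later inner join filters on `b = c`. This is the kind of reordering that cannot be applied to outerjoins without additional machinery.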