MAXENT: Consistent cardinality estimation in action

V. Markl; M. Kutsch; T.M. Tran; P.J. Haas; Nimrod Megiddo

doi:10.1145/1142473.1142586

SIGMOD 2006

Conference paper

01 Dec 2006

Abstract

When comparing alternative query execution plans (QEPs), a cost-based query optimizer in a relational database management system needs to estimate the selectivity of conjunctive predicates. To avoid inaccurate independence assumptions, modern optimizers try to exploit multivariate statistics (MVS) that provide knowledge about joint frequencies in a table of a relation. Because the complete joint distribution is almost always too large to store, optimizers are given only partial knowledge about this distribution. As a result, there exist multiple, non-equivalent ways to estimate the selectivity of a conjunctive predicate. To consistently combine the partial knowledge during the estimation process, existing optimizers employ cumbersome ad hoc heuristics. These methods unjustifiably ignore valuable information, and the optimizer tends to favor QEPs for which the least information is available. This bias problem yields poor QEP quality and performance. We demonstrate MAXENT, a novel approach based on the maximum entropy principle, prototyped in IBM DB2 LUW. We illustrate MAXENT's ability to consistently estimate the selectivity of conjunctive predicates on a per-table basis. In contrast to the DB2 optimizer's current ad hoc methods, we show how MAXENT exploits all available information about the joint column distribution and thus avoids the bias problem. For some complex queries against a real-world database, we show that MAXENT improves selectivity estimates by orders of magnitude relative to the current DB2 optimizer, and also show how these improved estimates influence plan choices as well as query execution times. Copyright 2006 ACM.
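The maximum entropy idea in the abstract can be illustrated with a small sketch. It is not the DB2 prototype's implementation: the function names `maxent_selectivity` and `selectivity` are hypothetical, the solver is generic iterative proportional fitting chosen here for brevity, and the sketch assumes every known selectivity lies strictly between 0 and 1 and that the known values are mutually consistent. Each of the 2^n truth assignments to n predicates gets a probability, starting from the uniform distribution, and each known (possibly multivariate) selectivity is enforced in turn until all of them hold.

```python
from itertools import product


def maxent_selectivity(n, known, iters=200):
    """Approximate the maximum entropy distribution over predicate outcomes.

    n     -- number of predicates in the conjunction
    known -- dict: frozenset of predicate indices -> known joint selectivity
             (assumed consistent and strictly between 0 and 1)
    Returns a dict mapping each truth assignment (tuple of 0/1) to its probability.
    """
    atoms = list(product((0, 1), repeat=n))
    # Uniform start: the maximum entropy distribution when nothing is known.
    x = {b: 1.0 / len(atoms) for b in atoms}
    for _ in range(iters):
        for subset, s in known.items():
            # Probability mass currently on assignments satisfying every predicate in subset.
            mass = sum(x[b] for b in atoms if all(b[i] for i in subset))
            for b in atoms:
                if all(b[i] for i in subset):
                    x[b] *= s / mass                   # rescale so this mass equals s
                else:
                    x[b] *= (1.0 - s) / (1.0 - mass)   # keep the total at 1
    return x


def selectivity(x, subset):
    """Selectivity of the conjunction of the predicates in subset under x."""
    return sum(p for b, p in x.items() if all(b[i] for i in subset))


# Hypothetical statistics: three single-predicate selectivities plus one
# multivariate statistic for predicates 0 and 1.
stats = {
    frozenset({0}): 0.1,
    frozenset({1}): 0.2,
    frozenset({2}): 0.25,
    frozenset({0, 1}): 0.05,
}
dist = maxent_selectivity(3, stats)
print(selectivity(dist, {0, 1, 2}))
```

With these made-up statistics, nothing ties predicate 2 to the other two, so the estimate for the full conjunction comes out at about 0.05 × 0.25 = 0.0125, and every sub-conjunction is answered from the same distribution. That single shared distribution is what makes the estimates consistent regardless of the order in which a plan applies the predicates.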

Related

Avatar semantic search: A database approach to information retrieval

Design, implementation, and evaluation of the linear road benchmark on the stream processing core

ISOMER: Consistent histogram construction using query feedback

Recovery from "bad" user transactions