Publication
Informatik - Forschung und Entwicklung
Paper

A learning optimizer for a federated database management system

View publication

Abstract

Optimizers in modern DBMSs utilize a cost model to choose an efficient query execution plan (QEP) among all possible plans for a given query. The accuracy of the cost estimates depends heavily on accurate statistics about the underlying data. Outdated statistics or wrong assumptions in the underlying statistical model frequently lead to suboptimal selection of execution plans and thus to bad query performance. Federated systems require additional statistics on remote data to be kept on the federated DBMS in order to choose the most efficient execution plan when joining data from different data sources. Wrong statistics within a federated DBMS can cause not only suboptimal data access strategies but also unbalanced workload distribution as well as unnecessarily high network traffic and communication overhead. The maintenance of statistics in a federated DBMS is troublesome due to the independence of the remote DBMSs that might not expose their statistics or use different models and not collect all statistics needed by the federated DBMS. We present an approach that extends DB2's learning optimizer to automatically find flaws in statistics on remote data by extending its query feedback loop towards the federated architecture. We will discuss several approaches to get feedback from remote queries and present our solution that utilizes local query feedback and remote query feedback to retrieve information needed to compute statistics profiles. We provide a detailed performance study and analysis of our approach, and demonstrate in a case study a potential query execution speedup of orders of magnitude while only incurring a moderate overhead during query execution.

Date

Publication

Informatik - Forschung und Entwicklung

Authors

Topics

Share