Vladimir Ozerov | The Blog Pros

June 13, 2021

Memoization in Cost-based Optimizers

Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is NP-hard, and the number of possible plans grows exponentially with the query’s complexity. For example, a typical TPC-H query may have up to several thousand possible join orders, 2–3 algorithms per join, a couple of access methods per table, some filter/aggregate pushdown alternatives, etc. Combined, this could quickly explode the search space to millions of alternative plans.

This blog post will discuss memoization — an important technique that allows cost-based optimizers to consider billions of alternative plans in a reasonable time.

May 8, 2021

Rule-Based Query Optimization

The goal of the query optimizer is to find the query execution plan that computes the requested result efficiently. In this blog post, we discuss rule-based optimization - a common pattern to explore equivalent plans used by modern optimizers. Then we explore the implementation of several state-of-the-art rule-based optimizers. Then we analyze the rule-based optimization in Apache Calcite, Presto, and CockroachDB.

Transformations

A query optimizer must explore the space equivalent execution plans and pick the optimal one. Intuitively, plan B is equivalent to plan A if it produces the same result for all possible inputs.

May 6, 2021

Assembling a Query Optimizer with Apache Calcite

Introduction

Apache Calcite is a dynamic data management framework with SQL parser, optimizer, executor, and JDBC driver.

Many examples of Apache Calcite usage demonstrate the end-to-end execution of queries using JDBC driver, some built-in optimization rules, and the Enumerable executor. Our customers often have their own execution engines and JDBC drivers. So how to use Apache Calcite for query optimization only, without its JDBC driver and Enumerable executor?

May 1, 2021

Inside Presto Optimizer

Abstract

Presto is an open-source distributed SQL query engine for big data. Presto provides a connector API to interact with different data sources, including RDBMSs, NoSQL products, Hadoop, and stream processing systems. Created by Facebook, Presto received wide adoption by the open-source world (Presto, Trino) commercial companies (e.g., Ahana, Qubole).

Presto comes with a sophisticated query optimizer that applies various rewrites to the query plan. In this blog post series, we investigate the internals of Presto optimizer. In the first part, we discuss the optimizer interface and the design of the rule-based optimizer.

April 24, 2021

Custom Traits in Apache Calcite

Abstract

Physical properties are an essential part of the optimization process that allows you to explore more alternative plans.

Apache Calcite comes with convention and collation (sort order) properties. Many query engines require custom properties. For example, distributed and heterogeneous engines that we often see in our daily practice need to carefully plan the movement of data between machines and devices, which requires a custom property to describe data location.