Memoization in Cost-based Optimizers

Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is NP-hard, and the number of possible plans grows exponentially with the query’s complexity. For example, a typical TPC-H query may have up to several thousand possible join orders, 2–3 algorithms per join, a couple of access methods per table, some filter/aggregate pushdown alternatives, etc. Combined, this could quickly explode the search space to millions of alternative plans.

This blog post will discuss memoization — an important technique that allows cost-based optimizers to consider billions of alternative plans in a reasonable time.

Rule-Based Query Optimization

The goal of the query optimizer is to find the query execution plan that computes the requested result efficiently. In this blog post, we discuss rule-based optimization, a common pattern used by modern optimizers to explore equivalent plans. We then analyze how rule-based optimization is implemented in several state-of-the-art optimizers: Apache Calcite, Presto, and CockroachDB.

Transformations

A query optimizer must explore the space of equivalent execution plans and pick the optimal one. Intuitively, plan B is equivalent to plan A if it produces the same result for all possible inputs. For example, swapping the inputs of an inner join (join commutativity) yields an equivalent plan.
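
To make this concrete, below is a minimal sketch, in plain Java, of how a rule-based optimizer might represent a transformation such as join commutativity. The plan nodes and the rule interface here are simplified stand-ins for illustration only, not the actual APIs of Calcite, Presto, or CockroachDB.

    // Simplified plan representation; real optimizers also carry schemas, costs, traits, etc.
    interface PlanNode {}

    class Scan implements PlanNode {
        final String table;
        Scan(String table) { this.table = table; }
        public String toString() { return "Scan(" + table + ")"; }
    }

    class Join implements PlanNode {
        final PlanNode left;
        final PlanNode right;
        final String condition;
        Join(PlanNode left, PlanNode right, String condition) {
            this.left = left;
            this.right = right;
            this.condition = condition;
        }
        public String toString() { return "Join(" + left + ", " + right + ", " + condition + ")"; }
    }

    // A transformation rule produces an equivalent alternative for a matching plan node.
    interface TransformationRule {
        boolean matches(PlanNode node);
        PlanNode transform(PlanNode node);
    }

    // Join commutativity: "A JOIN B" returns the same rows as "B JOIN A" for any input.
    class JoinCommuteRule implements TransformationRule {
        public boolean matches(PlanNode node) { return node instanceof Join; }
        public PlanNode transform(PlanNode node) {
            Join join = (Join) node;
            return new Join(join.right, join.left, join.condition);
        }
    }

    public class TransformationDemo {
        public static void main(String[] args) {
            PlanNode original = new Join(new Scan("orders"), new Scan("lineitem"), "o_orderkey = l_orderkey");
            TransformationRule rule = new JoinCommuteRule();
            if (rule.matches(original)) {
                System.out.println("Equivalent alternative: " + rule.transform(original));
            }
        }
    }

A real optimizer would register many such rules and apply them repeatedly, which is exactly where memoization becomes essential to avoid re-deriving the same alternatives.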

Client-Oriented Dynamic Search Query Supporting Multiple Tables in Spring

Backdrop

To begin with, this is an example written primarily in Spring Boot to leverage the benefits of Spring Data JPA. The main motive of this article is to build a simple, common data search mechanism that applies to almost every table and is client-oriented; a sketch of the approach follows the setup below. This article is heavily inspired by one from Eugen Paraschiv; I recommend going through his tutorials to learn Spring professionally.

Pre-Requisites for Getting Started

  • Java 8 is installed.
  • Any Java IDE (preferably STS or IntelliJ IDEA).
  • Basic understanding of Java and Spring-based web development along with Spring Data JPA.

I used Spring Initializr to add all the dependencies and create a blank working project with all my configurations. I used Maven as the build tool and Java 8 as the language, though this choice is up to you as long as it is supported by Spring. Below are my required dependencies, which can easily be added from Spring Initializr.
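
To give a flavor of the kind of client-driven search logic the article builds toward, here is a rough sketch using Spring Data JPA's Specification API. The Employee entity, its fields, and the supported operators are hypothetical placeholders, and the article itself may structure its criteria differently (for example, using the Criteria API directly).

    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.springframework.data.jpa.domain.Specification;
    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.jpa.repository.JpaSpecificationExecutor;

    // Hypothetical entity; any JPA-mapped table could be searched the same way.
    @Entity
    class Employee {
        @Id
        private Long id;
        private String name;
        private String department;
        // getters and setters omitted for brevity
    }

    // Extending JpaSpecificationExecutor lets the repository execute dynamic Specifications.
    interface EmployeeRepository
            extends JpaRepository<Employee, Long>, JpaSpecificationExecutor<Employee> {
    }

    // Builds a Specification from a client-supplied field/operation/value triple,
    // so the same helper works for any entity type.
    final class SearchSpecifications {
        static <T> Specification<T> of(String field, String operation, String value) {
            return (root, query, cb) -> {
                switch (operation) {
                    case ">":
                        return cb.greaterThan(root.<String>get(field), value);
                    case "<":
                        return cb.lessThan(root.<String>get(field), value);
                    case ":":
                        return cb.like(cb.lower(root.<String>get(field)),
                                "%" + value.toLowerCase() + "%");
                    default:
                        return null; // unsupported operation: contributes no predicate
                }
            };
        }
    }

    // Usage, e.g. from a controller that parses a query string like "name:john,department:sales":
    // List<Employee> result = employeeRepository.findAll(
    //         SearchSpecifications.<Employee>of("name", ":", "john")
    //                 .and(SearchSpecifications.of("department", ":", "sales")));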

SQL on Kafka With Presto (Video)

Presto is a state-of-the-art distributed SQL query engine for big data, enabling efficient querying of cold data and of various data sources. With an extended SQL dialect and features like geospatial queries, joins between different data sources (SQL to join data from HDFS, Elasticsearch, and Kafka, anyone?), and the ability to run on containers and cheap servers, Presto is slowly becoming the standard ad-hoc querying engine for big data.

In this talk, we will present Presto and how it can be used with Kafka. We will discuss data architectures, Presto's features and why it is so good for your data, and finally see how it can be leveraged to query data from Kafka, as well as to execute a single SQL statement that joins data from Kafka with data from SQL databases, Cassandra, Elasticsearch, and more.
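
To give a flavor of what such a query can look like, here is a small, hypothetical example that runs a cross-connector join through the Presto JDBC driver (the presto-jdbc artifact must be on the classpath). The coordinator address, catalogs, topic, and column names are made up for illustration; the Kafka connector also needs a topic description file that maps message fields to columns.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PrestoKafkaJoinExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical coordinator address; catalogs and schemas are qualified per table below.
            String url = "jdbc:presto://presto-coordinator:8080";

            // Join live events from a Kafka topic with reference data from a MySQL catalog.
            // Catalog, table, and column names are placeholders.
            String sql = "SELECT u.name, count(*) AS clicks "
                    + "FROM kafka.default.click_events e "
                    + "JOIN mysql.crm.users u ON e.user_id = u.id "
                    + "GROUP BY u.name "
                    + "ORDER BY clicks DESC "
                    + "LIMIT 10";

            // Presto requires a user name; a password is only needed when authentication is enabled.
            try (Connection conn = DriverManager.getConnection(url, "analyst", null);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " -> " + rs.getLong("clicks"));
                }
            }
        }
    }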