ClickHouse: A Blazingly Fast DBMS With Full SQL Join Support — Under the Hood — Part 2

In our previous post, we reviewed the SQL JOIN types available in ClickHouse. As a reminder, ClickHouse comes with full SQL join support.

In this post, we’ll explore the internals of join execution in ClickHouse so that you can optimize joins for the queries your applications run. You’ll see how ClickHouse integrates classical join algorithms into its query pipeline to execute each join type as fast as possible.

ClickHouse: A Blazingly Fast DBMS With Full SQL Join Support

ClickHouse is an open-source real-time analytics database built and optimized for use cases that require super-low-latency analytical queries over large amounts of data. To achieve the best possible performance for analytical applications, it is typical to combine tables ahead of time through a process called data denormalization. Flattened tables minimize query latency by avoiding joins, at the cost of extra ETL complexity. That trade-off is usually acceptable in return for sub-second queries.

However, denormalizing data isn't practical for every workload, for instance, workloads coming from more traditional data warehouses. Sometimes part of the source data for analytical queries must remain normalized. Normalized tables take less storage and make it easier to combine data in different ways, but they require joins at query time for certain types of analysis.
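To make the trade-off concrete, here is a minimal Python sketch, assuming two hypothetical tables (`orders` and `customers`) and a hand-rolled hash join. It is not how ClickHouse executes joins. It only illustrates *when* the join cost is paid: once at ETL time for a flattened table, or on every query for normalized tables.

```python
# Hypothetical sample data (not from ClickHouse): a normalized "orders" table
# that references a "customers" table by customer id.
customers = [(1, "Alice"), (2, "Bob")]
orders = [(101, 1, 30.0), (102, 2, 12.5), (103, 1, 7.5)]


def hash_join(order_rows, customer_rows):
    """Inner-join orders with customers on customer id (a simple hash join)."""
    # Build phase: hash the right-hand table by its join key.
    names_by_id = {cid: name for cid, name in customer_rows}
    # Probe phase: stream the left table and look up each key;
    # orders without a matching customer are dropped (inner join).
    return [
        (oid, cid, names_by_id[cid], amount)
        for oid, cid, amount in order_rows
        if cid in names_by_id
    ]


def revenue_by_customer(flat_rows):
    """Aggregate over rows that already carry the customer name."""
    totals = {}
    for _oid, _cid, name, amount in flat_rows:
        totals[name] = totals.get(name, 0.0) + amount
    return totals


# Denormalized: the join runs once at ETL time; queries just scan the flat rows.
flattened = hash_join(orders, customers)
print(revenue_by_customer(flattened))  # {'Alice': 37.5, 'Bob': 12.5}

# Normalized: every query pays for the join before it can aggregate.
print(revenue_by_customer(hash_join(orders, customers)))  # {'Alice': 37.5, 'Bob': 12.5}
```

Both paths return the same answer. Note that the flattened rows repeat each customer name once per order, which is the extra storage that normalized tables avoid.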