Tips for High-Performance ClickHouse Clusters with S3 Object Storage

In our previous blog posts, we explained the various ways that ClickHouse can use S3 object storage. To keep things simple, we generally focused on single-node operation. However, ClickHouse often runs in a cluster, and cluster operation raises some interesting questions about S3 usage. These include how to parallelize data loading across nodes, the benefits of horizontal versus vertical scaling, and how to avoid unnecessary replication.

In this article, we will discuss how ClickHouse clusters can be used with S3 efficiently thanks to two important new features: the ‘s3Cluster’ table function and zero-copy replication. We hope our description will pave the way for more ClickHouse users to exploit scalable, inexpensive object storage in their deployments.