High Availability Kubernetes Monitoring Using Prometheus and Thanos

Introduction

The Need for Prometheus High Availability

Kubernetes adoption has grown manyfold in the past few months, and it is now clear that Kubernetes is the de facto standard for container orchestration. Prometheus, likewise, is considered an excellent choice for monitoring both containerized and non-containerized workloads. Monitoring is an essential aspect of any infrastructure, and we should make sure that our monitoring set-up is highly available and highly scalable in order to match the needs of an ever-growing infrastructure, especially in the case of Kubernetes.

Therefore, today we will deploy a clustered Prometheus set-up that is not only resilient to node failures but also archives data appropriately for future reference. The set-up is also very scalable, to the extent that we can span multiple Kubernetes clusters under the same monitoring umbrella.

Present Scenario

The majority of Prometheus deployments use persistent volumes for pods, while Prometheus is scaled using a federated set-up. However, not all data can be aggregated using a federated mechanism, and you often need a separate mechanism to manage the Prometheus configuration when you add servers.

The Solution

Thanos aims to solve the above problems. With the help of Thanos, we can not only run multiple instances of Prometheus and de-duplicate data across them, but also archive data in long-term storage such as GCS or S3.
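To make the de-duplication idea concrete, here is a deliberately simplified Python sketch. It assumes each Prometheus replica tags its series with a label named replica (the label name, the sample data, and the helper functions are all hypothetical), and it merges series that differ only by that label. Thanos' real de-duplication logic is more involved than this; the sketch only illustrates why two replicas that each missed a scrape can still yield one complete series.

```python
from collections import defaultdict

REPLICA_LABEL = "replica"  # assumption: the label that distinguishes HA replicas


def series_key(labels, replica_label=REPLICA_LABEL):
    """Identity of a series once the replica label is ignored."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))


def deduplicate(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only by the replica label.

    Each series is a (labels, samples) pair, where samples is a list of
    (timestamp, value) tuples. When several replicas report the same
    timestamp, the first value encountered is kept.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        key = series_key(labels, replica_label)
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Hypothetical data: each replica missed one scrape.
replica_a = (
    {"__name__": "up", "job": "node", "replica": "prometheus-0"},
    [(10, 1.0), (20, 1.0)],  # missed the scrape at t=30
)
replica_b = (
    {"__name__": "up", "job": "node", "replica": "prometheus-1"},
    [(10, 1.0), (30, 1.0)],  # missed the scrape at t=20
)

result = deduplicate([replica_a, replica_b])
```

Here result holds a single series for up{job="node"} with samples at t=10, 20 and 30, even though neither replica saw all three scrapes on its own.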

Implementation

Thanos Architecture

Upload Files to Google Cloud Storage with Python

Google Cloud is a suite of cloud-based services, just like AWS from Amazon and Azure from Microsoft. AWS and Azure dominate the market, but Google is not far behind. Google Cloud Platform (GCP) is the third-largest cloud computing platform in the world, with a share of 9%, closely followed by Alibaba Cloud.

Amazon undoubtedly leads the market with a share of 33%, but GCP showed a tremendous spike, with a growth rate of a whopping 83% in 2019. GCP does lead AWS on the cost front, though. Google offers fewer services but maintains its position as one of the most cost-effective cloud platforms.
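Since the topic here is uploading files to Google Cloud Storage with Python, a minimal sketch of such an upload may help set expectations. It assumes the google-cloud-storage client library is installed and that credentials are already configured in the environment. The function name upload_file and its parameters are illustrative choices, not something prescribed by the library.

```python
def upload_file(bucket_name, source_path, destination_name, client=None):
    """Upload a local file to a GCS bucket and return its gs:// URI.

    `client` is optional so that a stand-in object can be passed in tests;
    by default a real google.cloud.storage.Client is created.
    """
    if client is None:
        # Imported lazily: only needed when talking to the real service.
        from google.cloud import storage

        client = storage.Client()  # picks up credentials from the environment

    bucket = client.bucket(bucket_name)        # reference to the target bucket
    blob = bucket.blob(destination_name)       # object name inside the bucket
    blob.upload_from_filename(source_path)     # performs the actual upload
    return f"gs://{bucket_name}/{destination_name}"
```

A call might look like upload_file("my-bucket", "report.csv", "reports/report.csv"), where the bucket and file names are placeholders for your own.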

Introducing Wormhole: Fast Dockerized Presto and Alluxio Setups

Just like a real wormhole, this tool is all about speed.

This blog introduces Wormhole, an open-source Dockerized solution for deploying Presto and Alluxio clusters for blazing-fast analytics on file systems (we use S3, GCS, OSS). When it comes to analytics, people are generally hands-on in writing SQL queries and love to analyze data that resides in a warehouse (e.g. a MySQL database). But as data grows, these stores start failing, and there arises a need to get results in the same or a shorter time frame. This can be solved by distributed computing, and Presto is designed for that. When attached to Alluxio, it works even faster. That's what Wormhole is all about.


Here is the high-level architecture diagram of the solution: