In this article, we are going to see how to deploy a Kafka cluster in Kubernetes using Strimzi and how to easily monitor it with Pixie.
Kubernetes Setup
We are going to use Google Cloud, but you can use any cloud provider or on-prem solution.
Apache Kafka was open-sourced by LinkedIn in early 2011. Despite its initial limitations, it was a huge success and became the de facto standard for streaming data. Its performance, the ability to replay events, and support for multiple independent consumers were some of the features that disrupted the streaming arena.
But Kafka has also been known for its steep learning curve and for being difficult to operate. In my experience, both have improved a lot in the last few years, but the original gotchas remain:
Time is one of the most important concepts in stream-processing frameworks. There are different notions of time:
Apache Flink has excellent support for event-time processing, probably the best among the stream-processing frameworks available. For more information, you can read "Notions of Time: Event Time and Processing Time" in the official documentation. If you prefer videos, "Streaming Concepts and Introduction to Flink - Event Time and Watermarks" is a good explanation.
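To make the difference between event time and processing time concrete, here is a minimal sketch in plain Python. It does not use the Flink API. The events, the 10-second tumbling window and the helper names (`window_start`, `group_by_window`) are all assumptions made up for illustration. It only shows how the same events land in different windows depending on which timestamp is used.

```python
from collections import defaultdict

# Hypothetical events: (event_time_seconds, arrival_time_seconds, value).
# The event time is when the event happened at the source; the arrival
# time stands in for processing time (when the processor sees it).
EVENTS = [
    (1, 2, "a"),
    (4, 5, "b"),
    (12, 13, "c"),
    (8, 14, "d"),  # happened in the first window, but arrived late
]

WINDOW_SIZE = 10  # seconds, tumbling (non-overlapping) windows


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start of the tumbling window containing the timestamp."""
    return timestamp - (timestamp % size)


def group_by_window(events, use_event_time):
    """Assign each event to a window using event time or arrival time."""
    windows = defaultdict(list)
    for event_time, arrival_time, value in events:
        ts = event_time if use_event_time else arrival_time
        windows[window_start(ts)].append(value)
    return dict(windows)


by_event_time = group_by_window(EVENTS, use_event_time=True)
by_processing_time = group_by_window(EVENTS, use_event_time=False)

print("event time:     ", by_event_time)
print("processing time:", by_processing_time)
```

With event time, the late event `d` is still counted in the window where it actually happened. With processing time, it ends up in whatever window was open when it arrived. A real framework also has to decide how long to wait for late events, which is what watermarks are for, and this sketch leaves that out.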
This post is a gentle introduction to Apache Avro. After several discussions with Dario Cazas about what’s possible with Apache Avro, he did some research and summarized it in an email. I found myself looking for that email several times to forward it to different teams to clarify doubts about Avro. After a while, I thought it could be useful for others, and this is how this series of three posts was born.
In summary, Apache Avro is a binary format with the following characteristics: