3 Simple Ideas to Make Your Life Easier With Kafka

Apache Kafka was open-sourced by LinkedIn in early 2011. Despite its initial limitations, it was a huge success and became the de facto standard for streaming data. Its performance, the ability to replay events, and support for multiple independent consumers were some of the features that disrupted the streaming arena.

But Kafka has also been known for its steep learning curve and for being difficult to operate. In my experience, both have improved a lot in the last few years, but the original gotchas remain:

Setting up Kafka Cluster With Gluster-Block Storage

Red Hat AMQ Streams

Red Hat AMQ Streams is a massively-scalable, distributed, and high-performance data streaming platform based on the Apache ZooKeeper and Apache Kafka projects.

Kafka Bridge

AMQ Streams Kafka Bridge provides a RESTful interface that allows HTTP-based clients to interact with a Kafka cluster. Kafka Bridge offers the advantages of a web API connection to AMQ Streams, without the need for client applications to interpret the Kafka protocol.
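To make the idea of an HTTP-based client concrete, here is a minimal sketch that builds (but does not send) a produce request for the bridge using only the Python standard library. The bridge address, the topic name "my-topic", and the record contents are assumptions for illustration. The `/topics/{name}` path and the `application/vnd.kafka.json.v2+json` content type reflect my understanding of the bridge's HTTP API, so check them against the documentation for your version.

```python
import json
import urllib.request

# Assumption: the bridge listens here; replace with your own deployment's address.
BRIDGE_URL = "http://localhost:8080"


def build_produce_request(topic, records, bridge_url=BRIDGE_URL):
    """Build an HTTP POST asking the bridge to produce records to a topic.

    The request is only constructed here, not sent.
    """
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url=f"{bridge_url}/topics/{topic}",
        data=body,
        # Assumed content type for JSON-encoded records; verify for your bridge version.
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical topic and record, for illustration only.
request = build_produce_request(
    "my-topic",
    [{"key": "order-1", "value": {"status": "created"}}],
)
print(request.get_method(), request.full_url)

# To actually send it against a running bridge:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The client only needs to speak HTTP and JSON; no Kafka client library is involved.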

How to Build your First Real-Time Streaming (CDC) System Part 1

Introduction

With the exponential growth of data and many businesses moving online, it has become imperative to design systems that can act in real time or near real time to support business decisions. After working on multiple backend projects over many years, I finally got to build a real-time streaming platform. While working on the project, I started experimenting with different tech stacks to deal with this, so I am sharing what I learned in a series of articles. Here is the first of them.

Target Audience

This post is aimed at engineers who are already familiar with microservices and Java and are looking to build their first real-time streaming pipeline. This POC is divided into four articles for readability. They are as follows:

Apache Kafka Topics: Architecture and Partitions

What Is a Kafka Topic?

A Kafka topic is essentially a named stream of records, which Kafka stores in a log. A topic's log is broken up into several partitions, and Kafka spreads those partitions across multiple servers or disks. In other words, a topic in Kafka is a category, stream name, or feed.
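As a rough illustration of how one topic log splits into partitions, the sketch below routes keyed records to partitions and appends each record at the next offset within its partition. The partition count, keys, and values are made up for the example. The hash is a CRC32 stand-in chosen for simplicity, not the hash Kafka's own partitioner uses, so the partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the example topic


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a record key to a partition index.

    CRC32 is a simple stand-in here; Kafka's default partitioner uses a
    different hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_records(records, num_partitions=NUM_PARTITIONS):
    """Append (key, value) records to an in-memory partitioned log.

    Each partition is an ordered list of (offset, key, value) entries.
    """
    log = defaultdict(list)
    for key, value in records:
        partition = choose_partition(key, num_partitions)
        offset = len(log[partition])  # next offset within this partition
        log[partition].append((offset, key, value))
    return dict(log)


topic_log = append_records([
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "logout"),
])

for partition, entries in sorted(topic_log.items()):
    print(partition, entries)
```

Records with the same key always land in the same partition, so their relative order is preserved there. Ordering across different partitions is not guaranteed.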

Topics in Apache Kafka follow a pub-sub style of messaging. A topic can have zero or more subscribers, called Kafka consumer groups. Topics are broken up into partitions for speed, scalability, and size.
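To show how consumer groups relate to partitions, here is a simplified sketch in which every group sees all partitions of a topic, while within a group each partition goes to exactly one consumer. The group names, consumer names, and the round-robin strategy are assumptions for illustration. Kafka's real assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]  # hypothetical topic with four partitions

# Hypothetical consumer groups subscribed to the same topic.
groups = {
    "billing": ["billing-1", "billing-2"],
    "analytics": ["analytics-1"],
}

# Each group gets its own independent assignment covering every partition.
plan = {
    group: assign_partitions(partitions, members)
    for group, members in groups.items()
}

for group, assignment in plan.items():
    print(group, assignment)
```

Adding consumers to a group spreads the partitions further, but only up to one consumer per partition. Beyond that, extra consumers in the group would sit idle.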