Apache Kafka Patterns and Anti-Patterns

Apache Kafka offers the operational simplicity of data engineers' dreams. A message broker that allows clients to publish and read streams of data — Kafka has an ecosystem of open-source components that, when combined together, help store, process, and integrate data streams with other parts of your system in a secure, reliable, and scalable manner. This Refcard dives into select patterns and anti-patterns spanning across Kafka Client APIs, Kafka Connect, and Kafka Streams, covering topics such as reliable messaging, scalability, error handling, and more.

Apache Kafka Essentials

Dive into Apache Kafka: Readers will review its history and fundamental components — Pub/Sub, Kafka Connect, and Kafka Streams. Key concepts in these areas are supplemented with detailed code examples that demonstrate producing and consuming data, using connectors for easy data streaming and transformation, performing common operations in KStreams, and more.

Top 5 Apache Kafka Use Cases for 2022

Apache Kafka and Event Streaming are two of the most relevant buzzwords in tech these days. Do you wonder about my predicted top 5 event streaming architectures and use cases for 2022 to set data in motion? Check out the following presentation and learn about the Kappa architecture, hyper-personalized omnichannel, multi-cloud deployments, edge analytics, and real-time cybersecurity. 

Some followers might notice that I did the same presentation a year ago about the top 5 event streaming use cases for 2021. My predictions for 2022 partly overlap with this session. That's fine. It shows that event streaming with Apache Kafka is a journey and evolution to set data in motion.

Visualize your Apache Kafka Streams using the Quarkus Dev UI

This article shows how you can visualize Apache Kafka Streams with reactive applications using the Dev UI in Quarkus. Quarkus, a Java framework, provides an extension to utilize the Kafka Streams API and also lets you implement stream processing applications based directly on Kafka.

Reactive messaging and Apache Kafka

With the rise of event-driven architectures, many developers are adopting reactive programming to write business applications. The requirements for these applications literally specify that they not be processed in real-time because end users don't really expect synchronous communication experiences through web browsers or mobile devices. Instead, low latency is a more important performance criterion, regardless of data volume or concurrent users.

Kafka-Streams – Tips on How to Decrease Re-Balancing Impact for Real-Time Event Processing On Highly Loaded Topics

Overview

Kafka Rebalance happens when a new consumer is either added (joined) into the consumer group or removed (left). It becomes dramatic during application service deployment rollout, as multiple instances restarted at the same time, and rebalance latency significantly increasing. During rebalance, consumers stop processing messages for some period of time, and, as a result, processing of events from a topic happens with some delay. Some business cases could tolerate rebalancing, meanwhile, others require real-time event processing and it's painful to have delays in more than a few seconds. Here we will try to figure out how to decrease rebalance for Kafka-Streams clients (even though some tips will be useful for other Kafka consumer clients as well).

Let's look at one existing use case. We have a micro-service with 45 application instances, that is deployed into Docker Kubernetes with configured up-scaling and down-scaling (based on CPU load). This service consumes a single topic using Kafka-Streams (actually there are more consuming topics, but let's concentrate on a single one), a topic with 180 partitions and traffic is 20000 messages per second. We use Kafka Streams configuration property, num.stream.threads = 4so a single app instance processes 4 partitions in 4 threads (45 instances with 4 threads per each, so actually it means each partition out of 180 is processed by its own thread). As a result, a consumer should handle around 110 messages per second from a single partition. In our case, processing of a single message takes around 5 milliseconds and the stream is stateless (processing - both CPU and IO intensive, some invocations into databases and REST calls to other micro-services). 

How to Use Quarkus and Java to Secure Kafka Streams

Today you will learn how to use Quarkus and Apache Kafka to create a scalable and secure web application. We will use Kafka Streams and a small Kafka cluster to take data from a server to a client application as a real-time stream. We will also be securing the Kafka cluster with SSL and SASL/JAAS password protection. Lastly, you will secure the Quakrus client application using OAuth 2.0 and OIDC with Okta as the OIDC provider.

The architects of both Apache Kafka and Quarkus designed them for use in scalable clusters. Quarkus is a container-first Kubernetes Java framework that you’ll use to create a scalable, Java-based REST service and client application. It’s a high-performing tool for serverless and microservice environments. The container-first design packages the runtime environment along with the compiled code, allowing you to tightly optimize both and avoid the unwelcome surprises that can come along with operating system updates on servers. Developers build Quarkus apps with Java standard technologies, such as JAX-RS for REST interfaces, JPA for data modeling and persistence, and CDI for dependency injection.

Streaming Machine Learning With Kafka-Native Model Server

Apache Kafka became the de facto standard for event streaming across the globe and industries. Machine Learning (ML) includes model training on historical data and model deployment for scoring and predictions. While training is mostly batch, scoring usually requires real-time capabilities at scale and reliability. Apache Kafka plays a key role in modern machine learning infrastructures. The next-generation architecture leverages a Kafka-native streaming model server instead of RPC (HTTP/gRPC) calls:

This blog post explores the architectures and trade-offs between three options for model deployment with Kafka: Embedded model into the Kafka application, model server and RPC, model server, and Kafka-native communication.

How to Build your First Real-Time Streaming (CDC) System Part 1

Introduction

With the exponential growth of data and a lot of businesses moving online, it has become imperative to design systems that can act in real-time or near real-time to make any business decisions. So, after working on multiple backend projects through many years, I finally got to do build a real-time streaming platform. While working on the project, I did start experimenting with different tech stacks to deal with this. So, I am trying to share my learnings in a series of articles. Here is the first of them.

Target Audience

This post is aimed at engineers who are already familiar with microservices and Java and are looking to build their first real-time streaming pipeline. This POC is divided into 4 articles for the purpose of readability. They are as follows:

Proper Kubernetes Health Check for a Kafka Streams Application

When you grass your cattle, you typically configure a health check to keep your herd alive. A very common livenessProbe  is about doing a GET request to an endpoint, and if the service replies with a  two hundred, then we’re fine, otherwise, the pod is destroyed and a new one is brought to live:

livenessProbe:
  httpGet:
    path: /health
    port: http


Real-Time Stream Processing With Apache Kafka Part 4: Use Case

In previous articles, we have gained the ground on understanding basic terminologies used in Kafka and Kafka-Streams. In this article, we set up a single node kafka cluster on our Windows machine. Now, based on the knowledge we have gained so far, let us try to build a use case.

Scenario

Consider a hypothetical fleet management company that needs a dashboard to get the insight of its day to day activities related to vehicles. Each vehicle in this fleet management company is fitted with a GPS based geolocation emitter, which emits location data containing the following information