Apache Kafka Essentials
Problems With Kafka Streams
Originally published December 11, 2017
Before diving into the main topic, let me first introduce Kafka Streams.
Top 5 Apache Kafka Use Cases for 2022
Apache Kafka and Event Streaming are two of the most relevant buzzwords in tech these days. Do you wonder about my predicted top 5 event streaming architectures and use cases for 2022 to set data in motion? Check out the following presentation and learn about the Kappa architecture, hyper-personalized omnichannel, multi-cloud deployments, edge analytics, and real-time cybersecurity.
Some followers might notice that I did the same presentation a year ago about the top 5 event streaming use cases for 2021. My predictions for 2022 partly overlap with last year's session. That's fine. It shows that event streaming with Apache Kafka is a journey and an evolution to set data in motion.
Visualize your Apache Kafka Streams using the Quarkus Dev UI
This article shows how you can visualize Apache Kafka Streams with reactive applications using the Dev UI in Quarkus. Quarkus, a Java framework, provides an extension to utilize the Kafka Streams API and also lets you implement stream processing applications based directly on Kafka.
Reactive messaging and Apache Kafka
With the rise of event-driven architectures, many developers are adopting reactive programming to write business applications. The requirements for these applications typically specify that processing need not be synchronous, because end users don't really expect synchronous communication experiences through web browsers or mobile devices. Instead, low latency is the more important performance criterion, regardless of data volume or the number of concurrent users.
Kafka-Streams – Tips on How to Decrease Re-Balancing Impact for Real-Time Event Processing On Highly Loaded Topics
Overview
A Kafka rebalance happens when a new consumer joins the consumer group or an existing one leaves. It becomes dramatic during an application deployment rollout, when multiple instances restart at the same time and rebalance latency increases significantly. During a rebalance, consumers stop processing messages for some period of time, so events from a topic are processed with a delay. Some business cases can tolerate rebalancing; others require real-time event processing, where delays of more than a few seconds are painful. Here we will try to figure out how to decrease the rebalance impact for Kafka Streams clients (some tips will be useful for other Kafka consumer clients as well).
Let's look at one existing use case. We have a microservice with 45 application instances, deployed as Docker containers on Kubernetes with CPU-based up-scaling and down-scaling. This service consumes a single topic using Kafka Streams (it actually consumes several topics, but let's concentrate on one). The topic has 180 partitions, and traffic is 20,000 messages per second. We use the Kafka Streams configuration property num.stream.threads = 4, so a single app instance processes 4 partitions in 4 threads (45 instances with 4 threads each, so each of the 180 partitions is processed by its own thread). As a result, a consumer should handle around 110 messages per second from a single partition. In our case, processing a single message takes around 5 milliseconds, and the stream is stateless (processing is both CPU- and IO-intensive, with some calls to databases and REST calls to other microservices).
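The setup above can be sketched as Kafka Streams configuration. The thread count mirrors the numbers from this use case; the rebalance-mitigation keys at the bottom (static membership via group.instance.id plus a longer session timeout, so that a quick restart does not trigger a rebalance at all) are common tuning options, and all concrete values here are illustrative rather than the article's exact settings.

```properties
# Basics for the described service (application id is hypothetical)
application.id=event-processing-service
bootstrap.servers=kafka:9092
num.stream.threads=4            # 45 instances x 4 threads = 180 partitions, one thread each

# Common rebalance-mitigation settings (illustrative values)
# Static membership: a member that rejoins with the same group.instance.id
# within session.timeout.ms does not trigger a rebalance.
group.instance.id=${HOSTNAME}   # must be unique and stable per instance
session.timeout.ms=60000
max.poll.interval.ms=300000
```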
How to Use Quarkus and Java to Secure Kafka Streams
Today you will learn how to use Quarkus and Apache Kafka to create a scalable and secure web application. We will use Kafka Streams and a small Kafka cluster to take data from a server to a client application as a real-time stream. We will also be securing the Kafka cluster with SSL and SASL/JAAS password protection. Lastly, you will secure the Quarkus client application using OAuth 2.0 and OIDC with Okta as the OIDC provider.
The architects of both Apache Kafka and Quarkus designed them for use in scalable clusters. Quarkus is a container-first Kubernetes Java framework that you’ll use to create a scalable, Java-based REST service and client application. It’s a high-performing tool for serverless and microservice environments. The container-first design packages the runtime environment along with the compiled code, allowing you to tightly optimize both and avoid the unwelcome surprises that can come along with operating system updates on servers. Developers build Quarkus apps with Java standard technologies, such as JAX-RS for REST interfaces, JPA for data modeling and persistence, and CDI for dependency injection.
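As a rough sketch of what the security setup might look like in a Quarkus application.properties file: the kafka.* entries are passed through to the Kafka client for SSL and SASL, and the quarkus.oidc.* entries point the app at Okta. All hosts, paths, and credentials below are placeholders; check the exact property names against the article and the Quarkus documentation.

```properties
# Kafka broker security (Quarkus passes kafka.* through to the client)
kafka.security.protocol=SASL_SSL
kafka.ssl.truststore.location=/path/to/truststore.p12
kafka.ssl.truststore.password=changeit
kafka.sasl.mechanism=PLAIN
kafka.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="client" password="client-secret";

# OIDC via Okta for the Quarkus web application (placeholder values)
quarkus.oidc.auth-server-url=https://dev-123456.okta.com/oauth2/default
quarkus.oidc.client-id=my-client-id
quarkus.oidc.credentials.secret=my-client-secret
quarkus.oidc.application-type=web-app
```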
Streaming Machine Learning With Kafka-Native Model Server
Apache Kafka has become the de facto standard for event streaming across industries worldwide. Machine learning (ML) includes model training on historical data and model deployment for scoring and predictions. While training is mostly a batch process, scoring usually requires real-time capabilities at scale with high reliability. Apache Kafka plays a key role in modern machine learning infrastructures. The next-generation architecture leverages a Kafka-native streaming model server instead of RPC (HTTP/gRPC) calls:
This blog post explores the architectures and trade-offs between three options for model deployment with Kafka: embedding the model in the Kafka application, a model server with RPC, and a model server with Kafka-native communication.
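As a framework-free illustration of the first option, the sketch below loads a "model" into the application process and scores each event locally, with no RPC hop to a model server. In a real Kafka Streams app the per-event call would sit inside an operation like mapValues; the model logic and class names here are hypothetical stand-ins.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class EmbeddedModelSketch {
    // Hypothetical stand-in for a trained model loaded into the app process
    // at startup (e.g. an imported TensorFlow or ONNX model in a real system).
    static final Function<Double, String> MODEL =
            amount -> amount > 1000.0 ? "FRAUD" : "OK";

    // Scores every event in-process. In Kafka Streams this per-event call
    // would typically be: stream.mapValues(MODEL::apply)
    static List<String> score(List<Double> payments) {
        return payments.stream().map(MODEL).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(score(List.of(42.0, 5000.0))); // [OK, FRAUD]
    }
}
```

The trade-off the article examines: embedding gives the lowest latency and no extra moving parts, at the cost of coupling model updates to application deployments.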
AMQ Streams With Kafka Connect on Openshift
Have you ever heard about AMQ Streams? It is a Kafka platform based on Apache Kafka and provided by Red Hat. A big advantage of AMQ Streams is that, like all Red Hat tools, it is open source.
This article is about a popular component of Apache Kafka called Kafka Connect. I'll explain how it works on OpenShift.
How to Build your First Real-Time Streaming (CDC) System Part 1
Introduction
With the exponential growth of data and many businesses moving online, it has become imperative to design systems that can act in real time or near real time to support business decisions. So, after working on backend projects for many years, I finally got to build a real-time streaming platform. While working on the project, I experimented with different tech stacks, and I am sharing my learnings in a series of articles. Here is the first of them.
Target Audience
This post is aimed at engineers who are already familiar with microservices and Java and who are looking to build their first real-time streaming pipeline. For readability, this POC is divided into four articles, as follows:
Proper Kubernetes Health Check for a Kafka Streams Application
When you graze your cattle, you typically configure a health check to keep your herd alive. A very common livenessProbe does a GET request to an endpoint; if the service replies with a 200, then we're fine, otherwise the pod is destroyed and a new one is brought to life:
livenessProbe:
  httpGet:
    path: /health
    port: http
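For a Kafka Streams application, the /health endpoint behind that probe typically reports the state of the streams client. Below is a minimal, dependency-free sketch using the JDK's built-in HttpServer; the healthy flag stands in for a real check such as streams.state().isRunningOrRebalancing(), and the port and path handling are illustrative.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.atomic.AtomicBoolean;

public class HealthEndpoint {
    // Stand-in for the real check; a Kafka Streams app would consult
    // streams.state() instead of a flag.
    static final AtomicBoolean healthy = new AtomicBoolean(true);

    // Starts the server on an ephemeral port, probes /health once,
    // shuts down, and returns the HTTP status the probe would see.
    static int probeOnce() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/health", exchange -> {
            int status = healthy.get() ? 200 : 503; // 503 makes the probe fail
            byte[] body = (status == 200 ? "UP" : "DOWN").getBytes();
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://localhost:"
                    + server.getAddress().getPort() + "/health");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            return conn.getResponseCode();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("/health -> " + probeOnce()); // /health -> 200
    }
}
```

With this in place, the livenessProbe above destroys the pod only when the streams client has actually entered a dead state, rather than on every transient hiccup.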
Real-Time Stream Processing With Apache Kafka Part 2: Kafka Stream API
This is the second part of a four-part series of articles. In the previous article, we introduced you to Apache Kafka. In this article, we will briefly discuss the Kafka APIs, with special attention given to Kafka's Streams API.
Kafka Terminologies
Before we take a deep dive into Kafka Streams, here's a quick refresher on important Kafka concepts.
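As a toy illustration of one of those concepts, partitioning: every record key maps deterministically to one partition, so all events for a given key stay ordered on the same partition. Kafka's real default partitioner hashes the serialized key with murmur2; the sketch below uses Java's floorMod over hashCode purely for illustration.

```java
import java.util.List;

public class PartitionSketch {
    // Deterministic key -> partition mapping (illustrative only; Kafka's
    // default partitioner uses murmur2 over the serialized key bytes).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6;
        // The same key always lands on the same partition, preserving order.
        for (String key : List.of("vehicle-1", "vehicle-2", "vehicle-1")) {
            System.out.println(key + " -> partition "
                    + partitionFor(key, partitions));
        }
    }
}
```

The same idea underpins consumer groups: each partition is assigned to exactly one consumer in the group, which is what makes per-key ordering and horizontal scaling possible.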
Real-Time Stream Processing With Apache Kafka Part 4: Use Case
In previous articles, we covered the basic terminology used in Kafka and Kafka Streams, and we set up a single-node Kafka cluster on our Windows machine. Now, based on the knowledge we have gained so far, let us try to build a use case.
Scenario
Consider a hypothetical fleet management company that needs a dashboard to gain insight into its day-to-day vehicle activities. Each vehicle in this fleet is fitted with a GPS-based geolocation emitter, which emits location data containing the following information: