Apache Kafka | The Blog Pros

January 23, 2024

Integrating Rate-Limiting and Backpressure Strategies Synergistically To Handle and Alleviate Consumer Lag in Apache Kafka

Apache Kafka stands as a robust distributed streaming platform. However, like any system, it is imperative to proficiently oversee and control latency for optimal performance. Kafka Consumer Lag refers to the variance between the most recent message within a Kafka topic and the message that has been processed by a consumer. This lag may arise when the consumer struggles to match the pace at which new messages are generated and appended to the topic. Consumer lag in Kafka may manifest due to various factors. Several typical reasons for consumer lag are.

Insufficient consumer capacity.
Slow consumer processing.
High rate of message production.

Additionally, Complex data transformations, resource-intensive computations, or long-running operations within consumer applications can delay message processing. Poor network connectivity, inadequate hardware resources, or misconfigured Kafka brokers can eventually increase lag too.
In a production environment, it's essential to minimize lag to facilitate real-time or nearly real-time message processing, ensuring that consumers can effectively match the message production rate.

December 23, 2023

REST Clients With OpenFeign: How to Implement Them

In previous posts, we have seen how to implement distributed configuration and service discovery in Spring Cloud. In this article, we will discuss how to implement REST calls between microservices. We will implement REST clients with OpenFeign software.

Services Communication

There are two main styles of service communication: synchronous and asynchronous. The asynchronous type involves calls where the thread making the call does not wait for a response from the server and is not blocked. A typical protocol used for such scenarios is AMQP. This protocol involves messaging architectures, with message brokers like RabbitMQ and Apache Kafka.

December 19, 2023

How Lufthansa Uses Apache Kafka for Data Integration and Machine Learning

Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. The coronavirus was just a piece of the challenge. This post explores how Lufthansa leverages data streaming powered by Apache Kafka as cloud-native middleware for mission-critical data integration projects and as data fabric for AI/machine learning scenarios such as real-time predictions in fleet management. An interactive conversation with Lufthansa as an on-demand video is added at the end as a highlight if you want to learn more.

Data Streaming in the Aviation Industry

The future business of airlines and airports will be digitally integrated into the ecosystem of partners and suppliers. Companies will provide more personalized customer experiences and be enabled by a new suite of the latest technologies, including automation, robotics, and biometrics.

December 8, 2023

Real-Time Advertising With Apache Kafka and Flink

An advertising platform requires real-time capabilities to provide dynamic targeting, ad personalization, ad fraud detection, budget allocation, and event-driven marketing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables context-specific advertising at any scale. Real-world success stories from Pinterest, Uber, Reddit, Unity, buzzkill, and TV-Insight show different solutions and architectures for serving ads in marketing campaigns, embedded into mobile apps, and as SaaS software products.

What Is a Digital Advertising Platform?

An advertising (ads) platform is a digital system or service that allows businesses and advertisers to create, manage, and optimize their advertising campaigns across various channels. These platforms provide tools and features to target specific audiences, allocate budgets, track performance, and measure the effectiveness of advertising efforts.

December 8, 2023

Building Robust Real-Time Data Pipelines With Python, Apache Kafka, and the Cloud

In today's highly competitive landscape, businesses must be able to gather, process, and react to data in real-time in order to survive and thrive. Whether it's detecting fraud, personalizing user experiences, or monitoring systems, near-instant data is now a need, not a nice-to-have.

However, building and running mission-critical, real-time data pipelines is challenging. The infrastructure must be fault-tolerant, infinitely scalable, and integrated with various data sources and applications. This is where leveraging Apache Kafka, Python, and cloud platforms comes in handy.

October 5, 2023

Apache Kafka as Mission Critical Data Fabric for GenAI

Apache Kafka serves thousands of enterprises as the mission-critical and scalable real-time data fabric for machine learning infrastructures. The evolution of Generative AI (GenAI) with large language models (LLM) like ChatGPT changed how people think about intelligent software and automation. This blog post explains the relationship between data streaming and GenAI and shows the enormous opportunities and some early adopters of GenAI beyond the buzz.

data streaming as data fabric for generative AI

Generative AI (GenAI) and Data Streaming

Let’s set the context first to have the same understanding of the buzzwords.

September 25, 2023

Causes and Remedies of Poison Pill in Apache Kafka

A poison pill is a message deliberately sent to a Kafka topic, designed to consistently fail when consumed, regardless of the number of consumption attempts. Poison Pill scenarios are frequently underestimated and can arise if not properly accounted for. Neglecting to address them can result in severe disruptions to the seamless operation of an event-driven system.

The poison pill for various reasons:

August 21, 2023

Apache Kafka’s Built-In Command Line Tools

Several tools/scripts are included in the bin directory of the Apache Kafka binary installation. Even if that directory has a number of scripts, through this article, I want to highlight the five scripts/tools that I believe will have the biggest influence on your development work, mostly related to real-time data stream processing.

After setting up the development environment, followed by installation and configuration of either with single-node or multi-node Kafka cluster, the first built-in script or tool is kafka-topic.sh.

August 8, 2023

Apache Kafka as Workflow and Orchestration Engine

Business process automation with a workflow engine or BPM suite has existed for decades. However, using the data streaming platform Apache Kafka as the backbone of a workflow engine provides better scalability, higher availability, and simplified architecture. This blog post explores the concepts behind using Kafka for persistence with stateful data processing and when to use it instead or with other tools. Case studies across industries show how enterprises like Salesforce or Swisscom implement stateful workflow automation and orchestration with Kafka.

What Is a Workflow Engine for Process Orchestration?

A workflow engine is a software application that orchestrates human and automated activities. It creates, manages, and monitors the state of activities in a business process, such as the processing and approval of a claim in an insurance case, and determines the extra activity it should transition to according to defined business processes. Sometimes, such software is called an orchestration engine instead.

May 18, 2023

Apache Kafka vs. Message Queue: Trade-Offs, Integration, Migration

A message broker has very different characteristics and use cases than a data streaming platform like Apache Kafka. A business process requires more than just sending data in real time from a data source to a data sink. Data integration, processing, governance, and security must be reliable and scalable end-to-end across the business process. This blog post explores the capabilities of message brokers, the relation to the JMS standard, trade-offs compared to data streaming with Apache Kafka, and typical integration and migration scenarios. A case study explores the migration from IBM MQ to Apache Kafka. The last section contains a complete slide deck that covers all these aspects in more detail.

Message Broker vs. Apache Kafka -> Apples and Oranges

TL;DR: Message brokers send data from a data source to one or more data sinks in real-time. Data streaming provides the same capability but adds long-term storage, integration, and processing capabilities.

April 25, 2023

How To Install CMAK, Apache Kafka, Java 18, and Java 19 [Video Tutorials]

This article explains how to install the following in video tutorials:

CMAK (Cluster manager for Apache Kafka) or Kafka manager
Apache Kafka on Windows or Windows 10/11
Java 18 on Windows 10/11, JDK installation
Java 19 on Windows 10/11, JDK installation
Java JDK 19 on Amazon EC2 Instance or Linux operating system
Apache Kafka on Amazon EC2 Instance or Linux operating system

How to Install and Use CMAK (Cluster Manager for Apache Kafka) or Kafka Manager

Learn how to install and use CMAK. This is a tool for managing Apache Kafka clusters that allow us to view all the topics, partitions, numbers of offsets, and which are assigned to what and all topics, etc.

April 19, 2023

Apache Kafka in Java [Video Tutorials]: Architecture and Simple Consumer/Producer

In this video tutorial series on Apache Kafka in Java, explore:

Apache Kafka architecture and Apache Kafka components
Creating a simple producer and consumer in Java
Apache Kafka producer callbacks (producer with and without keys) in Java and Spring Boot

Apache Kafka Architecture and Components

This Apache Kafka designed for beginners takes a deep look at Kafka architecture and components.

April 18, 2023

Apache Kafka for Data Consistency

Real-time data beats slow data in almost all use cases. But as essential is data consistency across all systems, including non-real-time legacy systems and modern request-response APIs. Apache Kafka's most underestimated feature is the storage component based on the append-only commit log. It enables loose coupling for domain-driven design with microservices and independent data products in a data mesh. This blog post explores how Kafka enables data consistency with a real-world case study from financial services.

Apache Kafka = Real-Time Data Streaming

Real-time beats slow data. It is that easy in almost all use cases. Ask any executive or business person: What's better? If you can consume and use information now or later? The value of data goes down over time:

April 6, 2023

The Data Streaming Landscape

Data streaming is a new software category to process data in motion. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary stream processing engines like Apache Flink and SaaS offerings have emerged. And competitive technologies like Pulsar and Redpanda try to get market share. This blog post explores the data streaming landscape of 2023 to summarize existing solutions and market trends.

Data Streaming Is a New Software Category

Data-driven applications are the new black. This approach increases the business value as the overall goal by increasing revenue, reducing cost, reducing risk, or improving the customer experience.

April 4, 2023

Processing Time Series Data With QuestDB and Apache Kafka

Apache Kafka is a battle-tested distributed stream-processing platform popular in the financial industry to handle mission-critical transactional workloads. Kafka’s ability to handle large volumes of real-time market data makes it a core infrastructure component for trading, risk management, and fraud detection. Financial institutions use Kafka to stream data from market data feeds, transaction data, and other external sources to drive decisions.

A common data pipeline to ingest and store financial data involves publishing real-time data to Kafka and utilizing Kafka Connect to stream that to databases. For example, the market data team may continuously update real-time quotes for a security to Kafka, and the trading team may consume that data to make buy/sell orders. Processed market data and orders may then be saved to a time series database for further analysis.

March 24, 2023

Top 5 Data Streaming Trends for 2023

Data streaming is one of the most relevant buzzwords in tech to build scalable real-time applications in the cloud and innovative business models. Do you wonder about my predicted TOP 5 data streaming trends in 2023 to set data in motion? Check out the following presentation and learn what role Apache Kafka plays. Learn about decentralized data mesh, cloud-native lakehouse, data sharing, improved user experience, and advanced data governance.

Some followers might notice that this became a series with past posts about the top 5 data streaming trends for 2021 and the top 5 for 2022. Data streaming with Apache Kafka is a journey and evolution to set data in motion. Trends change over time, but the core value of a scalable real-time infrastructure as the central data hub stays.

March 8, 2023

When to Choose Redpanda Instead of Apache Kafka?

Data streaming emerged as a new software category. It complements traditional middleware, data warehouse, and data lakes. Apache Kafka became the de facto standard. New players enter the market because of Kafka’s success. One of those is Redpanda, a lightweight Kafka-compatible C++ implementation. This article explores the differences between Apache Kafka and Redpanda, when to choose which framework, and how the Kafka ecosystem, licensing, and community adoption impact a proper evaluation.

Data Streaming: A New Software Category

Data-driven applications are the new black. As part of this, data streaming is a new software category. If you don’t understand yet how it differs from other data management platforms like a data warehouse or data lake, check out the following article series:

January 26, 2023

Apache Kafka vs. Memphis.dev

What Is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform. Based on the abstraction of a distributed commit log, Kafka can handle a great number of events with functionality comprising pub/sub.

What Is Memphis.dev?

Memphis is a next-generation message broker.