Apache Kafka | The Blog Pros

January 17, 2023

A Cloud-Native SCADA System for Industrial IoT Built With Apache Kafka

Industrial IoT and Industry 4.0 enable digitalization and innovation. SCADA control systems are a vital component of IT/OT modernization. The SCADA evolution started with monolithic applications and moved to networked and web-based platforms. This blog post explores building the 5th generation: a cloud-native SCADA infrastructure with Apache Kafka. A real-world case study of a German electricity system operator shows how its journey toward open, scalable real-time workloads and edge-to-cloud integration progressed.

What Is a SCADA System?

Supervisory control and data acquisition (SCADA) is a control system architecture comprising computers, networked data communications, and graphical user interfaces for high-level supervision of machines and processes. It also covers sensors and other devices, such as programmable logic controllers, which interface with process plants or machinery.

December 5, 2022

When NOT To Choose Amazon MSK Serverless for Apache Kafka?

Apache Kafka has become the de facto standard for data streaming. Various cloud offerings have emerged and improved in recent years. Amazon MSK Serverless is the latest Kafka product from AWS. This blog post looks at its capabilities to explore how it relates to "the normal" partially managed Amazon MSK, when the serverless version is a good choice, and when other fully managed cloud services like Confluent Cloud are the better option.

Disclaimer: I work for Confluent. While AWS is a strong strategic partner, it also offers a competing product, Amazon MSK. This post is not about comparing every feature but about explaining the concepts behind the alternatives. Read articles and docs from the different vendors to make your own evaluation and decision. View this post as a list of criteria so that you don't overlook important aspects in your cloud service selection.

November 21, 2022

Migration from Amazon SQS and Kinesis to Apache Kafka and Flink

Even digital natives (companies that started their business in the cloud without legacy applications in their own data centers) need to modernize their cloud-native enterprise architecture to improve business processes, reduce costs, and provide real-time information to their downstream applications. This blog post explores the benefits of an open and flexible data streaming platform compared to proprietary message queue and data ingestion cloud services. A concrete example shows how DoorDash replaced cloud-native AWS SQS and Kinesis with Apache Kafka and Flink.

Message Queue and ETL vs. Data Streaming With Apache Kafka

A message queue like IBM MQ, RabbitMQ, or Amazon SQS enables sending and receiving messages. This works great for point-to-point communication. However, additional tools like Apache NiFi, Amazon Kinesis Data Firehose, or other ETL tools are required for data integration and data processing.
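To make the point-to-point vs. streaming contrast concrete, here is a minimal in-memory sketch. It assumes plain Python containers stand in for the broker; `PointToPointQueue` and `AppendOnlyLog` are hypothetical names, not an SQS or Kafka API. The queue hands each message to one reader and deletes it, while the log keeps records (in real Kafka, subject to the topic's retention settings) and lets each consumer track its own offset, so several downstream applications can read, and re-read, the same data.

```python
from collections import deque


class PointToPointQueue:
    """Hypothetical stand-in for a message queue: each message is delivered once, then gone."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removes the message, so no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class AppendOnlyLog:
    """Hypothetical stand-in for a Kafka topic partition: records stay, readers track offsets."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, consumer):
        # Return everything this consumer has not read yet, then advance its offset.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch

    def seek(self, consumer, offset):
        # Replay is possible because records are not deleted when read.
        self._offsets[consumer] = offset


if __name__ == "__main__":
    queue = PointToPointQueue()
    queue.send("order-1")
    print(queue.receive(), queue.receive())  # order-1 None

    log = AppendOnlyLog()
    log.append("order-1")
    log.append("order-2")
    print(log.poll("billing"))    # ['order-1', 'order-2']
    print(log.poll("analytics"))  # ['order-1', 'order-2']
    log.seek("billing", 0)
    print(log.poll("billing"))    # ['order-1', 'order-2']
```

The independent per-consumer offsets are what let one stream feed integration and processing use cases at the same time, instead of adding a separate ETL tool next to the queue.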

November 13, 2022

Request-Response With REST/HTTP vs. Apache Kafka

Request-response communication with REST/HTTP is simple, well understood, and supported by most technologies, products, and SaaS cloud services. In contrast, data streaming with Apache Kafka is a fundamental shift toward processing data continuously. HTTP and Kafka complement each other in various ways. This post explores the architectures and use cases for combining request-response with data streaming, either in the control plane for management or in the data plane for producing and consuming events.

Request-Response (HTTP) Versus Data Streaming (Apache Kafka)

Prior to discussing the relationship between HTTP/REST and Apache Kafka, let's explore the concepts behind both. Traditionally, request-response and data streaming are two different paradigms.

September 27, 2022

Build Your Own Social Media Analytics with Apache Kafka

Apache Kafka is more than just a messaging broker. It has a rich ecosystem of components: connectors for importing and exporting data, stream processing libraries, schema registries, and a lot more. In this talk, Jakub Scholz, Senior Principal Software Engineer at Red Hat, shows how to use Kafka to read data from social networks such as Twitter, process it, and use machine learning to analyze it, with everything running on top of Kubernetes.


September 24, 2022

When To Use Request-response With Apache Kafka?

How can I do request-response communication with Apache Kafka? That's one of the most common questions I get. This blog post explores when (not) to use this message exchange pattern, the differences between synchronous and asynchronous communication, the pros and cons compared to CQRS and event sourcing, and how to implement request-response within the data streaming infrastructure.
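A common way to layer request-response on top of asynchronous messaging is the correlation-ID pattern: the requester publishes to a request topic with a unique ID and a reply-to topic, and then waits for the reply carrying the same ID. The sketch below simulates this in memory, assuming a dictionary of lists stands in for Kafka topics; the topic names `orders.requests` and `orders.replies` and the three helper functions are hypothetical. In a real deployment, Kafka producers and consumers would do the work, with the correlation ID typically carried in a record header or the payload.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-in for Kafka topics: topic name -> list of records.
topics = defaultdict(list)


def send_request(payload, request_topic="orders.requests", reply_topic="orders.replies"):
    """Publish a request tagged with a unique correlation ID and a reply-to topic."""
    correlation_id = str(uuid.uuid4())
    topics[request_topic].append(
        {"correlation_id": correlation_id, "reply_to": reply_topic, "payload": payload}
    )
    return correlation_id


def serve_requests(request_topic="orders.requests"):
    """Responder service: process each request and publish a reply to its reply-to topic."""
    for request in topics[request_topic]:
        result = {"status": "accepted", "item": request["payload"]["item"]}
        topics[request["reply_to"]].append(
            {"correlation_id": request["correlation_id"], "payload": result}
        )
    topics[request_topic].clear()  # simplification: treat all requests as consumed


def await_reply(correlation_id, reply_topic="orders.replies"):
    """Find the reply whose correlation ID matches our request (None if not there yet)."""
    for reply in topics[reply_topic]:
        if reply["correlation_id"] == correlation_id:
            return reply["payload"]
    return None  # a real client would block with a timeout instead


if __name__ == "__main__":
    first = send_request({"item": "book"})
    second = send_request({"item": "lamp"})
    serve_requests()
    print(await_reply(first))   # {'status': 'accepted', 'item': 'book'}
    print(await_reply(second))  # {'status': 'accepted', 'item': 'lamp'}
```

The requester still has to wait for a matching reply, which reintroduces some of the coupling of synchronous communication; that trade-off is why this pattern is not always the right fit for a data streaming architecture.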

Message Queue Patterns in Data Streaming With Apache Kafka

Before I go into this post, I want to make you aware that this content is part of a blog series about “JMS, Message Queues, and Apache Kafka”:

September 3, 2022

Error Handling via Dead Letter Queue in Apache Kafka

Recognizing and handling errors is essential for any reliable data streaming pipeline. This blog post explores best practices for implementing error handling using a Dead Letter Queue in Apache Kafka infrastructure. The options include a custom implementation, Kafka Streams, Kafka Connect, the Spring framework, and the Parallel Consumer. Real-world case studies show how Uber, CrowdStrike, and Santander Bank build reliable real-time error handling at extreme scale.

Apache Kafka became the favorite integration middleware for many enterprise architectures. Even for a cloud-first strategy, enterprises leverage data streaming with Kafka as a cloud-native integration platform as a service (iPaaS).
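As a minimal sketch of the custom-implementation option, assuming an in-memory list stands in for the dead letter topic: records that fail processing don't stop the pipeline but are routed, together with error metadata, to a separate DLQ for later inspection or replay. `process_record` and the JSON record shape are hypothetical; Kafka Connect, Kafka Streams, and Spring offer their own error-handling mechanisms instead of hand-written code like this.

```python
import json


def process_record(value):
    """Hypothetical business logic: parse a JSON sensor reading and validate it."""
    reading = json.loads(value)  # raises json.JSONDecodeError (a ValueError) on malformed input
    if reading["temperature"] < -273.15:  # raises KeyError if the field is missing
        raise ValueError("temperature below absolute zero")
    return reading


def consume_with_dlq(records, dead_letter_queue):
    """Process records in order; route failures to the DLQ instead of stopping the pipeline."""
    processed = []
    for offset, value in enumerate(records):
        try:
            processed.append(process_record(value))
        except (ValueError, KeyError) as error:
            # Keep the original payload plus error context so the record can be inspected or replayed.
            dead_letter_queue.append(
                {"offset": offset, "value": value, "error": f"{type(error).__name__}: {error}"}
            )
    return processed


if __name__ == "__main__":
    dlq = []
    good = consume_with_dlq(
        ['{"temperature": 21.5}', "not-json", '{"temperature": -300}', '{"humidity": 40}'],
        dlq,
    )
    print(len(good), [entry["offset"] for entry in dlq])  # 1 [1, 2, 3]
```

Storing the original value and offset next to the error is the design choice that makes a DLQ useful: an operator can fix the cause and re-publish the failed records without losing them.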

August 9, 2022

Apache Kafka in Crypto and Finserv for Cybersecurity and Fraud Detection

The insane growth of the crypto and fintech markets brings many unknown risks, including successful cyberattacks that steal money and crypto coins. This post explores how data streaming with the Apache Kafka ecosystem enables real-time situational awareness and threat intelligence to detect and prevent hacks, money loss, and data breaches. This helps enterprises stay compliant with the law and keep customers happy in any innovative fintech or crypto application.

The Insane Growth of Crypto and Fintech Markets

The crypto and fintech markets are growing like crazy. Not every new crypto coin or blockchain is successful. Only a few fintechs, like Robinhood in the US or Trade Republic in Europe, are successful. In recent months, the crypto market has been a bear market (I'm writing this in April 2022).

July 30, 2022

Open API and Omnichannel with Apache Kafka in Healthcare

IT modernization and innovative new technologies change the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. Real-world examples show how traditional enterprises and startups increase efficiency, reduce cost, and improve the human experience across the healthcare value chain, including pharma, insurance, providers, retail, and manufacturing. This is part six: Open API and Omnichannel. Examples include Care.com and Invitae.

Blog Series - Kafka in Healthcare

Many healthcare companies leverage Kafka today. Use cases exist in every domain across the healthcare value chain. Most companies deploy data streaming in different business domains. Use cases often overlap. I tried to categorize a few real-world deployments into different technical scenarios and added a few real-world examples:

July 24, 2022

Comparison: JMS Message Queue vs. Apache Kafka

Comparing JMS-based message queue (MQ) infrastructures and Apache Kafka-based data streaming is a widespread topic. Unfortunately, the battle is an apples-to-oranges comparison that often includes misinformation and FUD from vendors. This article explores the differences, trade-offs, and architectures of JMS message brokers and Kafka deployments. Learn how to choose between JMS brokers like IBM MQ or RabbitMQ and open-source Kafka or serverless cloud services like Confluent Cloud.

Motivation: The Battle of Apples vs. Oranges

I have to discuss the differences and trade-offs between JMS message brokers and Apache Kafka every week in customer meetings. What annoys me most are the common misunderstandings and (sometimes) intentional FUD in various blogs, articles, and presentations about this discussion.

July 16, 2022

Machine Learning and Data Science With Kafka in Healthcare

IT modernization and innovative new technologies change the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. Real-world examples show how traditional enterprises and startups increase efficiency, reduce cost, and improve the human experience across the healthcare value chain, including pharma, insurance, providers, retail, and manufacturing. This is part five: Machine Learning and Data Science. Examples include Recursion and Humana.

Blog Series - Kafka in Healthcare

July 7, 2022

7 Reasons to Choose Apache Pulsar over Apache Kafka

So why did we build our messaging service using Apache Pulsar?

At DataStax, our mission is to empower developers to build cloud-native distributed applications by making cloud-agnostic, high-performance messaging technology easily available to everyone. Developers want to write distributed applications or microservices but don’t want the hassle of managing complex message infrastructure or getting locked into a particular cloud vendor. They need a solution that just works. Everywhere.

June 22, 2022

Apache Kafka Patterns and Anti-Patterns

Apache Kafka offers the operational simplicity of data engineers' dreams. It is a message broker that allows clients to publish and read streams of data, and its ecosystem of open-source components, when combined, helps store, process, and integrate data streams with other parts of your system in a secure, reliable, and scalable manner. This Refcard dives into select patterns and anti-patterns across the Kafka Client APIs, Kafka Connect, and Kafka Streams, covering topics such as reliable messaging, scalability, error handling, and more.
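One scalability pattern in this space is key-based partitioning: records with the same key always land in the same partition, so per-key ordering is preserved while different keys spread across partitions (and therefore across consumers). The sketch below is illustrative only. It assumes CRC32 from Python's `zlib` as the hash, whereas Kafka's default Java partitioner uses murmur2, so the partition numbers will not match a real cluster; `choose_partition` and `partition_records` are hypothetical helpers.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition: same key -> same partition (CRC32 here, not murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records by partition, preserving arrival order within each partition."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    events = [("device-1", "on"), ("device-2", "on"), ("device-1", "off"), ("device-3", "on")]
    for partition, batch in sorted(partition_records(events, num_partitions=3).items()):
        print(partition, batch)
```

Whatever hash is used, both events for `device-1` end up in one partition in their original order. The flip side is a common anti-pattern: a single very hot key cannot be spread out, so it can overload one partition.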

June 22, 2022

Legacy Modernization and Hybrid Cloud with Kafka in Healthcare

IT modernization and innovative new technologies change the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. Real-world examples show how traditional enterprises and startups increase efficiency, reduce cost, and improve the human experience across the healthcare value chain, including pharma, insurance, providers, retail, and manufacturing. This is part two: Legacy modernization and hybrid multi-cloud. Examples include Optum / UnitedHealth Group, Centene, and Bayer.

Blog Series - Kafka in Healthcare

June 18, 2022

Streaming ETL with Apache Kafka in the Healthcare Industry

IT modernization and innovative new technologies change the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. Real-world examples show how traditional enterprises and startups increase efficiency, reduce cost, and improve the human experience across the healthcare value chain, including pharma, insurance, providers, retail, and manufacturing. This is part three: Streaming ETL. Examples include Babylon Health and Bayer.

Blog Series - Kafka in Healthcare

June 3, 2022

Apache Kafka in the Healthcare Industry

Healthcare - A Broad Spectrum of Very Different Domains

Health care is the maintenance or improvement of health via the prevention, diagnosis, treatment, amelioration, or cure of disease, illness, injury, and other physical and mental impairments.

May 26, 2022

The Definitive Guide to Building a Data Mesh With Event Streams

Data mesh. This oft-talked-about architecture has no shortage of blog posts, conference talks, podcasts, and discussions. One thing that you may have found lacking is a concrete guide on precisely how to get started building your own data mesh implementation. We have you covered. In this blog post, we’ll show you how to build a data mesh using event streams, highlighting our design decisions, and the key benefits and challenges you’ll need to consider along the way. In fact, we’ll go one better: we’ve built a data mesh prototype for you to check out on your own to see what this would look like in action, or fork to bootstrap a data mesh for your own organization.

Data mesh is technology-agnostic, so there are a few different ways you can go about building one. The canonical approach is to build the mesh using event streaming technology that provides a secure, governed, real-time mechanism for moving data between different points in the mesh.

May 19, 2022

Apache Kafka Essentials

Dive into Apache Kafka: Readers will review its history and fundamental components — Pub/Sub, Kafka Connect, and Kafka Streams. Key concepts in these areas are supplemented with detailed code examples that demonstrate producing and consuming data, using connectors for easy data streaming and transformation, performing common operations in KStreams, and more.
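To illustrate how Kafka's pub/sub model combines with consumer groups, here is a simplified round-robin partition assignment. It is written in the spirit of Kafka's round-robin assignor but is not its actual implementation, and `assign_round_robin` is a hypothetical helper. Within one group, each partition goes to exactly one member; separate groups subscribing to the same topic each receive all partitions.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer in a group, cycling through sorted members."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment  # no consumers: nothing is assigned
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


if __name__ == "__main__":
    topic_partitions = [0, 1, 2, 3, 4, 5]
    # Two independent consumer groups subscribe to the same topic (publish/subscribe):
    # each group receives all partitions, split among its own members.
    print(assign_round_robin(topic_partitions, ["billing-1", "billing-2"]))
    # {'billing-1': [0, 2, 4], 'billing-2': [1, 3, 5]}
    print(assign_round_robin(topic_partitions, ["audit-1"]))
    # {'audit-1': [0, 1, 2, 3, 4, 5]}
```

Because a partition is never shared within a group, adding consumers beyond the number of partitions leaves the extra members idle.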