Kafka Event Exchange Between Local and Azure

Setting up Kafka on a local machine or within a single network and producing/consuming messages may not be a daunting task, but people do face challenges when they try to make it work across networks.

Let’s consider a hybrid scenario where your software solution is distributed across two different platforms (say, AWS and Azure, or on-premises and Azure), and there is a need to route messages from a Kafka cluster hosted on one platform to a cluster hosted on the other. This is a valid business scenario when you are consolidating your solution onto one cloud platform and, in the interim, need this routing in place until the migration is complete. Even in the long term, there may be a need to maintain the solution across multiple platforms for various business and technical reasons.
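
Cross-cluster routing of this kind is commonly handled with Kafka’s built-in MirrorMaker 2. Below is a minimal configuration sketch, assuming hypothetical cluster aliases onprem and azure and illustrative hostnames; it would be run with bin/connect-mirror-maker.sh mm2.properties.

    # mm2.properties -- minimal MirrorMaker 2 setup (illustrative values)
    clusters = onprem, azure
    onprem.bootstrap.servers = broker1.internal.example.com:9092
    azure.bootstrap.servers = kafka.azure.example.com:9092

    # Replicate matching topics one way, from onprem to azure
    onprem->azure.enabled = true
    onprem->azure.topics = orders.*

    # Replication factor for mirrored topics on the target cluster
    replication.factor = 3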

Building an Enterprise CDC Solution

Introduction

This article is a follow-up to Data Platform: Building an Enterprise CDC Solution, in which Miguel García and I described:

  • Several Change Data Capture (CDC) use cases and common scenarios in an enterprise platform
  • A proposal using Debezium (as a log-based CDC tool) to capture data from relational databases, and Kafka as a channel that enables several consumers to propagate data changes for different use cases

One of the common scenarios for this solution is data replication from an OLTP database to an OLAP database (from the operational database to the data warehouse).
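
To make the data flowing through that channel concrete, here is roughly what a Debezium change event for a single row update looks like (field values are illustrative, not taken from the article):

    {
      "before": { "id": 1004, "email": "old.address@example.com" },
      "after":  { "id": 1004, "email": "new.address@example.com" },
      "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
      "op": "u",
      "ts_ms": 1620000000000
    }

The op field distinguishes creates ("c"), updates ("u"), deletes ("d"), and snapshot reads ("r"), which lets a downstream consumer such as a data warehouse loader apply each change appropriately.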

Fast JMS for Apache Pulsar: Modernize and Reduce Costs with Blazing Performance

Written by: Chris Bartholomew

DataStax recently announced the availability of Fast JMS for Apache Pulsar, a JMS 2.0 API. By combining the industry-standard Java Messaging Service (JMS) API with the cloud-native and horizontally scalable Apache Pulsar™ streaming platform, DataStax is providing a powerful way to modernize your JMS infrastructure, improve performance, and reduce costs. Fast JMS is open source and is included in DataStax’s Luna Streaming Enterprise support of Apache Pulsar.
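
For a flavor of what this enables, the sketch below sends one message through the standard JMS 2.0 API. It assumes the pulsar-jms client’s PulsarConnectionFactory, a local Pulsar broker, and a made-up topic name; treat it as illustrative rather than the library’s canonical usage.

    import java.util.HashMap;
    import java.util.Map;
    import javax.jms.Destination;
    import javax.jms.JMSContext;
    import com.datastax.oss.pulsar.jms.PulsarConnectionFactory;

    public class JmsOnPulsarExample {
        public static void main(String[] args) throws Exception {
            Map<String, Object> config = new HashMap<>();
            config.put("brokerServiceUrl", "pulsar://localhost:6650"); // assumed local broker

            try (PulsarConnectionFactory factory = new PulsarConnectionFactory(config)) {
                // From here on this is plain JMS 2.0; nothing below is Pulsar-specific.
                JMSContext context = factory.createContext();
                Destination queue = context.createQueue("persistent://public/default/orders");
                context.createProducer().send(queue, "Hello from JMS on Pulsar");
            }
        }
    }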

3 Simple Ideas to Make Your Life Easier With Kafka

Apache Kafka was open-sourced by LinkedIn in early 2011. Despite all the initial limitations, it was a huge success and became the de facto standard for streaming data. Its performance, the ability to replay events, and support for multiple independent consumers were some of the features that disrupted the streaming arena.

But Kafka has also been known for its steep learning curve and operational difficulties. In my experience, both have improved a lot in the last few years, but the original gotchas remain:

Online Machine Learning: Into the Deep

Introduction

Online Learning is a branch of Machine Learning that has attracted significant interest in recent years thanks to characteristics that fit numerous kinds of tasks in today’s world. Let’s dive deeper into this topic.

What Exactly Is Online Learning?

In traditional machine learning, often called batch learning, the training data is first gathered in its entirety, and then a chosen machine learning model is trained on that data; the resulting model is then deployed to make predictions on new, unseen data.
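
An online learner, by contrast, updates its model one observation at a time. As a minimal sketch (not from the article), here is an incremental stochastic-gradient update for a one-feature linear model:

    // Minimal online-learning sketch: a one-feature linear model updated
    // by stochastic gradient descent, one observation at a time.
    public class OnlineLinearModel {
        private double w = 0.0;          // weight
        private double b = 0.0;          // bias
        private final double lr = 0.01;  // learning rate

        double predict(double x) {
            return w * x + b;
        }

        // Called for each new (x, y) pair as it arrives; no full retraining pass needed.
        void learnOne(double x, double y) {
            double error = predict(x) - y;
            w -= lr * error * x;
            b -= lr * error;
        }
    }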

Intro To Apache Kafka: How Kafka Works

Introduction

We recently published a series of tutorial videos and tweets on the Apache Kafka® platform. So now you know there’s a thing called Kafka, but before you put your hands to the keyboard and start writing code, you need to form a mental model of what the thing is. These videos give you the basics you need to know to have the broad grasp of Kafka necessary to continue learning and eventually start coding. This article summarizes those videos.

Events

Pretty much all of the programs you’ve ever written respond to events of some kind: the mouse moving, input becoming available, web forms being submitted, bits of JSON being posted to your endpoint, the sensor on the pear tree detecting that a partridge has landed on it, etc. Kafka encourages you to see the world as sequences of events, which it models as key-value pairs. The key and the value have some kind of structure, usually represented in your language’s type system, but fundamentally they can be anything. Events are immutable, as it is (sometimes tragically) impossible to change the past.
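
As a minimal sketch of producing one such key-value event (the topic name and payload are made up, and a broker is assumed on localhost):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SightingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // An event is a key-value pair; both are plain strings here for simplicity.
                producer.send(new ProducerRecord<>("sightings", "pear-tree-sensor", "partridge landed"));
            }
        }
    }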

Coupling Schema Registry (Confluent) With Multi-Broker Apache Kafka Cluster

This article aims to explain the steps for coupling the Confluent Schema Registry with an existing, operational multi-broker Apache Kafka cluster (local deployment). Confluent is an integrated platform that bundles Apache Kafka with multiple components, from ksqlDB for stream processing to numerous connectors (database, file, AWS, Azure, Google, etc.), the Schema Registry, Control Center, and more. Please click here to learn more about the Confluent Platform.

In short, the Schema Registry preserves a versioned history of all schemas, provides multiple compatibility settings, allows the evolution of schemas, and so on. It supports Avro, JSON Schema, and Protobuf schemas. You can read here about the importance of the Schema Registry in Kafka-based data pipelines.
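
As a rough illustration, coupling the registry to an existing multi-broker cluster largely comes down to pointing it at the brokers in schema-registry.properties (hostnames below are placeholders):

    # schema-registry.properties (illustrative hostnames)
    listeners=http://0.0.0.0:8081

    # Point the registry at the existing multi-broker Kafka cluster
    kafkastore.bootstrap.servers=PLAINTEXT://broker1:9092,PLAINTEXT://broker2:9092,PLAINTEXT://broker3:9092

    # Internal topic in which the registry stores all schema versions
    kafkastore.topic=_schemas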

Change Data Capture (CDC) From MySQL Database to Kafka With Kafka Connect and Debezium

Introduction

Debezium is an open-source project developed by Red Hat that aims to simplify change data capture by allowing you to extract changes from various database systems (e.g., MySQL, PostgreSQL, MongoDB) and push them to Kafka.
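
As a sketch of what this looks like in practice, a MySQL connector is typically registered by POSTing a JSON configuration to the Kafka Connect REST API (the connection details below are placeholders):

    {
      "name": "inventory-connector",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.com",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
      }
    }

With this in place, row-level changes from the inventory database appear on Kafka topics prefixed with the logical server name dbserver1.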


Debezium Connectors

Debezium has a library of connectors that capture changes from a variety of databases and produce events with very similar structures, making it easier for applications to consume and respond to events regardless of where the changes originated. Debezium currently offers the following connectors:

Is Kafka the Next Big Thing in the Banking and Financial Sector?

Nowadays, businesses are seeking innovative ways to digitally transform themselves, utilizing key technologies to promote business intelligence and increase profitability. The terms "insights" and "data" carry much significance and are a crucial aspect of enhancing the customer experience. Technologies such as Kafka benefit organizations across industries in manifold ways.

With its real-time streaming data architecture and real-time analytics feature, Kafka is certainly the talk of the town. According to recent statistics, over one-third of Fortune 500 companies have leveraged Kafka.

Kafka Administration and Monitoring UI Tools

Kafka itself comes with command-line tools that can perform all necessary administrative tasks. But those tools aren’t very convenient: they are not integrated into a single tool, and you need to run a different tool for each task. Moreover, it becomes difficult to work with them when your clusters grow large or when you have several clusters.

So, today, we will cover some GUI alternatives.

Apache Kafka: Basic Setup and Usage With Command-Line Interface

In this article, we are going to learn basic Kafka commands. With these commands, we will gain a basic understanding of how to run a Kafka broker, produce and consume messages, and inspect topic and offset details.

Just note that this is a standalone setup, intended to give an overview of the basic setup and functionality using the command-line interface.
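
For reference, a typical session against such a standalone broker looks roughly like this (the topic name is illustrative, and the commands assume a broker on localhost:9092):

    # Create a topic on the local standalone broker
    bin/kafka-topics.sh --create --topic quickstart --bootstrap-server localhost:9092 \
        --partitions 1 --replication-factor 1

    # Produce messages from the console (each line you type becomes a message)
    bin/kafka-console-producer.sh --topic quickstart --bootstrap-server localhost:9092

    # Consume the messages from the beginning of the topic
    bin/kafka-console-consumer.sh --topic quickstart --from-beginning --bootstrap-server localhost:9092

    # Inspect topic details (partitions, leader, replicas)
    bin/kafka-topics.sh --describe --topic quickstart --bootstrap-server localhost:9092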

Real-Time Stream Processing With Apache Kafka Part 4: Use Case

In previous articles, we gained an understanding of the basic terminology used in Kafka and Kafka Streams. In this article, we set up a single-node Kafka cluster on our Windows machine. Now, based on the knowledge we have gained so far, let us try to build a use case.

Scenario

Consider a hypothetical fleet management company that needs a dashboard to get insight into its day-to-day vehicle-related activities. Each vehicle in this fleet is fitted with a GPS-based geolocation emitter, which emits location data containing the following information:
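
The field list itself follows in the original article; purely as a hypothetical illustration of such an event’s shape (none of these names come from the source), it might resemble:

    // Hypothetical event shape only -- the actual field list is in the original article.
    public class VehicleLocation {
        String vehicleId;   // fleet-assigned identifier (assumed)
        double latitude;    // GPS latitude (assumed)
        double longitude;   // GPS longitude (assumed)
        long timestampMs;   // emission time in epoch milliseconds (assumed)
    }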

Real-Time Stream Processing With Apache Kafka Part One

Today, with the rise of IoT and Smart Devices, we are generating data at an unprecedented speed. With distributed computing, data is generated in one place and processed in another. Sensors or the UI on a device capture data (manually or automatically) as an event and send it to another unit for processing. This happens continuously.

These events may be produced at a fixed rate or in bursts, resulting in a stream of events known as an event stream. In most scenarios, these events are generated at very high speed (within seconds or even milliseconds), so we need to process these event streams at the same or a higher rate.
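
As a minimal sketch of processing such a stream as it arrives (topic names are made up and a local broker is assumed), a Kafka Streams topology that transforms each event on arrival might look like:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class EventStreamApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-stream-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Read raw events, transform each one as it arrives, and write the result out.
            builder.<String, String>stream("raw-events")
                   .mapValues(value -> value.toUpperCase())
                   .to("processed-events");

            new KafkaStreams(builder.build(), props).start();
        }
    }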