Microservices Communication With Apache Kafka

When developing a new product, the first thing that comes to mind is how to structure the code. A battle has been raging in this regard for a while now: monolithic vs. microservices. Software developers and company owners are still trying to figure out which architectural style is ideal for their applications, and the choice matters, because it determines the scalability, efficiency, and competitiveness of the product. While monolithic systems have been around for a long time, microservices are a comparatively modern way of structuring a software system. Indeed, a slew of technologies has emerged under the DevOps mindset, allowing us to design scalable, distributed systems based on microservices.

One of the main advantages of a microservice architecture is that it makes it simpler to select the technology stack (programming languages, databases, etc.) best suited to each service, rather than being forced into a more conventional, one-size-fits-all approach.

How to Use Protobuf With Apache Kafka and Schema Registry

Since Confluent Platform version 5.5, Avro is no longer the only schema format in town. Protobuf and JSON Schema are now supported as first-class citizens in the Confluent universe. But before I go on to explain how to use Protobuf with Kafka, let's answer one often-asked question:

Why Do We Need Schemas?

When applications communicate through a pub-sub system, they exchange messages and those messages need to be understood and agreed upon by all the participants in the communication. Additionally, you would like to detect and prevent changes to the message format that would make messages unreadable for some of the participants. 
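
To make the schema idea concrete, here is a minimal sketch of a Scala producer publishing Protobuf messages via Confluent's Protobuf serializer, which by default registers the schema with Schema Registry on first use. The `SimpleMessage` class, topic, and addresses are illustrative placeholders, not taken from the article:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

// `SimpleMessage` stands in for a class generated by protoc from a
// hypothetical definition such as:
//   message SimpleMessage { string content = 1; string date_time = 2; }
object ProtobufProducerSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")            // placeholder broker
  props.put("key.serializer", classOf[StringSerializer].getName)
  // Confluent's serializer registers the Protobuf schema with Schema Registry
  props.put("value.serializer",
    "io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer")
  props.put("schema.registry.url", "http://localhost:8081")   // placeholder registry

  val producer = new KafkaProducer[String, SimpleMessage](props)
  val message = SimpleMessage.newBuilder()
    .setContent("Hello from Protobuf")
    .setDateTime("2021-01-01T00:00:00Z")
    .build()
  producer.send(new ProducerRecord("protobuf-topic", "key-1", message))
  producer.flush()
  producer.close()
}
```

On the consuming side, the matching Protobuf deserializer decodes messages against the registered schema, so incompatible format changes are caught centrally by the registry's compatibility checks.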

Apache Kafka In Action

Challenges and Limitations in Messaging

Messaging is a fairly simple paradigm for the transfer of data between applications and data stores. However, several challenges are associated with it:

  1. Limited scalability, since the broker can become a bottleneck.
  2. Message brokers strained by growing message sizes.
  3. Consumers needing to consume messages at a reasonable rate to keep up with producers.
  4. Consumers lacking fault tolerance, with no guarantee that messages, once consumed, are not gone forever.

Messaging Limitations Due to:

High Volume

Traditional messaging applications are hosted on a single host or node. As a result, the broker can become a bottleneck, constrained by the capacity and local storage of that one machine.
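
Kafka sidesteps this single-host limitation by spreading a topic's partitions, and their replicas, across a cluster of brokers. A minimal sketch, assuming a hypothetical three-broker cluster with placeholder addresses:

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

object CreatePartitionedTopic extends App {
  val props = new Properties()
  // Placeholder addresses for a three-broker cluster
  props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092")
  val admin = AdminClient.create(props)
  // 6 partitions spread across the cluster, each replicated to 3 brokers,
  // so no single host or disk carries the whole topic
  val topic = new NewTopic("events", 6, 3.toShort)
  admin.createTopics(Collections.singletonList(topic)).all().get()
  admin.close()
}
```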

Apache Kafka With Scala Tutorial

Before the introduction of Apache Kafka, data pipelines were very complex and time-consuming, since a separate streaming pipeline was needed for every consumer; the diagram below illustrates that complexity.

Apache Kafka solved this problem by providing a universal pipeline that is fault-tolerant, scalable, and simple to use. A single pipeline can now cater to multiple consumers, as can also be seen in the diagram below.
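
To give a feel for the single-pipeline model, here is a small Scala sketch of a producer writing to one topic and a consumer group reading from it; the broker address, topic, and group names are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import scala.jdk.CollectionConverters._

object PipelineProducer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", classOf[StringSerializer].getName)
  props.put("value.serializer", classOf[StringSerializer].getName)
  val producer = new KafkaProducer[String, String](props)
  // One topic feeds every downstream consumer; no per-consumer pipeline needed
  producer.send(new ProducerRecord[String, String]("events", "user-42", "page_view"))
  producer.flush()
  producer.close()
}

object PipelineConsumer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "analytics")   // each group keeps its own offset cursor
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)
  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("events"))
  while (true) {
    for (record <- consumer.poll(Duration.ofMillis(500)).asScala)
      println(s"${record.key} -> ${record.value}")
  }
}
```

Any number of additional consumer groups can subscribe to the same topic independently, which is exactly what collapses many bespoke pipelines into one.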

Kafka Logging With the ELK Stack

Kafka and the ELK Stack are usually part of the same architectural solution, with Kafka acting as a buffer in front of Logstash to ensure resiliency. This article explores a different combination: using the ELK Stack to collect and analyze Kafka logs.
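
As a rough sketch of that collection side, a Logstash pipeline such as the one below could tail Kafka's server log, parse the default log4j line layout, and index the events into Elasticsearch. The log path and hosts are assumptions about a typical installation, not taken from the article:

```
input {
  file {
    path => "/var/log/kafka/server.log"   # assumed Kafka log location
    start_position => "beginning"
  }
}

filter {
  # Kafka's default log4j layout: [2021-01-01 12:00:00,000] INFO message ...
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} %{GREEDYDATA:message}" }
    overwrite => ["message"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]           # assumed Elasticsearch address
    index => "kafka-logs-%{+YYYY.MM.dd}"
  }
}
```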

Apache Kafka Load Testing Using JMeter

In simple words, Apache Kafka is a hybrid of a distributed database and a message queue, and many large companies use it to process terabytes of information. Kafka is also widely popular for its feature set: LinkedIn, for example, uses it to stream data about user activity, while Netflix uses it for data collection and buffering for downstream systems like Elasticsearch, Amazon EMR, Mantis, and many more.

With that in mind, let's shed light on some of Kafka's features that are important for load testing.
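
Before reaching for a full JMeter test plan, you can get a rough sense of raw producer throughput from a short script (Kafka itself also ships a kafka-producer-perf-test tool for this). A minimal sketch, assuming a local broker and a pre-created load-test topic:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.ByteArraySerializer

object ProducerLoadSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", classOf[ByteArraySerializer].getName)
  props.put("value.serializer", classOf[ByteArraySerializer].getName)
  props.put("acks", "1")                  // leader-only acks: a common perf-test setting
  val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)

  val payload = Array.fill[Byte](1024)(0) // fixed 1 KB message body
  val count   = 100000
  val start   = System.nanoTime()
  for (_ <- 1 to count)
    producer.send(new ProducerRecord[Array[Byte], Array[Byte]]("load-test", payload))
  producer.flush()                        // wait until everything is actually sent
  val secs = (System.nanoTime() - start) / 1e9
  println(f"$count%d messages in $secs%.2f s (${count / secs}%.0f msg/s)")
  producer.close()
}
```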

Building a Real-Time Bike-Share Data Pipeline with StreamSets, Kafka and MapD

In this post, we will use the Ford GoBike Real-Time System, StreamSets Data Collector, Apache Kafka, and MapD to create a real-time data pipeline of bike availability in the Ford GoBike bike share ecosystem. We’ll walk through the architecture and configuration that enables this data pipeline and share a simple auto-updating dashboard within MapD Immerse.

High-Level Architecture

The high-level architecture consists of two data pipelines: one polls the GoBike system and publishes that data to Kafka; the other consumes from Kafka using Data Collector, transforms the data, then writes it to MapD.
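
To sketch what the first pipeline does conceptually, the snippet below polls a GBFS-style station status feed and publishes the raw JSON to a Kafka topic. The feed URL, topic name, and poll interval are illustrative assumptions; the article builds this step with StreamSets Data Collector rather than custom code:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer
import scala.io.Source

object StationStatusPoller extends App {
  // Hypothetical GBFS endpoint for station availability
  val feedUrl = "https://gbfs.fordgobike.com/gbfs/en/station_status.json"
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", classOf[StringSerializer].getName)
  props.put("value.serializer", classOf[StringSerializer].getName)
  val producer = new KafkaProducer[String, String](props)

  while (true) {
    val src  = Source.fromURL(feedUrl)
    val body = try src.mkString finally src.close()  // one JSON document per poll
    producer.send(new ProducerRecord[String, String]("station-status", body))
    producer.flush()
    Thread.sleep(60000)  // GBFS feeds typically refresh about once a minute
  }
}
```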