Testing Schema Registry: Spring Boot and Apache Kafka With JSON Schema

Introduction

Apache Kafka is one of the most widely used reliable and scalable eventing platforms. It provides the foundation for fast, scalable, fault-tolerant, event-driven microservice architectures. Any type of data can be published to, and read from, a Kafka topic. However, Kafka does not provide an out-of-the-box schema validation system, so a topic is prone to being fed junk data.

Confluent devised the concept of a schema registry that can be used to implement type checking of messages in any Kafka installation. The Schema Registry requires the message schema to be registered against a topic, and it enforces that only messages conforming to that schema are sent to the topic.
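As a minimal sketch of how this looks from the producer side (assuming Confluent's kafka-json-schema-serializer is on the classpath, and using a hypothetical orders topic, Order payload class, and localhost addresses), the serializer registers or looks up the JSON schema for the topic's subject before each send:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;

public class OrderProducer {

    // Hypothetical payload class; the serializer derives a JSON Schema from it.
    public static class Order {
        public String orderId;
        public double amount;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaJsonSchemaSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");               // assumed registry address
        props.put("json.fail.invalid.schema", "true");                           // fail instead of sending non-conforming payloads

        try (KafkaProducer<String, Order> producer = new KafkaProducer<>(props)) {
            Order order = new Order();
            order.orderId = "o-1001";
            order.amount = 42.50;
            // The serializer registers or looks up the schema for the topic's
            // value subject before the record is sent.
            producer.send(new ProducerRecord<>("orders", order.orderId, order));
        }
    }
}
```

In a Spring Boot application, the same serializer class and schema.registry.url would typically be supplied through spring.kafka.producer.* properties rather than a hand-built Properties object.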

Streaming Data From Files Into Multi-Broker Kafka Clusters

There are multiple ways to ingest data streams into an Apache Kafka topic and subsequently deliver them to the various types of consumers hooked to that topic. The data collected continuously from the topic by consumers passes through multiple data pipelines and stream processing engines such as Apache Spark, Apache Flink, and Amazon Kinesis, and eventually lands in real-time applications that deliver a final data-driven decision. Across finance, manufacturing, insurance, telecom, healthcare, commerce, and more, real-time applications are becoming the preferred way for organizations to take immediate action and gain insights from up-to-date data. Today, Apache Kafka forms the central nervous system that brings data from all parts of the business to the large operational information hubs where decisions are made.

Text files contain unformatted ASCII text and are commonly used to store information. Each line of the file represents a data record, and the file can be updated continuously. Every insertion of a new line or lines into the text file can be considered a new data insertion into the file. Hence, when new lines are continuously appended to the text file, either by humans or by applications (without modifying lines already inserted), and subsequently moved or sent to a different location, that can be considered data streaming from the file. Every new line or row added to the text file can be analyzed continuously by exporting it to a Kafka topic and importing it with the consumers hooked to that topic.
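A bare-bones sketch of that export step might look like the following (the broker address, topic name, and events.txt file are assumptions for illustration); Kafka Connect's FileStreamSource connector offers a packaged alternative to hand-rolling this:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FileLineStreamer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(new FileReader("events.txt"))) { // hypothetical file
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    Thread.sleep(1000);   // wait for new lines to be appended
                    continue;
                }
                // Each newly appended line becomes one record on the topic.
                producer.send(new ProducerRecord<>("file-lines", line));
            }
        }
    }
}
```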

A Gentle (and Practical) Introduction to Apache Avro (Part 1)

This post is a gentle introduction to Apache Avro. It started with several discussions with Dario Cazas about what’s possible with Apache Avro; he did some research and summarized it in an email. I found myself looking for that email several times to forward it to different teams to clarify doubts about Avro. After a while, I thought it could be useful for others, and that is how this series of three posts was born.

In summary, Apache Avro is a compact, schema-based binary serialization format.
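To give a feel for the format, here is a small, self-contained sketch (with a made-up User schema) that serializes one record to Avro's binary encoding; the resulting bytes carry only the values, not the field names, which is why the schema must travel with, or be resolvable for, the data:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema, defined inline for illustration.
        String schemaJson = "{"
            + "\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alice");
        user.put("age", 30);

        // Serialize the record to Avro binary (no schema embedded in the payload).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
        encoder.flush();
        System.out.println("Serialized " + out.size() + " bytes");
    }
}
```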

Coupling Schema Registry (Confluent) With Multi-Broker Apache Kafka Cluster

This article explains the steps for coupling the Confluent Schema Registry with an existing, operational multi-broker Apache Kafka cluster (local deployment). Confluent is an integrated platform that bundles Apache Kafka with multiple components, including ksqlDB for stream processing, numerous connectors (database, file, AWS, Azure, Google, etc.), Schema Registry, and Control Center. Please click here to learn more about the Confluent Platform.

In short, the Schema Registry preserves a versioned history of all schemas, provides multiple compatibility settings, and allows schemas to evolve over time. It supports Avro, JSON Schema, and Protobuf schemas. You can read here about the importance of the Schema Registry in Kafka-based data pipelines.
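For instance, registering a JSON Schema under a subject is a single REST call against the registry. The sketch below assumes a Schema Registry listening on localhost:8081 and a hypothetical orders-value subject:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSchema {
    public static void main(String[] args) throws Exception {
        // The "schema" field holds the escaped schema document;
        // "schemaType" defaults to AVRO if omitted.
        String body = "{\"schemaType\":\"JSON\","
            + "\"schema\":\"{\\\"type\\\":\\\"object\\\",\\\"properties\\\":{\\\"orderId\\\":{\\\"type\\\":\\\"string\\\"}}}\"}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8081/subjects/orders-value/versions")) // assumed registry URL and subject
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"id":1} on success
    }
}
```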

Replacing Confluent Schema Registry With Red Hat Integration

Along with the enhancements for Apache Kafka-based environments, Red Hat announced the Red Hat Integration service registry to help teams govern their service schemas. Developers can now use the registry to query for the schemas and artifacts required by each service endpoint, or to register and store new structures for future use.

Registry for event-driven architecture

Red Hat Integration's service registry, based on the Apicurio project registry, provides a way to decouple the schema used to serialize and deserialize Kafka messages from the applications that send and receive them. The service registry is a store for schema (and API design) artifacts, providing a REST API and a set of optional rules for enforcing content validity and evolution. The registry handles data formats such as Apache Avro, JSON Schema, and Google Protocol Buffers (Protobuf), as well as OpenAPI and AsyncAPI definitions.
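On the client side, this decoupling shows up as a registry-aware serializer in the producer configuration. The sketch below is only an illustration: the serializer class and apicurio.* property names are assumptions that differ between Apicurio Registry versions, and the broker and registry URLs are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ApicurioProducerConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");          // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Registry-aware Avro serializer shipped with Apicurio Registry
        // (class and property names vary between Apicurio versions).
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "io.apicurio.registry.serde.avro.AvroKafkaSerializer");
        props.put("apicurio.registry.url", "http://localhost:8080/apis/registry/v2");  // assumed registry endpoint
        // Let the serializer register the schema on first use instead of failing.
        props.put("apicurio.registry.auto-register", "true");
        return props;
    }
}
```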

Contract-First Development — the Event-Driven Way!

Introduction

Contract-first application development is not limited to synchronous RESTful API calls. With the adoption of event-driven architecture, more developers are demanding a way to set up contracts between these asynchronous event publishers and consumers. Sharing the data format each subscriber consumes and the data format each publisher produces, in the way the OpenAPI standard does for REST, is going to be very helpful.

But the asynchronous world is even more complex: not only do you need to care about the data schema, there are also different protocols, serialization and deserialization mechanisms, and various supporting libraries. In fact, there are ongoing discussions around AsyncAPI. What I am going to show you is what can be done today using one of the most commonly used solutions for building an EDA, Kafka, and how to do contract-first application development using Camel + Apicurio Registry.
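As a rough sketch of the producer half of such a setup (not the article's actual code; the topic, broker address, and payload are assumptions, and the Apicurio-backed serializer would be configured on the Kafka component separately), a Camel route publishing to Kafka can be as small as this:

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class ContractFirstProducerRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Periodically emit a payload and publish it to a Kafka topic.
        // A registry-aware value serializer, configured on the camel-kafka
        // component, is what ties each message to its registered schema.
        from("timer:order?period=5000")
            .setBody(constant("{\"orderId\":\"o-1001\",\"amount\":42.5}"))
            .to("kafka:orders?brokers=localhost:9092"); // assumed topic and broker
    }

    public static void main(String[] args) throws Exception {
        Main main = new Main();
        main.configure().addRoutesBuilder(new ContractFirstProducerRoute());
        main.run(args);
    }
}
```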