How to Reduce Cloud Cost by 99% for EDA Kafka Applications

While the cloud offers great convenience and flexibility, the operational cost of applications deployed in the cloud can sometimes be significant. This article shows a way to substantially reduce operating costs in latency-sensitive Event-Driven Architecture (EDA) Java applications by migrating from Kafka to open-source Chronicle Queue, a more resource-efficient and lower-latency queue implementation.

What Is EDA?

An EDA application is a distributed application in which events (in the form of messages or DTOs) are produced, detected, consumed, and reacted to. Distributed means the application may run on different machines, or on the same machine in separate processes or threads. This article uses the latter model, in which messages are persisted in queues.
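To make this concrete, an event can be as simple as a plain Java DTO. The class and fields below are purely illustrative:

```java
// A hypothetical event DTO: an immutable record carrying the facts of a
// single order. Producers publish instances of this type to a queue and
// consumers react to them asynchronously.
public record OrderPlacedEvent(long orderId, String symbol, int quantity, long timestampNanos) {
}
```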

Deployment of Low-Latency Solutions in the Cloud

Traditionally, companies with low-latency requirements deployed to bare-metal servers, eschewing the convenience and programmability of virtualization and containerization in an effort to squeeze maximum performance and minimal latency from “on-premises” (often co-located) hardware.

More recently, these companies are increasingly moving to public and private “cloud” environments, either for satellite services around their tuned low-latency/high-volume (LL/HV) systems or in some cases for LL/HV workloads themselves.  

How Does Kafka Perform When You Need Low Latency?

Most Kafka benchmarks appear to test high throughput but not low latency. Apache Kafka was traditionally used for high throughput rather than latency-sensitive messaging, but it does have a low-latency configuration (mostly setting linger.ms=0 and reducing buffer sizes). In this configuration, you can get below 1 millisecond of latency a good percentage of the time at modest throughputs.
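As a rough sketch of such a configuration (the broker address, topic name, and exact buffer values are placeholders and would need tuning for a real workload):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LowLatencyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, 0);       // send immediately rather than waiting to batch
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1_024);  // keep the batch buffer small (illustrative value)
        props.put(ProducerConfig.ACKS_CONFIG, "1");          // trade some durability for lower latency

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"));
            producer.flush();
        }
    }
}
```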

Benchmarks tend to focus on clustering Kafka in a high-throughput configuration. While this is perhaps the most common use case, how does it perform if you need lower latencies?

Kafka vs Chronicle for Microservices

Apache Kafka is a common choice for inter-service communication. Kafka facilitates the parallel processing of messages and is a good choice for log aggregation. Kafka is also advertised as low-latency and high-throughput. However, is Kafka fast enough for many microservices applications?

When I wrote open-source Chronicle Queue, my aim was to develop a messaging framework with microsecond latencies. In this article, I will describe why Kafka does not scale in terms of throughput as easily as Chronicle Queue for microservices applications.

Visualizing Delay as a Distance

In order to illustrate the difference, let me start with an analogy. A signal travels through optic fiber and copper at about two-thirds the speed of light in a vacuum, so a very short delay can be visualized as the distance the signal could travel in that time. This can really matter when you have machines in different data centers.
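As a back-of-the-envelope sketch of the analogy (assuming a propagation speed of roughly two-thirds of c, about 200,000 km/s):

```java
// Converts a few example latencies into the distance a signal could travel
// in fiber or copper at ~2/3 the speed of light (~0.2 meters per nanosecond).
public class DelayAsDistance {
    private static final double METERS_PER_NANOSECOND = 0.2;

    public static void main(String[] args) {
        long[] latenciesNanos = {1_000, 100_000, 1_000_000}; // 1 µs, 100 µs, 1 ms
        for (long nanos : latenciesNanos) {
            System.out.printf("%,d ns  ->  ~%,.0f m%n", nanos, nanos * METERS_PER_NANOSECOND);
        }
    }
}
```

So a one-microsecond delay corresponds to roughly 200 meters, and a one-millisecond delay to roughly 200 kilometers.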

How to Develop Event-Driven Architectures

Last month, I wrote an article on open-source Chronicle Wire that discusses how we could serialize an application’s state into different message formats.

Now, in this article, I'm going to look at how we can use open-source Chronicle Queue and Chronicle Wire to structure applications around an event-driven architecture (EDA). EDA is a design pattern in which decoupled components (often microservices) can asynchronously publish and subscribe to events.
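As a minimal sketch of that structure, the example below uses Chronicle Queue's method-writer/method-reader pattern; the queue path, event interface, and DTO are hypothetical:

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.wire.SelfDescribingMarshallable;

public class EdaSketch {
    // Hypothetical event DTO; Chronicle Wire takes care of its serialization.
    public static class TradeEvent extends SelfDescribingMarshallable {
        String symbol;
        double price;
        int quantity;
    }

    // Hypothetical event interface; each method represents one event type.
    public interface TradeEvents {
        void onTrade(TradeEvent trade);
    }

    public static void main(String[] args) {
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("trades-queue").build()) {
            // Producer side: publish an event by calling the interface.
            TradeEvents publisher = queue.methodWriter(TradeEvents.class);
            TradeEvent event = new TradeEvent();
            event.symbol = "EURUSD";
            event.price = 1.08;
            event.quantity = 1_000_000;
            publisher.onTrade(event);

            // Consumer side: a tailer replays persisted events to a handler.
            var reader = queue.createTailer()
                              .methodReader((TradeEvents) t -> System.out.println("Received " + t));
            reader.readOne(); // process one event, if available
        }
    }
}
```

In a real system, the producer and consumer would typically live in separate processes (or threads), with the persisted queue providing the decoupling between them.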