Benchmarking AWS Graviton2 and gp3 Support for Apache Kafka

With the release of AWS’s Graviton2 (ARM) instances and gp3 disks, I immediately wanted to explore what they could offer anyone running Apache Kafka. My team and I set out to understand the changes required for Kafka users to provision AWS Graviton2 instances paired with gp3 disks.

Previously we’d only used Java 11 (OpenJDK) to run the Kafka service on x86 instances. As part of this change, we also shifted our internal environment to Amazon Corretto. Amazon Corretto is used internally by AWS; it includes built-in performance enhancements and security fixes, and is compatible with Java SE standards. Furthermore, Amazon Corretto reportedly has a performance benefit over other OpenJDK distributions on ARM architecture, especially for network-intensive applications such as Kafka.
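To make the provisioning side concrete, here is a minimal sketch of the launch parameters for a single broker on a Graviton2 instance with a gp3 data volume. It is not our actual provisioning code. The helper name `kafka_broker_launch_params`, the `r6g.large` default, the `/dev/sdf` device name, and the 500 GiB volume size are all assumptions chosen for illustration. The IOPS and throughput defaults are the gp3 baseline values (3,000 IOPS and 125 MiB/s).

```python
# Illustrative sketch only: builds keyword arguments in the shape expected by
# boto3's EC2 run_instances call. Names and defaults here are assumptions.

# Non-exhaustive list of Graviton2-based instance families.
GRAVITON2_FAMILIES = ("m6g", "c6g", "r6g", "t4g")


def kafka_broker_launch_params(
    ami_id,
    instance_type="r6g.large",   # assumed default, not from the article
    data_volume_gib=500,         # assumed size for the Kafka data volume
    iops=3000,                   # gp3 baseline IOPS
    throughput_mibps=125,        # gp3 baseline throughput in MiB/s
):
    """Return launch parameters for one Graviton2 broker with a gp3 volume."""
    family = instance_type.split(".")[0]
    if family not in GRAVITON2_FAMILIES:
        raise ValueError(f"{instance_type} is not in the Graviton2 list")

    return {
        "ImageId": ami_id,  # must be an arm64 AMI for Graviton2 instances
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/sdf",  # assumed device name
                "Ebs": {
                    "VolumeType": "gp3",
                    "VolumeSize": data_volume_gib,
                    "Iops": iops,
                    "Throughput": throughput_mibps,
                    "DeleteOnTermination": True,
                },
            }
        ],
    }
```

With AWS credentials configured, the returned dictionary could be passed to boto3 as `boto3.client("ec2").run_instances(**params)`. Keeping the parameters in a plain function makes it easy to check the instance family and volume settings before anything is launched.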

Navigating the Distributed Data Pipelines: An Overview and Guide for Your Performance Management Strategy

This article is featured in the new DZone Guide to Big Data: Volume, Variety, and Velocity.

More than 10,000 enterprises across the globe rely on a data stack made up of multiple distributed systems. While these enterprises, which span a wide range of verticals (finance, healthcare, technology, and more), build applications on a distributed big data stack, some are not fully aware of the performance management challenges that often arise. This piece provides an overview of what a modern big data stack looks like, addresses the requirements at both the individual application level and the level of whole clusters and workloads, and explores what type of architecture can provide automated solutions for these complex environments.