Apache Kafka Essentials

Dive into Apache Kafka: review its history and fundamental components, including Pub/Sub messaging, Kafka Connect, and Kafka Streams. Key concepts in each area are supplemented with detailed code examples that demonstrate producing and consuming data, using connectors for straightforward data streaming and transformation, performing common operations in KStreams, and more.
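
As a taste of the pub/sub API the Refcard walks through, below is a minimal Java producer sketch; the broker address and topic name are placeholders, not anything the Refcard prescribes:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class QuickstartProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker address; point this at your own cluster.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // "events" is a hypothetical topic; records with the same key
                // land on the same partition, preserving per-key ordering.
                producer.send(new ProducerRecord<>("events", "user-42", "signed-in"));
                producer.flush(); // block until buffered records are sent
            }
        }
    }

A consumer follows the same shape in reverse: subscribe to the topic, then poll in a loop and process each record.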

Getting Started With Observability for Distributed Systems

To realize the full benefits of a distributed system, an application's underlying architecture must support company-level objectives such as agility, development velocity, and speed to market. Implementing a reliable observability strategy, paired with the right tools for your specific business requirements, gives teams the insights needed to properly operate and manage their entire distributed ecosystem on an ongoing basis.

This Refcard covers the three pillars of observability — metrics, logs, and traces — and how they not only complement an organization's monitoring efforts but also work together to help profile, interpret, and optimize system-wide performance.
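
To make the three pillars concrete, here is a sketch using the OpenTelemetry API for Java, one common vendor-neutral option (the Refcard itself is tool-agnostic); the scope, span, and metric names are illustrative:

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.metrics.LongCounter;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    public class CheckoutHandler {
        // Illustrative instrumentation-scope name.
        private static final Tracer TRACER =
            GlobalOpenTelemetry.getTracer("shop.checkout");
        private static final LongCounter REQUESTS =
            GlobalOpenTelemetry.getMeter("shop.checkout")
                .counterBuilder("checkout.requests") // a metric: a running count
                .build();

        void handle() {
            REQUESTS.add(1);                           // pillar 1: metrics
            Span span = TRACER.spanBuilder("handle-checkout").startSpan();
            try (Scope ignored = span.makeCurrent()) { // pillar 3: traces
                span.addEvent("cart validated");       // pillar 2: a log-like event
                // ... business logic ...
            } finally {
                span.end();
            }
        }
    }

Each signal answers a different question: metrics tell you that something changed, traces show where in the request path, and logs explain why.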

Distributed SQL Essentials

Distributed SQL databases combine the resilience and scalability of a NoSQL database with the full functionality of a relational database. In this Refcard, we explore the fundamentals of distributed SQL, including architecting for availability, handling schema design challenges, using JSON and columnar indexes, and assessing approaches to replication.
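
As one concrete illustration of the JSON indexing discussion, here is a hedged JDBC sketch using PostgreSQL-compatible syntax, which several distributed SQL engines accept; the connection string, credentials, and schema are hypothetical, and exact index syntax varies by engine:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class JsonIndexExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint; any PostgreSQL-compatible distributed
            // SQL cluster looks much the same from JDBC.
            String url = "jdbc:postgresql://db.example.com:5433/app";
            try (Connection conn = DriverManager.getConnection(url, "app", "secret");
                 Statement stmt = conn.createStatement()) {
                // A JSONB column holds semi-structured order details.
                stmt.execute("CREATE TABLE orders ("
                        + " id BIGINT PRIMARY KEY,"
                        + " details JSONB)");
                // An expression index on one JSON field lets the planner
                // answer status lookups without scanning every row.
                stmt.execute("CREATE INDEX orders_status_idx"
                        + " ON orders ((details->>'status'))");
            }
        }
    }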

Getting Started With Distributed SQL

In recent years, NoSQL distributed databases have become common because they are built from the ground up to be distributed. Yet, to meet their applications' need for scale, they force difficult design choices, such as sacrificing consistency, data integrity, and ease of querying in favor of availability. This Refcard serves as a reference to the key characteristics of distributed SQL databases, how functionality compares across database offerings, and the criteria for designing a proof of concept.

Navigating Distributed Data Pipelines: An Overview and Guide for Your Performance Management Strategy

More than 10,000 enterprises across the globe rely on a data stack made up of multiple distributed systems. While these enterprises, which span a wide range of verticals (finance, healthcare, technology, and more), build applications on a distributed big data stack, some are not fully aware of the performance management challenges that often arise. This piece provides an overview of what a modern big data stack looks like, addresses the performance requirements at both the individual application level and the level of whole clusters and workloads, and explores what type of architecture can provide automated solutions for these complex environments.

Design Exercise: Distributing (Consistent) Data at Scale

The Question

Before I start discussing this topic, I want to talk a bit about the speed of light. That pesky limit basically means that there is an inherent lag in passing information between any two points in space. In your daily life, you can mostly ignore it. The human brain is far too slow to perceive it, and even if you are working with computers, you can usually ignore the speed of light for anything less than about 500 miles.
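
To see why 500 miles is roughly where this stops being true, here is a quick back-of-the-envelope calculation (my numbers; the two-thirds-of-c figure for fiber is a common approximation):

    public class SpeedOfLight {
        public static void main(String[] args) {
            final double milesPerSecond = 186_282.0; // c in vacuum
            final double distanceMiles = 500.0;
            double oneWayMs = distanceMiles / milesPerSecond * 1_000;
            // Light in fiber propagates at roughly two thirds of c.
            double fiberRoundTripMs = 2 * oneWayMs / (2.0 / 3.0);
            System.out.printf("Vacuum, one way:   %.2f ms%n", oneWayMs);
            System.out.printf("Fiber, round trip: %.2f ms%n", fiberRoundTripMs);
            // Roughly 2.68 ms and 8.05 ms: short enough that neither a
            // human nor most applications will ever notice.
        }
    }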

But the speed of light is merely the hard upper limit on our ability to send information from one location to another. In practice, the lag time between any two computers connected to a network is much higher. If you are a gamer, you are already well acquainted with this fact.