Sampling Strategies in Distributed Tracing: A Comprehensive Guide

If you are running a distributed system where each request touches more than a couple of services, databases, and a queuing system, pinpointing the cause of an issue is far from trivial. The complexity grows as the number of services increases, as east-west traffic goes up, as teams get split up, and as data tends toward eventual consistency. There is a plethora of tools aiming to solve this problem to varying degrees. Perhaps the most critical tooling in this workflow is distributed tracing. We have even argued that a well-implemented distributed tracing solution might subsume logging into its fold.

Yet, in reality, tracing end-to-end request flows has been an afterthought in most companies, as postulated in this article. One of the biggest challenges holding back widespread adoption is the sheer volume of trace data. Capturing, storing, indexing, and querying this massive dataset will not only hurt performance and add significant noise but also break the bank :)

Can Distributed Tracing Replace Logging?

Monitoring, Logging, and Tracing are often highlighted as the three fundamental pillars of a contemporary Observability framework. Conventional wisdom suggests that all three are equally critical and that each has its own place in the Observability stack. As more of the world shifts to cloud-based, containerized, and distributed systems, will one of these pillars end up becoming more critical than the others?

We predict that this will most likely be the case. In this article, we compare the roles of two of these Observability pillars, Distributed Tracing and Logging, and see which better suits the needs of an increasingly cloud-native world.