Measuring Service Performance: The Whys and Hows

I enjoy improving application performance. After all, the primary purpose of computers is to execute tasks efficiently and swiftly. When you consider the fundamentals of computing, it seems almost magical — at its core, it involves simple arithmetic and logical operations, such as addition and comparison of binary numbers. Yet, by rapidly performing countless such operations, computers enable us to enjoy video games, watch endless videos, explore the vast expanse of human knowledge and culture, and even unlock the secrets of the universe and life itself. That is why optimizing applications is so important — it allows us to make better use of our most precious resource: time. To paraphrase a famous quote:

In the end, it’s not the number of years you live that matters, but the number of innovations that transpire within those years.

The Art of Being Ready: Reliability in Extreme Conditions

When it comes to online services, uptime is crucial, but it’s not the only thing to consider. Imagine running an online store — having your site available 99.9% of the time might sound good, but what if that 0.1% of downtime happens during the holiday shopping season? That could mean losing out on big sales. And what if most of your customers are only interested in a few popular items? If those pages aren’t available, it doesn’t matter that the rest of your site is working fine.
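To put numbers on this, here is a small back-of-the-envelope sketch. The 100-hour peak window and its 30% share of yearly revenue are hypothetical figures chosen only for illustration; they are not from any real store.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that a given availability fraction allows in a period."""
    return (1.0 - availability) * period_hours


def revenue_weighted_availability(periods):
    """periods: list of (revenue_share, availability) pairs."""
    total = sum(share for share, _ in periods)
    return sum(share * avail for share, avail in periods) / total


# 99.9% availability still allows roughly 8.76 hours of downtime per year.
yearly_downtime = downtime_hours(0.999)

# Hypothetical worst case: all of that downtime falls inside a 100-hour
# holiday peak that brings in 30% of yearly revenue.
peak_availability = 1.0 - yearly_downtime / 100.0
weighted = revenue_weighted_availability([
    (0.3, peak_availability),  # peak window
    (0.7, 1.0),                # rest of the year, fully available
])

print(f"downtime per year: {yearly_downtime:.2f} h")
print(f"revenue-weighted availability: {weighted:.2%}")
```

Under these assumed numbers, a site that is up 99.9% of the time is only about 97.4% available when measured by the revenue it could have served.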

Sometimes, being available during peak moments can make or break your business. It’s not just e-commerce — a small fraction of airports handle most of the air traffic, just a tiny minority of celebrities are household names, and only a handful of blockbuster movies dominate the box office each year. It’s the same distribution pattern everywhere.

Isolating Noisy Neighbors in Distributed Systems: The Power of Shuffle-Sharding

Effective resource management is essential to ensure that no single client or task monopolizes resources and causes performance issues for others. Shuffle-sharding is a valuable technique to achieve this. By dividing resources into equal segments and periodically shuffling them, shuffle-sharding can distribute resources evenly and prevent any client or task from relying on a specific segment for too long. This technique is especially useful in scenarios with a risk of bad actors or misbehaving clients or tasks. In this article, we'll explore shuffle-sharding in depth, discussing how it balances resources and improves overall system performance.


Before implementing shuffle-sharding, it's important to understand its key dimensions, parameters, trade-offs, and potential outcomes. Building a model and simulating different scenarios can help you develop a deeper understanding of how shuffle-sharding works and how it may impact your system's performance and availability. That's why we'll explore shuffle-sharding in more detail, using a Colab notebook as our playground. We'll discuss its benefits, limitations, and the factors to consider before implementing it. By the end of this post, you'll have a better idea of what shuffle-sharding can and can't do and whether it's a suitable technique for your specific use case.
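As a starting point for such a model, here is a minimal sketch. It assumes one commonly described formulation, in which each client is deterministically assigned a small pseudo-random subset of workers (its shard); it does not model periodic reshuffling, and it may differ from the model in the notebook. The names `shard_for` and `blast_radius`, and the parameters of 8 workers with shards of 2, are made up for illustration.

```python
import hashlib
import random
from math import comb


def shard_for(client_id, num_workers, shard_size):
    """Deterministically pick `shard_size` distinct workers for a client."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return frozenset(rng.sample(range(num_workers), shard_size))


def blast_radius(clients, noisy_client, num_workers, shard_size):
    """Count clients that lose all, or only some, of their workers
    when the noisy client saturates every worker in its own shard."""
    bad = shard_for(noisy_client, num_workers, shard_size)
    fully = partially = 0
    for client in clients:
        if client == noisy_client:
            continue
        shard = shard_for(client, num_workers, shard_size)
        if shard <= bad:      # every worker of this client is affected
            fully += 1
        elif shard & bad:     # at least one healthy worker remains
            partially += 1
    return fully, partially


clients = [f"client-{i}" for i in range(100)]
fully, partially = blast_radius(clients, "client-0", num_workers=8, shard_size=2)
possible_shards = comb(8, 2)  # number of distinct two-worker shards
print(f"{possible_shards} possible shards; "
      f"{fully} clients fully affected, {partially} partially affected")
```

The design point this illustrates: a client is fully impacted only if its shard coincides with the noisy one, so increasing the number of possible shard combinations shrinks the set of clients that share the noisy client's fate, provided clients can retry on their remaining healthy workers.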

FIFO vs. LIFO: Which Queueing Strategy Is Better for Availability and Latency?

As an engineer, you probably know that server performance under heavy load is crucial for maintaining the availability and responsiveness of your services. But what happens when traffic bursts overwhelm your system? Queueing requests is a common solution, but what's the best approach: FIFO or LIFO? In this post, we'll explore both strategies through a simple simulation in Colab, allowing you to see the impact of changing parameters on system performance. Comparing the pros and cons of each approach builds an understanding of the trade-offs and helps you make better decisions about queueing strategies, improving your engineering skills in the process. After all, as the saying goes: "I hear and I forget, I see and I remember, I do and I understand."
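To give a feel for the kind of experiment involved, here is a deliberately tiny sketch. It is not the Colab model: it assumes discrete time ticks, a server that handles one request per tick, and clients that give up after waiting more than `timeout` ticks while the server cannot tell and still spends a slot on them. All names and numbers are illustrative assumptions.

```python
from collections import deque


def simulate(strategy, arrivals, timeout=5):
    """arrivals[t] is the number of requests arriving at tick t.
    Returns (latencies of requests served in time, count of wasted slots)."""
    queue = deque()
    ok_latencies = []
    wasted = 0
    t = 0
    while t < len(arrivals) or queue:
        if t < len(arrivals):
            queue.extend([t] * arrivals[t])  # remember each arrival tick
        if queue:
            # FIFO takes the oldest request, LIFO takes the newest.
            arrived = queue.popleft() if strategy == "fifo" else queue.pop()
            wait = t - arrived
            if wait <= timeout:
                ok_latencies.append(wait)
            else:
                wasted += 1  # client already gave up; the work is thrown away
        t += 1
    return ok_latencies, wasted


# A burst of 20 requests, then 40 ticks of traffic exactly at capacity.
arrivals = [20] + [1] * 40
fifo_ok, fifo_wasted = simulate("fifo", arrivals)
lifo_ok, lifo_wasted = simulate("lifo", arrivals)
print("FIFO:", len(fifo_ok), "in time,", fifo_wasted, "wasted")  # 6 and 54
print("LIFO:", len(lifo_ok), "in time,", lifo_wasted, "wasted")  # 41 and 19
```

In this toy setup, FIFO never recovers from the burst because every slot goes to a request that has already waited too long, while LIFO sacrifices the backlog and keeps serving fresh requests. Real systems differ in many ways, so treat this as intuition rather than a benchmark.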


To compare the performance of FIFO (which processes requests in the order they are received) and LIFO (which prioritizes the most recent requests) queueing strategies, we'll build a simple model using a Client that generates requests and a Server that handles them. You can find the details in this Colab. The essential characteristics of the model are:

Navigating the Benefits and Risks of Request Hedging for Network Services

Tail latency is a persistent challenge for network services, with unpredictable spikes in response times due to factors such as CPU wait times and network congestion. While cost-effectiveness is often achieved through the use of shared resources, this can lead to a compromise in user experience. 

In this blog post, we examine the technique of request hedging as a solution to this problem. By understanding its benefits and limitations, we aim to provide insights into when and how this technique can be effectively utilized to build more predictable services.
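For readers who have not met the technique before, here is a minimal sketch of one common form of hedging: send a request, and if no reply arrives within a short delay, send a second copy and take whichever answer comes first. The function name `hedged_call`, the `hedge_delay` parameter, and the fake backend are assumptions made for illustration, not an API from any particular library.

```python
import asyncio


async def hedged_call(make_request, hedge_delay):
    """Run make_request(); if it has not finished after `hedge_delay`
    seconds, start a second attempt and return the first result to arrive."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()

    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the slower attempt to limit wasted work
    # Note: if the first attempt to finish failed, its exception is raised here.
    return done.pop().result()


async def demo():
    # Hypothetical backend: the first attempt is slow, the second is fast.
    latencies = iter([1.0, 0.01])

    async def fake_request():
        delay = next(latencies)
        await asyncio.sleep(delay)
        return delay

    return await hedged_call(fake_request, hedge_delay=0.05)


result = asyncio.run(demo())
print("served by the attempt with latency", result)
```

The hedge delay controls the trade-off: a delay set near a high latency percentile means only a small fraction of requests are duplicated, whereas a very short delay cuts tail latency further at the cost of more extra load on the backend.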

SRE vs AWS DevOps: A Personal Experience Comparison

With hands-on experience in both AWS DevOps and Google SRE, I’d like to share my perspective on how the two models compare. Both have proven effective at delivering scalable and reliable services for cloud providers. However, when managed poorly, either can produce dysfunctional teams and organizations. In this article, I’ll give a brief overview of AWS DevOps and Google SRE, examine when each works best, point out pitfalls to avoid, and offer tips for getting the most out of each.


DevOps is a widely used term with multiple interpretations. In this article, I’ll focus on AWS DevOps, which, according to the AWS blog, merges development and operations teams into a single unit. Under this model, engineers work across the entire application lifecycle, from development to deployment to operations. They possess a wide range of skills rather than being limited to a specific function.

Writing a Modern HTTP(S) Tunnel in Rust

Learn how to write performant and safe apps quickly in Rust. This post guides you through designing and implementing an HTTP Tunnel, and covers the basics of creating robust, scalable, and observable applications.

Rust: Performance, Reliability, Productivity

About a year ago, I started to learn Rust. The first two weeks were quite painful. Nothing compiled; I didn’t know how to do basic operations; I couldn’t make a simple program run. But step by step, I started to understand what the compiler wanted. What’s more, I realized that the compiler forces the right way of thinking and correct behavior.