Distributed Systems: Consistent Hashing

Welcome to the distributed systems series. In this article, we are going to learn about consistent hashing and its use in distributed systems: why consistent hashing is important and how it plays a role in designing distributed systems such as databases and caches. Let’s first understand what hashing is and how it is used to distribute data across machines. Then, we will understand what consistent hashing is.

Hashing

Hashing is a technique that maps an object to a fixed-size integer called a hash code. A simple example is the hashCode method in Java, which returns an int for an object. Hash codes are not guaranteed to be unique: two different objects can produce the same value, which is called a collision. The hash code is used to choose a bucket from an array of buckets for storage and retrieval. For a lookup to find the right bucket, a key's hash code must not change while the key is stored, which is why the keys we hash should be immutable. This is how Java's HashMap stores and retrieves values. If you know how HashMap works, the concept is very similar in distributed systems: we have an array of machines to store the data, and we have to decide which machine should hold a specific piece of data. The following diagram explains how hashing is used to store {key, value} data on different machines.
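A minimal sketch of this idea in Java, assuming string keys and a fixed list of machine names: the key's hashCode() is reduced modulo the number of machines to pick an index, similar in spirit to how a HashMap picks a bucket. This is the plain modulo scheme, not consistent hashing, and the names ModuloSharding and machineFor are hypothetical, chosen only for illustration.

```java
import java.util.List;

public class ModuloSharding {
    // Hypothetical machine names; a real system would hold addresses or connections.
    private final List<String> machines;

    public ModuloSharding(List<String> machines) {
        if (machines.isEmpty()) {
            throw new IllegalArgumentException("need at least one machine");
        }
        this.machines = List.copyOf(machines);
    }

    // Hash the key, then reduce the hash modulo the number of machines to get an index.
    public String machineFor(String key) {
        // Math.floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(key.hashCode(), machines.size());
        return machines.get(index);
    }

    public static void main(String[] args) {
        ModuloSharding sharding =
                new ModuloSharding(List.of("machine-0", "machine-1", "machine-2"));
        for (String key : List.of("user:42", "user:43", "order:7")) {
            System.out.println(key + " -> " + sharding.machineFor(key));
        }
    }
}
```

The same key always lands on the same machine as long as the list of machines stays the same. Because the index depends on machines.size(), however, adding or removing a machine changes the placement of most keys, which is the problem consistent hashing sets out to reduce.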

Distributed Systems: CAP Theorem

Welcome to the Distributed Systems series. In this article, we will learn what the CAP theorem is. CAP stands for consistency, availability, and partition tolerance. The CAP theorem is about distributed systems, so first, let’s understand what a distributed system is. A distributed system is a system made up of multiple processes that run on a single machine or on multiple machines and communicate with each other. In this article, we will look at the CAP theorem from a distributed system perspective using a simple database analogy.

What Is the CAP Theorem?

The CAP theorem states that when a network partition occurs in a distributed system, we can choose either consistency or availability, but not both. It was formulated by Eric Brewer as a way to reason about distributed systems. CAP stands for consistency, availability, and partition tolerance.
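To make the trade-off concrete, here is a toy sketch in Java under simplifying assumptions: a single value is stored on two hypothetical replicas, A and B; clients write through A and read from B; and a partition simply means A cannot reach B. The names CapToy, ReplicatedRegister, and Mode are illustrative, not part of any real database. In CP mode the store refuses requests it cannot serve consistently; in AP mode it keeps answering and lets B fall behind.

```java
import java.util.Optional;

public class CapToy {
    enum Mode { CP, AP }

    // Hypothetical two-replica store holding one value; not a real database API.
    static final class ReplicatedRegister {
        private final Mode mode;
        private String replicaA = "v0";
        private String replicaB = "v0";
        private boolean partitioned = false;

        ReplicatedRegister(Mode mode) { this.mode = mode; }

        void setPartitioned(boolean partitioned) { this.partitioned = partitioned; }

        // Client writes through replica A. Returns false when the write is refused.
        boolean write(String value) {
            if (!partitioned) {
                replicaA = value;
                replicaB = value; // synchronous replication keeps both copies equal
                return true;
            }
            if (mode == Mode.CP) {
                return false; // CP: refuse the write rather than let A and B diverge
            }
            replicaA = value; // AP: accept locally; B is unreachable and goes stale
            return true;
        }

        // Client reads from replica B. An empty result means the read was refused.
        Optional<String> readFromB() {
            if (partitioned && mode == Mode.CP) {
                return Optional.empty(); // CP: B cannot confirm it holds the latest value
            }
            return Optional.of(replicaB); // AP: always answer, possibly with stale data
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            ReplicatedRegister register = new ReplicatedRegister(mode);
            register.setPartitioned(true);
            boolean accepted = register.write("v1");
            System.out.println(mode + ": write accepted=" + accepted
                    + ", read from B=" + register.readFromB());
        }
    }
}
```

During the partition, the CP register gives up availability: both the write and the read from B are refused. The AP register stays available: the write succeeds, but the read from B still returns the stale v0. This sketch never reconciles the replicas when the partition heals; a real AP system would need some repair mechanism to bring B up to date.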