Distributed Systems: Consistent Hashing

Welcome to the distributed systems series. In this article, we are going to learn about consistent hashing: why it is important and how it plays a role in designing distributed systems such as databases and caches. Let's first understand what hashing is and how it is used to distribute data across machines. Then we will look at what consistent hashing is.

Hashing

Hashing is a technique that maps an object to a fixed-size value, called a hash code. A hash code is not guaranteed to be unique: two different objects can produce the same value (a collision), but equal objects always produce the same hash code. A simple example is the hashCode() method in Java, which returns an int for an object. HashMap uses this value to choose a bucket from an array of buckets for storage and retrieval. For a lookup to land in the bucket where the entry was stored, the key's hash code must not change while the key is in the map, which is why immutable objects such as String make good keys.

If you know how HashMap works, the concept is pretty similar in distributed systems. There, we have an array of machines instead of an array of buckets, and we have to decide which machine should hold a given piece of data. The following diagram explains how hashing is used to store {key, value} data on different machines.
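Alongside the diagram, the same idea can be sketched in a few lines of Java. This is only a minimal illustration, not taken from any particular system: the class name ModuloPlacement, the method nodeIndexFor, and the node names are hypothetical, and it assumes the simplest possible placement rule, hash(key) mod N, where N is the number of machines.

```java
public class ModuloPlacement {

    // Hypothetical helper: maps a key to one of nodeCount machines
    // using the assumed rule hash(key) mod N.
    static int nodeIndexFor(String key, int nodeCount) {
        // String.hashCode() can be negative; Math.floorMod keeps the
        // result in the range [0, nodeCount).
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        // Hypothetical machine names, standing in for the "array of machines".
        String[] nodes = {"node-0", "node-1", "node-2"};
        String[] keys = {"user:1", "user:2", "order:17"};

        for (String key : keys) {
            int index = nodeIndexFor(key, nodes.length);
            System.out.println(key + " is stored on " + nodes[index]);
        }
    }
}
```

The same call is used for both writes and reads: because the hash of a given key never changes, a reader computes the same index and goes to the machine that stored the value. Note that the index depends on the number of machines, so with this rule the result for a key can change when N changes.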