Implementing Locks in a Distributed Environment

As we know, locks are generally used to monitor and control access to shared resources by multiple threads simultaneously. They basically protect data integrity and atomicity in concurrent applications, i.e., only one thread at a time can acquire a lock on a shared resource, which otherwise is not accessible. But a lock in a distributed environment is more than just a mutex in a multi-threaded application. It is more complicated because the lock now has to be acquired across all the nodes, whereas any of the nodes in the cluster or the network can fail.

Here is the user story that we're going to consider to explain scenarios in the rest of this article. The application takes data in the user’s preferred format and converts it into a standardized format, like PDF, that can be uploaded to a government portal. There are two different micro-services of the application which do these things: Transformer and Rules Engine. We use Cassandra for persistence and Kafka as a message queue. Also, please note that the user request, once accepted, returns immediately. Once the PDF is generated, the user is notified about it asynchronously. This is achieved in a sequence of steps as follows:

Distributed Locks Are Dead, Long Live Distributed Locks

"Distributed locks aren't real," "Anyone who's trying to sell you a distributed lock is selling you sawdust and lies." This may sound rather bleak, but it doesn't say that locking, itself, is impossible in a distributed system: It's just that some like to remind us that all of the system's components must participate in the protocol. This blog post is the story of how we implemented a distributed locking protocol that gives your components a straightforward way of joining in.

The coordinating component of the protocol is an object that extends the semantics of java.util.concurrent.locks.Lock. We call it FencedLock, following the naming used in Martin Kleppmann's 2016 post "How to Do Distributed Locking."