Maven Troubleshooting, Unstable Builds, and Open-Source Infrastructure

Donald Knuth famously wrote that premature optimization is the root of all evil. I, for one, believe that all evil comes from spuriously failing builds. Nothing erodes my confidence in a project as quickly as unstable builds alternating between green and red for no apparent reason. This is a story about unstable builds and troubleshooting. More importantly, it is written to thank all contributors to basic software infrastructure: the infrastructure we all use and take for granted.

[Image: xkcd comic. Caption: "Someday ImageMagick will finally break for good and we'll have a long period of scrambling as we try to reassemble civilization from the rubble."]

Surprise in Logs

Upon logging into Azure Pipelines to review the logs of multiple failed builds, I mentally braced myself for a potentially arduous troubleshooting session. I suspected that a race condition was the culprit that caused non-deterministic outcomes. Therefore, I was surprised to discover the actual reasons for the recent build failures. They were all similar to this:

UUID: Coordination-Free Unique Keys

Let’s build an IoT application with weather sensors deployed around the globe. The sensors will collect data, and we will store that data along with the IDs of the sensors. We’ll run multiple database instances, and each sensor will write to the geographically closest one. The databases will regularly exchange data, so eventually every database will have data from all the sensors.

We need each sensor to have a globally unique ID. How can we achieve that? For example, we could run a service that assigns sensor IDs as part of the sensor installation procedure. It would mean additional architectural complexity, but it's doable. Sensor IDs are immutable, so each sensor needs to talk to the ID service only once, right after installation. That’s not too bad.
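For comparison, here is a minimal sketch of the coordination-free alternative the title hints at: each sensor generates a random (version 4) UUID locally, with no ID service involved. The class name `SensorIds` and the helper `newSensorId()` are hypothetical names chosen for illustration, and the uniqueness guarantee is probabilistic rather than absolute.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class SensorIds {

    // Hypothetical helper: a sensor creates its own ID locally,
    // without contacting any central ID service.
    static UUID newSensorId() {
        return UUID.randomUUID(); // random (version 4) UUID with 122 random bits
    }

    public static void main(String[] args) {
        // Simulate many independent sensors each generating an ID.
        Set<UUID> ids = new HashSet<>();
        int sensors = 100_000;
        for (int i = 0; i < sensors; i++) {
            ids.add(newSensorId());
        }
        // A collision is possible in principle, but extremely unlikely.
        System.out.println("sensors: " + sensors + ", distinct IDs: " + ids.size());
        System.out.println("example ID: " + newSensorId());
    }
}
```

The trade-off in this sketch is that an ID grows to 128 bits and carries no meaning of its own, in exchange for removing the installation-time dependency on an ID service.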

LockSupport.parkNanos() Under the Hood and the Curious Case of Parking (Part I)

When a colleague of mine was running some experiments, he noticed that LockSupport.parkNanos() would either return almost immediately or in steps of roughly 50 microseconds. In other words, calling LockSupport.parkNanos(10000) would not return after 10 microseconds but after roughly 50 μs. LockSupport.parkNanos(55000) would not return after 55 μs but after roughly 100 μs, and so on. The 50 μs step showed up far too consistently to be a coincidence. I was curious about what was causing it, and because I love exploring how stuff works under the hood, I decided to have a closer look.
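To make the observation concrete, the following is a minimal sketch of how such a measurement could be taken. It is not the reproducer used in the experiment: the class name `ParkNanosProbe`, the helper `measurePark()`, and the chosen request sizes are assumptions for illustration. The observed durations depend on the operating system, the kernel timer configuration, and the JVM, so no particular output should be expected.

```java
import java.util.concurrent.locks.LockSupport;

public class ParkNanosProbe {

    // Hypothetical helper: returns how long one parkNanos(requestedNanos)
    // call actually took, in nanoseconds. parkNanos may also return early
    // (spuriously), so the result can be below the requested value.
    static long measurePark(long requestedNanos) {
        long start = System.nanoTime();
        LockSupport.parkNanos(requestedNanos);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] requests = {10_000, 55_000, 110_000}; // 10 us, 55 us, 110 us
        int rounds = 1_000;
        for (long requested : requests) {
            // Warm-up so that JIT compilation does not distort the numbers.
            for (int i = 0; i < rounds; i++) {
                measurePark(requested);
            }
            long total = 0;
            for (int i = 0; i < rounds; i++) {
                total += measurePark(requested);
            }
            System.out.printf("requested %d ns, observed average %d ns%n",
                    requested, total / rounds);
        }
    }
}
```

Comparing the observed averages across the request sizes is enough to see whether the wake-up times cluster in steps on a given machine.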

Reproducer in Java

The first step was easy: write a reproducer. I re-used code from my older and somewhat-related experiment and just added a new runner: