Chronicle Services: Low Latency Java Microservices Without Pain

Low Latency?

In computing, latency is the length of time taken to perform some task. This could be the time it takes to respond to a hardware interrupt, or the time it takes for a message sent by one component to become available to its recipient.
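To make the definition concrete, here is a minimal sketch of measuring the latency of a single task in Java. The class and method names (LatencyProbe, measureNanos) are hypothetical and chosen only for illustration. A single measurement like this is noisy, so real measurements would normally repeat the task many times after a warm-up.

```java
public final class LatencyProbe {

    // Returns the elapsed time, in nanoseconds, taken to run the task once.
    // LatencyProbe and measureNanos are illustrative names, not part of any library.
    public static long measureNanos(Runnable task) {
        long start = System.nanoTime();   // monotonic time source, intended for measuring intervals
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];        // holds the result so the loop has an observable effect
        long elapsed = measureNanos(() -> {
            for (int i = 0; i < 1_000; i++) {
                sink[0] += i;
            }
        });
        System.out.println("task took " + elapsed + " ns (sum=" + sink[0] + ")");
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is designed for measuring elapsed time and offers nanosecond precision, although not necessarily nanosecond resolution.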

In many cases, latency is not seen as a primary non-functional concern when designing an application, even when considering performance. Most of the time, after all, computers seem to do their work at speeds well beyond human perception, typically measured in milliseconds, microseconds, or even nanoseconds. The focus is often more on throughput, a measure of how many events can be handled within a given time period. However, basic arithmetic tells us that a service that handles each event with low latency (for example, microseconds) can handle far more events within a given time period, say 1 second, than a service with millisecond event-handling latency. For a service that processes events one at a time, handling each event in 10 microseconds rather than 1 millisecond raises the ceiling from 1,000 to 100,000 events per second. In many cases this allows us to avoid horizontal scaling (starting new instances) of a service, a strategy that introduces significant complexity into an application and may not even be possible for some workloads.
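The arithmetic behind this claim can be sketched in a few lines of Java. This assumes a single-threaded service that handles events strictly one after another, so the figure is an upper bound rather than a prediction. The class and method names (ThroughputEstimate, maxEventsPerSecond) are hypothetical.

```java
import java.time.Duration;

public final class ThroughputEstimate {

    // Upper bound on events per second for a handler that processes events
    // sequentially (an assumption of this sketch): one second divided by
    // the time taken to handle a single event.
    public static long maxEventsPerSecond(Duration perEventLatency) {
        long nanos = perEventLatency.toNanos();
        if (nanos <= 0) {
            throw new IllegalArgumentException("latency must be positive");
        }
        return Duration.ofSeconds(1).toNanos() / nanos;
    }

    public static void main(String[] args) {
        // 1 millisecond per event
        System.out.println(maxEventsPerSecond(Duration.ofMillis(1)));      // prints 1000
        // 10 microseconds per event
        System.out.println(maxEventsPerSecond(Duration.ofNanos(10_000)));  // prints 100000
    }
}
```

Reducing per-event latency by a factor of 100 raises the single-instance ceiling by the same factor, which is the headroom that might otherwise be sought by adding instances.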

Java and Low Latency

I have lost count of the number of times I have been told that Java is not a suitable language in which to develop applications where performance is a major consideration. My first response is usually to ask what is actually meant by “performance”, as two of the most common measures, throughput and latency, sometimes conflict with each other, and approaches that optimise for one may have a detrimental effect on the other.

Techniques exist for developing Java applications that match, or even exceed, the performance of applications built using languages more traditionally used for this purpose. However, even this may not be enough to get the best performance from a latency perspective. Java applications still have to rely on the operating system to provide access to the underlying hardware. Typically, latency-sensitive (often called “Real Time”) applications operate best when they have almost direct access to the underlying hardware, and the same applies to Java. In this article, we will introduce some approaches that help our applications use system resources as effectively as possible.