Garbage Collection Tuning Success Story: Reducing Young Gen Size

When you tune garbage collection performance, you are not only improving garbage collection pause time but also the overall application’s response time, while reducing cloud computing cost. Recently, we helped tune the garbage collection behavior of a popular application, and a single minor change resulted in a dramatic improvement. Let’s discuss this garbage collection tuning success story in this post.

Garbage Collection KPIs

There is a famous saying that "you can’t optimize something that you can’t measure." When it comes to garbage collection tuning, there are only three primary Key Performance Indicators (KPIs) that you should focus on.
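Whichever KPIs you track, they have to be measured before they can be improved. As a minimal sketch (the class name is arbitrary), the standard GarbageCollectorMXBean API exposes the raw counters, collection count and accumulated collection time, from which pause frequency and average pause duration can be derived:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcKpiProbe {
    public static void main(String[] args) {
        // One bean per collector (e.g., young and old generation collectors).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // collections so far
            long timeMs = gc.getCollectionTime();   // total time spent collecting, in ms
            double avgMs = count > 0 ? (double) timeMs / count : 0.0;
            System.out.printf("%s: %d collections, %d ms total, %.2f ms average%n",
                    gc.getName(), count, timeMs, avgMs);
        }
    }
}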

Steal CPU Time – ‘st’ Time in top

CPU consumption in Unix/Linux operating systems is studied using 8 different metrics: User CPU time, System CPU time, Nice CPU time, Idle CPU time, Waiting CPU time, Hardware Interrupt CPU time, Software Interrupt CPU time, and Stolen CPU time. In this article, let’s study ‘Steal’ (or ‘Stolen’) CPU time.

What is ‘Steal’ CPU Time?

‘Steal time’ (also known as ‘stolen’ time) is relevant only in cloud environments (like AWS) or VMware environments, where multiple virtual machines run on one underlying physical host. In such circumstances, CPU resources are shared among the virtual machines, and the hypervisor is the layer that distributes the underlying physical host’s CPU and other resources among them. Steal time, then, is the time a virtual CPU spends waiting for a physical CPU while the hypervisor is servicing another virtual machine; it appears in the ‘st’ column of top.
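Linux also surfaces this counter as the eighth value on the ‘cpu’ line of /proc/stat. A rough Java sketch that reads the cumulative steal counter (converting jiffies into the percentage that top shows is left out) could look like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StealTimeProbe {
    public static void main(String[] args) throws IOException {
        // First line of /proc/stat: "cpu  user nice system idle iowait irq softirq steal ..."
        String cpuLine = Files.readAllLines(Paths.get("/proc/stat")).get(0);
        String[] fields = cpuLine.trim().split("\\s+");
        long steal = Long.parseLong(fields[8]);   // cumulative steal time in clock ticks (jiffies)
        System.out.println("Cumulative steal time (jiffies): " + steal);
    }
}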

nice CPU Time – ‘ni’ Time in top

CPU consumption in Unix/Linux operating systems is studied using 8 different metrics: User CPU time, System CPU time, nice CPU time, Idle CPU time, Waiting CPU time, Hardware Interrupt CPU time, Software Interrupt CPU time, and Stolen CPU time. In this article, let’s study ‘nice’ CPU time.

What Is ‘nice’ CPU Time?

To understand ‘nice’ CPU time, one must first understand ‘nice’. When there is CPU contention (i.e., multiple processes competing for CPU cycles), processes with higher priority are given more chances to run. In Unix/Linux operating systems, processes are launched with a default priority (nice value) of 0. However, using the ‘nice’ command, a process’s priority can be changed; superusers (like ‘root’) can even raise it by assigning a negative nice value, as shown below:
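As a small illustration of the idea (the command path and nice value here are made up), a program can be launched under nice from Java with ProcessBuilder; a positive nice value lowers the priority, while a negative value raises it and requires superuser rights:

import java.io.IOException;

public class NiceLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Launch a hypothetical batch job at lower priority (nice value 10).
        // A negative value such as -5 would raise priority, but only 'root' may do that.
        Process p = new ProcessBuilder("nice", "-n", "10", "/usr/bin/some-batch-job")
                .inheritIO()
                .start();
        System.out.println("Exited with: " + p.waitFor());
    }
}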

Load Average: An Indicator for Only CPU Demand?

‘Load Average’ is an age-old metric reported in various operating systems. It’s often assumed to indicate CPU demand only. However, that is not the case: ‘Load Average’ indicates not only CPU demand, but also I/O demand (i.e., network read/write, file read/write, disk read/write). To prove this, we conducted a simple case study.
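Before getting to the case study, it helps to see where the number comes from. On Linux, the kernel publishes the 1-, 5-, and 15-minute load averages in /proc/loadavg, counting both runnable tasks and tasks blocked in uninterruptible I/O wait; the JDK exposes the 1-minute figure directly, as this small sketch shows:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverageProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // 1-minute load average; on Linux this counts runnable tasks plus tasks stuck
        // in uninterruptible I/O wait. Returns -1.0 if the platform doesn't support it.
        double load = os.getSystemLoadAverage();
        int cpus = os.getAvailableProcessors();
        System.out.printf("1-minute load average: %.2f across %d CPUs%n", load, cpus);
    }
}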

Load Average Study

To validate this theory, we leveraged BuggyApp. BuggyApp is an open-source Java project that can simulate various performance problems. When BuggyApp is launched with its disk I/O simulation arguments, it causes high disk I/O operations on the host.
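As a rough, hypothetical stand-in for that kind of workload (this is not BuggyApp’s code), the following Java sketch produces a similar effect by continuously writing and truncating a large temporary file, the kind of sustained disk I/O that drives up I/O wait and, with it, the load average:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DiskIoLoad {
    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[8 * 1024 * 1024];            // 8 MB per write
        Path file = Files.createTempFile("io-load", ".tmp");
        file.toFile().deleteOnExit();
        while (true) {
            // Append ~1 GB in 8 MB chunks, then truncate and repeat to keep the disk busy.
            for (int i = 0; i < 128; i++) {
                Files.write(file, chunk, StandardOpenOption.APPEND);
            }
            Files.write(file, new byte[0]);                  // default options truncate the file
        }
    }
}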

Different CPU Times: Unix/Linux ‘top’

CPU consumption in Unix/Linux operating systems is studied using eight different metrics: User CPU time, System CPU time, nice CPU time, Idle CPU time, Waiting CPU time, Hardware Interrupt CPU time, Software Interrupt CPU time, and Stolen CPU time. Let’s review each of these CPU times in this article.

User CPU Time and System CPU Time

In order to understand ‘user’ CPU time, one should understand ‘system’ CPU time as well, since they go hand in hand. User CPU time is the amount of time the processor spends running your application code. System CPU time is the amount of time the processor spends running operating system (i.e., kernel) functions connected to your application. Say your application is manipulating the elements of an array; that work is accounted as ‘user’ CPU time. Say your application is making network calls to external applications; to make those calls, it has to read/write data to socket buffers, which are part of the operating system’s code, so that work is accounted as ‘system’ CPU time. To learn how to resolve high ‘user’ CPU time, refer to this article. To learn how to resolve high ‘system’ CPU time, refer to this article.
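This split can also be observed from inside a Java application. The sketch below (the workloads are arbitrary, and file I/O stands in for the socket example above) uses ThreadMXBean: getCurrentThreadUserTime() reports time spent in application code, and subtracting it from getCurrentThreadCpuTime() approximates the kernel (‘system’) share:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.file.Files;
import java.nio.file.Path;

public class CpuTimeSplit {
    public static void main(String[] args) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Mostly 'user' time: pure in-memory array manipulation.
        long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = data[i] * 31 + i;
        }

        // Mostly 'system' time: file writes go through kernel code.
        Path tmp = Files.createTempFile("cpu-split", ".tmp");
        byte[] buf = new byte[1024 * 1024];
        for (int i = 0; i < 100; i++) {
            Files.write(tmp, buf);
        }
        Files.deleteIfExists(tmp);

        long cpuNs = threads.getCurrentThreadCpuTime();    // user + system, in nanoseconds (-1 if unsupported)
        long userNs = threads.getCurrentThreadUserTime();  // user only
        System.out.printf("user: %d ms, system: %d ms%n",
                userNs / 1_000_000, (cpuNs - userNs) / 1_000_000);
    }
}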