Java Micro Benchmarking: Improve Code Performance and Identify Bottlenecks
Discover the essentials of Java micro benchmarking to measure the performance of your application code or libraries. Learn how to detect bottlenecks, optimize throughput, and enhance efficiency under heavy load, ensuring your Java application remains fast and responsive.
Java - Micro Benchmark
Java Benchmarking
Benchmarking is a technique for measuring the performance of an application, or of a section of its code, in terms of metrics such as throughput and average time taken. In Java benchmarking, we measure the performance of application code or of a library it uses. This helps identify bottlenecks that appear under heavy load and detect performance degradation in the application.
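As a rough illustration of the two metrics named above, the following sketch converts an operation count and an elapsed time into throughput (operations per second) and average time per operation. The class and method names (`BenchmarkMetrics`, `opsPerSecond`, `averageNanosPerOp`) and the sample figures are made up for this example and are not part of any library.

```java
// Hypothetical helper, not part of any library: converts raw measurements
// into the two metrics mentioned above.
class BenchmarkMetrics {

    // Throughput: how many operations complete per second.
    static double opsPerSecond(long operations, long elapsedNanos) {
        return operations * 1_000_000_000.0 / elapsedNanos;
    }

    // Average time: how many nanoseconds a single operation takes.
    static double averageNanosPerOp(long operations, long elapsedNanos) {
        return (double) elapsedNanos / operations;
    }

    public static void main(String[] args) {
        long operations = 50_000;            // assumed number of completed calls
        long elapsedNanos = 2_000_000_000L;  // assumed total time: 2 seconds

        System.out.println("Throughput: " + opsPerSecond(operations, elapsedNanos) + " ops/s");
        System.out.println("Average time: " + averageNanosPerOp(operations, elapsedNanos) + " ns/op");
    }
}
```

The two metrics describe the same measurement from opposite directions: 25,000 operations per second corresponds to 40,000 nanoseconds per operation.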
Importance of Java Benchmarking
Performance is an important attribute of any application. Poorly written code can lead to an unresponsive application and high memory usage, which in turn degrade the user experience and can even render the application unusable. To contain such issues, benchmarking an application is of utmost importance.
A Java developer should be able to use benchmarking to locate sluggish code in an application and fix it.
Java Benchmarking Techniques
Developers use several techniques to benchmark application code, a system, a library, or any other component. The following are two of the important ones:
Benchmarking Using Start/End Time
This technique is simple and suits small pieces of code. We read the current time in nanoseconds before calling the code and read it again after the code finishes; the difference between the end time and the start time indicates how long the code took. However, a single measurement is not reliable, because the result can vary with many factors, such as garbage collection activity and other system processes running at the time.
Code
// get the start time
long startTime = System.nanoTime();
// execute the code to be benchmarked
long result = operations.timeTakingProcess();
// get the end time
long endTime = System.nanoTime();
// compute the time taken
long timeElapsed = endTime - startTime;
Example
The following complete program demonstrates the above concept.
Code
package com.tutorialsarena;
public class Operations {
public static void main(String[] args) {
Operations operations = new Operations();
// get the start time
long startTime = System.nanoTime();
// execute the code to be benchmarked
long result = operations.timeTakingProcess();
// get the end time
long endTime = System.nanoTime();
// compute the time taken
long timeElapsed = endTime - startTime;
System.out.println("Sum of 100,000 natural numbers: " + result);
System.out.println("Elapsed time: " + timeElapsed + " nanoseconds");
}
// get the sum of the integers from 0 to 99,999 (100,000 numbers)
public long timeTakingProcess() {
long sum = 0;
for(int i = 0; i < 100000; i++ ) {
sum += i;
}
return sum;
}
}
Compiling and running the above program produces a result similar to the following (the elapsed time varies between runs and machines):
Output
Sum of 100,000 natural numbers: 4999950000
Elapsed time: 1111300 nanoseconds
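Because a single start/end measurement is sensitive to JIT compilation, garbage collection, and other system activity, one common way to reduce the noise is to run the code several times first (warm-up) and then take the median of several timed runs. The sketch below shows that idea under stated assumptions: the class name `RepeatedTiming` and the run counts (1,000 warm-up runs, 21 measured runs) are arbitrary choices for illustration, not recommended values.

```java
import java.util.Arrays;

// Sketch only: repeats the start/end-time measurement and reports the median.
class RepeatedTiming {

    // Same workload as the example above: adds the integers 0 to 99,999.
    static long timeTakingProcess() {
        long sum = 0;
        for (int i = 0; i < 100000; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs the workload `runs` times and returns each elapsed time in nanoseconds.
    static long[] measure(int runs) {
        long[] samples = new long[runs];
        long checksum = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            checksum += timeTakingProcess();
            samples[r] = System.nanoTime() - start;
        }
        // Use the results so the JIT compiler has a reason to keep the work.
        if (checksum != runs * 4999950000L) {
            throw new IllegalStateException("unexpected checksum: " + checksum);
        }
        return samples;
    }

    // Middle value of the sorted samples (the upper middle for even counts);
    // less sensitive to one-off pauses than the mean.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        measure(1_000);                // warm-up runs, timings discarded
        long[] samples = measure(21);  // measured runs
        System.out.println("Median elapsed time: " + median(samples) + " nanoseconds");
    }
}
```

This remains a hand-rolled approximation. It does not isolate the benchmark in a separate JVM or guard against all compiler optimizations.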
Benchmarking Using Java Microbenchmark Harness (JMH)
Java Microbenchmark Harness (JMH) is a benchmarking toolkit developed by the OpenJDK community for measuring the performance of code. It takes a simple annotation-based approach, so benchmark data for a method can be collected with minimal coding on the developer's part.
Step 1:
Annotate the method to be benchmarked with @Benchmark.
Code
@Benchmark
public long timeTakingProcess() {
long sum = 0;
for(int i = 0; i < 100000; i++ ) { sum += i; }
return sum;
}
Step 2:
Prepare the benchmark options and run the benchmark using the Runner class.
Code
// prepare the options
Options options = new OptionsBuilder()
.include(Operations.class.getSimpleName()) // regular expression selecting the benchmarks to run; here, those in the Operations class
.forks(1) // number of separate JVM processes (forks) in which the iterations are run
.build();
// run the benchmark runner
new Runner(options).run();
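The `include` option takes a regular expression rather than an exact class name. The plain-Java sketch below illustrates what that implies. It assumes the expression is searched for anywhere within the fully qualified benchmark name, which is consistent with the simple name `Operations` selecting `com.tutorialsarena.test.Operations.timeTakingProcess`. The class `IncludePatternDemo`, its `selects` method, and the second benchmark name are hypothetical and only mimic the matching; they are not JMH code.

```java
import java.util.regex.Pattern;

// Illustration only: mimics a regular-expression search over benchmark names.
class IncludePatternDemo {

    // True if the pattern is found anywhere inside the benchmark name.
    static boolean selects(String includeRegex, String benchmarkName) {
        return Pattern.compile(includeRegex).matcher(benchmarkName).find();
    }

    public static void main(String[] args) {
        String target = "com.tutorialsarena.test.Operations.timeTakingProcess";
        // Hypothetical second benchmark whose class name starts the same way.
        String other = "com.tutorialsarena.test.OperationsV2.anotherProcess";

        System.out.println(selects("Operations", target));        // true
        System.out.println(selects("Operations", other));         // true, also selected
        System.out.println(selects("\\.Operations\\.", target));  // true
        System.out.println(selects("\\.Operations\\.", other));   // false
    }
}
```

Under that assumption, a project with similarly named benchmark classes may want a more specific pattern, such as one that includes the surrounding dots.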
Dependencies
To use the benchmarking library in a Maven project, add the following dependencies to pom.xml.
Code
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.35</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.35</version>
</dependency>
The following is the complete pom.xml used to run the above benchmarking example.
Code
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tutorialsarena</groupId>
<artifactId>test</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>test</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<jmh.version>1.35</jmh.version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>${jmh.version}</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
<configuration>
<source>17</source>
<target>17</target>
<annotationProcessorPaths>
<path>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
</path>
</annotationProcessorPaths>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<finalName>benchmarks</finalName>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.openjdk.jmh.Main</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Complete Benchmarking Code Example
The following is the complete code of the class used to run the above benchmarking example.
Code
package com.tutorialsarena.test;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
public class Operations {
public static void main(String[] args) throws RunnerException {
Options options = new OptionsBuilder()
.include(Operations.class.getSimpleName())
.forks(1)
.build();
new Runner(options).run();
}
// get the sum of the integers from 0 to 99,999 (100,000 numbers)
@Benchmark
public long timeTakingProcess() {
long sum = 0;
for(int i = 0; i < 100000; i++ ) {
sum += i;
}
return sum;
}
}
Compiling and running the above program produces output similar to the following:
Output
# JMH version: 1.35
# VM version: JDK 21.0.2, Java HotSpot(TM) 64-Bit Server VM, 21.0.2+13-LTS-58
# VM invoker: C:\Program Files\Java\jdk-21\bin\java.exe
# VM options: -Dfile.encoding=UTF-8 -Dstdout.encoding=UTF-8 -Dstderr.encoding=UTF-8 -XX:+ShowCodeDetailsInExceptionMessages
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.tutorialsarena.test.Operations.timeTakingProcess
# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration 1: 33922.775 ops/s
# Warmup Iteration 2: 34104.930 ops/s
# Warmup Iteration 3: 34519.419 ops/s
# Warmup Iteration 4: 34535.636 ops/s
# Warmup Iteration 5: 34508.781 ops/s
Iteration 1: 34497.252 ops/s
Iteration 2: 34338.847 ops/s
Iteration 3: 34355.355 ops/s
Iteration 4: 34105.801 ops/s
Iteration 5: 34104.127 ops/s
Result "com.tutorialsarena.test.Operations.timeTakingProcess":
34280.276 ±(99.9%) 660.293 ops/s [Average]
(min, avg, max) = (34104.127, 34280.276, 34497.252), stdev = 171.476
CI (99.9%): [33619.984, 34940.569] (assumes normal distribution)
# Run complete. Total time: 00:01:40
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise extra caution when trusting the results, look into the generated code to check the benchmark still works, and factor in a small probability of new VM bugs. Additionally, while comparisons between different JVMs are already problematic, the performance difference caused by different Blackhole modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
Benchmark                      Mode  Cnt      Score     Error  Units
Operations.timeTakingProcess  thrpt    5  34280.276 ± 660.293  ops/s
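To make the summary line easier to read, the sketch below recomputes it from the five measured iterations shown in the output. It assumes the Error column is the half-width of a Student's t confidence interval at 99.9% with 4 degrees of freedom, using the standard table value of about 8.610. That assumption closely reproduces the reported figures, but the class `ScoreSummary` and its methods are illustrative and are not JMH code.

```java
// Sketch: recomputes the summary statistics from the five measured iterations above.
class ScoreSummary {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Half-width of the confidence interval: t * s / sqrt(n).
    static double errorMargin(double[] xs, double tCritical) {
        return tCritical * sampleStdDev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        double[] scores = {34497.252, 34338.847, 34355.355, 34105.801, 34104.127};
        // Assumed table value: Student's t, 99.9% two-sided, 4 degrees of freedom.
        double tCritical = 8.610;

        double avg = mean(scores);
        System.out.printf("Score: %.3f ops/s%n", avg);
        System.out.printf("Stdev: %.3f%n", sampleStdDev(scores));
        System.out.printf("Error: %.1f ops/s%n", errorMargin(scores, tCritical));
        // Throughput converted to an average time per call.
        System.out.printf("Time per call: %.0f ns%n", 1_000_000_000.0 / avg);
    }
}
```

A throughput of roughly 34,280 operations per second corresponds to about 29,000 nanoseconds per call. The single un-warmed measurement in the start/end time example reported 1,111,300 nanoseconds. The two runs may not have been made under identical conditions, but the gap suggests how strongly start-up effects can inflate a one-off timing.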