JMH (the Java Microbenchmark Harness) is a benchmarking harness written by the Oracle/Sun Performance Engineering team, with Shipilev leading the effort on this tool. It is actively developed and maintained, is gaining wide recognition, and is considered by many to be a fine tool for the job.
The framework consists of a few main groups of functionality:
- Benchmark code generation, driven by annotations. The generated classes and all their dependencies get packaged in an all-in-one runnable jar.
- Benchmark runner, supporting single-threaded, multi-threaded and thread-group execution of your benchmark code. The runner also supports repeated launches (forks) of the same benchmarks, among other things.
- Pluggable profilers
- Multi-language support
- Reporting formats - JSON, CSV, Summary
- Programmable API
- "Introduction to JMH": JMH 0.1 was released to the unsuspecting public in March 2013. I was using this early version, so some of the API has changed since, but not much. This post briefly discusses the challenges of Java benchmarks, then demonstrates a JMH benchmark of UTF encoding implementations and contrasts it with hand-rolled and Caliper versions of the same benchmark. The post then explores some of the command line options and profilers. The focus is on the amazing amount of value (i.e. ways to analyse and run your benchmark) you get for a relatively small effort with JMH. The samples have since been updated (the code outpaces the documentation). This was in fact the second post for which I used JMH, the first being a comparison of merging queue implementations. At the time of writing JMH was still unannounced, so it is not mentioned in that post, but the experiment may still serve as an example.
- "Writing (symmetric) Multi-Threaded Benchmarks": Symmetric (or uniform) in the sense that all threads execute the same method, potentially on the same state. In this article I look at using JMH to benchmark the effects of false-sharing. The post further explores the State annotation and the use of shared (i.e. not Thread) state for writing uniform multi-threaded benchmarks. I also demonstrate some of the useful outputs JMH has for this purpose. The analysis of run to run variance is instructive and an important reminder to not skimp on the number of forked runs.
- Writing Asymmetric Multi-Threaded Benchmarks: Asymmetric in the sense that different threads play different roles and call different methods. I have written a few posts in this area, reflecting my interest in concurrent data-structures and in particular lock free queues:
- "Java Concurrent Counters By Numbers" and "Advice For the Concurrently Confused" both look at the effect of multiple readers and writers on the cost of counter increment/get. The benchmarks/posts demonstrate the use of thread groups, the @Group annotation and the relevant command line options. Some consideration is given to the potentially confusing effects of asymmetry on results.
- Concurrent Queue Latency (Part 1, and Part 2): This is an exploration of queue based inter-thread ping-pong latency. For the most part the same features are demonstrated as above, but this example shows some complex use of benchmark State and thread groups. Computing RTT is more suitable for JMH than point-to-point latency, and the post demonstrates one way of achieving that.
- All the posts demonstrate the importance of thread pinning for stability of results in a multi-core/socket/threaded environment.
- There are other queue related JMH benchmarks to be found in JCTools (my lock free queues and concurrency tools effort).
- "The Volatile Read Surprise": Considering the volatile read effect on nano-benchmarks and some exploration of how Blackholes work. Also discussing nano-benchmarks and the importance of doubt (when in doubt, doubt yourself).
- "Disassembling a JMH Nano-Benchmark": A look at a JMH benchmark through the assembly glasses and an attempt at providing readers with some basic orientation around the typical assembly code they might see in the JMH context.
- "The Escape of ArrayList.iterator()": Using JMH as a compiler optimization test tool and as a handy experiment lab for the comparison of profilers. In particular demonstrating how some profilers break escape analysis when profiling memory allocation. This is contrasted here with the black box, crude but reliable gc profiler which is part of JMH.
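The first post in the list above contrasts a JMH benchmark with a hand-rolled version of the same measurement. For readers who have not written one, the following is a minimal sketch of what a hand-rolled loop tends to look like. It is not taken from any of the posts: the names `HandRolledBench`, `averageNanos` and `sink` are hypothetical, and it assumes a single thread, a fixed iteration count, and a volatile field as a crude stand-in for what a JMH Blackhole does far more carefully.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.IntSupplier;

/** Hand-rolled micro-benchmark sketch (NOT JMH). All names are hypothetical. */
final class HandRolledBench {
    // Results are written to a volatile field so the JIT cannot treat the
    // measured work as dead code and discard it outright.
    static volatile int sink;

    /** Runs the task 'iterations' times and returns the average nanoseconds per call. */
    static double averageNanos(IntSupplier task, int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += task.getAsInt(); // use every result so the call stays live
        }
        long elapsed = System.nanoTime() - start;
        sink = acc; // publish the accumulated result
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        IntSupplier encode = () -> text.getBytes(StandardCharsets.UTF_8).length;

        // Warmup rounds: give the JIT a chance to compile the hot loop
        // before any numbers are reported.
        for (int i = 0; i < 5; i++) {
            averageNanos(encode, 100_000);
        }
        // Measurement rounds, all in the same JVM (no forks).
        for (int i = 0; i < 5; i++) {
            System.out.printf("run %d: %.1f ns/op%n", i, averageNanos(encode, 100_000));
        }
    }
}
```

Everything this sketch leaves out (forked JVM runs, configurable warmup and measurement iterations, multi-threaded execution, profilers, report formats) is the kind of thing the posts above show JMH handling for you.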
Shipilev's posts on JMH deserve a section of their own. I highlight the demonstration of particular JMH features, but all the posts offer much to learn on performance analysis and benchmarking methodology. In that sense these posts are well worth reading even if you have no intention to use JMH, but want to see a master at work.
- A nano-benchmark example: comparing volatile increments to plain increments and demonstrating consideration and analysis. The post also demonstrates a CPU backoff method which, it argues, makes a nano-benchmark more 'real-world'. The rant/discussion on visualizing results and composability of benchmark results is valuable advice and insight.
- A uniform multi-threaded benchmark example: By calling nanoTime from multiple threads the benchmark is used to examine the scalability of nanoTime. The benchmark code itself is very simple, but the insight into the thought process is hugely valuable. The post also demonstrates the importance of examining results from multiple platforms to support meaningful analysis.
- A demonstration of how JMH can be used from the API and how a benchmark can be packaged to provide a measurement utility, used in this context to collect results from many volunteers on their own hardware.
- In the "Omission Considerations" section there is an in depth explanation of the way measurement and setup work in JMH to try and give accurate measurement for multi-threaded benchmarks and the challenges therein.
- In the "Steady State Considerations" section there is a discussion of benchmarks which measure a mutable quantity of work and some suggestions on how you can use JMH in this context.
- "Java vs. Scala: Divided We Fail": This post demonstrates the JMH support for Scala and the use of the stack and perfasm profilers.
- Shipilev also uses JMH to demonstrate the effects of modifications to the JMM on performance in 2 interesting posts ("All Fields Are Final" and "All Accesses Are Atomic"). No further JMH features are explored but the demonstration of principles is well worth the read.
- There's a lot of benchmarking and JMH related content from presentations and such here.
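The uniform multi-threaded nanoTime experiment listed above can be roughly approximated without JMH, just to show the shape of the measurement. What follows is a sketch under stated assumptions, not Shipilev's benchmark: the names `NanoTimeScaling`, `averageNanosPerCall` and `sink` are hypothetical, threads are released together with a `CountDownLatch`, and the result is simply the summed per-thread elapsed time divided by the total number of calls.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.LongAdder;

/** Plain-JDK sketch of a uniform multi-threaded measurement (NOT JMH). Names are hypothetical. */
final class NanoTimeScaling {
    // Volatile sink so the timestamps read in the loop are not discarded.
    static volatile long sink;

    /** Average nanoseconds per System.nanoTime() call when 'threads' threads call it at once. */
    static double averageNanosPerCall(int threads, int callsPerThread) {
        CountDownLatch start = new CountDownLatch(1);
        LongAdder totalNanos = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // all workers begin measuring at roughly the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long acc = 0;
                long begin = System.nanoTime();
                for (int i = 0; i < callsPerThread; i++) {
                    acc += System.nanoTime(); // the operation under test
                }
                long elapsed = System.nanoTime() - begin;
                sink = acc;
                totalNanos.add(elapsed);
            });
            workers[t].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return (double) totalNanos.sum() / ((long) threads * callsPerThread);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        averageNanosPerCall(cores, 1_000_000); // warmup pass, result ignored
        for (int threads = 1; threads <= cores; threads *= 2) {
            System.out.printf("%d thread(s): %.1f ns/call%n",
                    threads, averageNanosPerCall(threads, 1_000_000));
        }
    }
}
```

A single in-process run like this says nothing about run-to-run variance or other platforms, which is exactly where the posts above show the value of repeated forked runs and of collecting results from multiple machines.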
Other people have written posts on JMH (happy to add others, just let me know):
- Michael Nitschinger wrote a step by step example and the comments contain valuable dialogue with Shipilev on the validity of the benchmark.
- Richard Warburton wrote a post on using JMH to examine the effects of mega-morphism on method invocation costs.
- Java Performance.info: 2 reference posts offering some coverage of framework basics and a nice detailed coverage of the packaged profilers. This is more cheat sheet than example benchmarks stuff.
- Daniel Mitterdorfer posted a series of articles covering some background motivations for JMH (first 3 posts), and some examples (noop benchmark and a uniform multithreaded benchmark).
Use The SOURCE, Luke!
Finally, as pointed out in all of these posts as well as my own, the jmh-samples offer important instruction and advice on proper usage and pitfalls.
- A seminal post by Shipilev on the mechanical-sympathy group contrasting JMH and Caliper offers some insight into the considerations which went into the building of JMH.