Monday 1 December 2014

The Escape of ArrayList.iterator()

{This post assumes some familiarity with JMH. For more JMH related content start at the new and improved JMH Resources Page and branch out from there!}
Escape Analysis was a much celebrated optimisation added to the JVM in Java 6u23:
"Based on escape analysis, an object's escape state might be one of the following:
  • GlobalEscape – An object escapes the method and thread. For example, an object stored in a static field, or, stored in a field of an escaped object, or, returned as the result of the current method.
  • ArgEscape – An object passed as an argument or referenced by an argument but does not globally escape during a call. This state is determined by analyzing the bytecode of called method.
  • NoEscape – A scalar replaceable object, meaning its allocation could be removed from generated code.
After escape analysis, the server compiler eliminates scalar replaceable object allocations and associated locks from generated code. The server compiler also eliminates locks for all non-globally escaping objects. It does not replace a heap allocation with a stack allocation for non-globally escaping objects." - from Java 7 performance enhancements 
Alas, proving an object never escapes is a difficult problem, and many people feel they cannot rely on this optimisation to kick in and "do the right thing" for them. Part of the problem is that there is no easy way to discover if a particular allocation has been eliminated (on a debug OpenJDK build one can use -XX:+UnlockDiagnosticVMOptions -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations).
The EscapeAnalysis skepticism leads some people to go as far as claim that the JIT compiler fails to eliminate the iterator allocation of collections, which are everywhere:

Buy why skepticize what you can analyze?

Theory: even simple iterators do not benefit from Escape Analysis

So lets give this some thought. I generally think the best of the JVM developer bunch. Sure they may miss here and there, they're just human after all, but we are talking about a good strong bunch of engineers. I tend to assume that if there is a simple case for an optimization that is worth while than they have not failed to capitalize on it. I would therefore be quite surprised if indeed iterators do not benefit from escape analysis as they seem quite natural candidates, particularly in the syntactic sugar case, but even in the explicit case. Still, my trust in the JVM engineers is no evidence, how can I prove this works for a given setup?
  1. Use a debug build... I invite the readers to try this method out, didna do it.
  2. Use a memory profiler, like the one packed with VisualVM or YourKit
  3. Setup an experiment to observe before/after effect of desired optimization. Use -XX:-+DoEscapeAnalysis and examine gc logs to observe effect.
  4. Look at the assembly...

Experiment: Observe a loop over an array list

Reach for your tool belts and pick the swiss army knife that is JMH. Here's the benchmark I will use for this:
This is one of them times when I'm not particularly concerned with the performance of this bit of code as such, but JMH makes a good crucible for java code. The same effort that went into correct measurement enables the examination of code in an isolated manner. This makes JMH a good tool for testing the JIT compiler.

Measurement: Profile the experiment

I want to plug the experiment into a profiler, so I set the number of iterations to 1000 and get into it man, profiling, doing it you know, like a... like a sex machine, can I count it all? Here's YourKit reporting:

Ouch! that array list iterator is right there at the top! all that trust I put in the JVM developers? GONE! Let's get a second opinion from JVisualVM, just to be sure:

Oh me, oh my, this is bad...
Finally, with tears in our eyes, let us try see what Java Mission Control has to say:

How curious! the iterator allocation is gone! but JMC is outnumbered 2 to 1 here. Could it be that some profilers are pooping their pants?

Measurement: Measure with -XX:+/-DoEscapeAnalysis

Sometimes we get lucky and the optimization we want to examine comes with a handy flag to turn it on and off. We expect escape analysis to remove the iterator allocation and thus leave us with a benchmark which generates no garbage. We are also fortunate that JMH comes with the handy GC profiler which simply examines the GC JMX bean to inform us if any collections happened. Running the benchmark with a short list size and a small heap should trigger plenty of young generation GC cycle in each measurement iteration. Lo and behold, with escape analysis on:
$java -jar target/microbenchmarks.jar -f 1 -i 10 -wi 10 -p size=1000 \
 -jvmArgs="-Xms64m -Xmx64m -XX:+DoEscapeAnalysis" -prof gc ".*.sumIteratorOverList"
Benchmark                                           (size)    Score    Error   Units
sumIteratorOverList                                   1000  816.367 ± 17.768   ns/op
sumIteratorOverList:@gc.count.profiled                1000    0.000 ±    NaN  counts
sumIteratorOverList:@gc.count.total                   1000    0.000 ±    NaN  counts
sumIteratorOverList:@gc.time.profiled                 1000    0.000 ±    NaN      ms
sumIteratorOverList:@gc.time.total                    1000    0.000 ±    NaN      ms


$java -jar target/microbenchmarks.jar -f 1 -i 10 -wi 10 -p size=1000 \
 -jvmArgs="-Xms64m -Xmx64m -XX:-DoEscapeAnalysis" -prof gc ".*.sumIteratorOverList"
Benchmark                                           (size)    Score    Error   Units
sumIteratorOverList                                   1000  940.567 ± 94.156   ns/op
sumIteratorOverList:@gc.count.profiled                1000   19.000 ±    NaN  counts
sumIteratorOverList:@gc.count.total                   1000   42.000 ±    NaN  counts
sumIteratorOverList:@gc.time.profiled                 1000    9.000 ±    NaN      ms
sumIteratorOverList:@gc.time.total                    1000   27.000 ±    NaN      ms
Now that is what we hoped for, escape analysis saves the day! Please note above numbers are from running on my laptop using Oracle Java 8u20, I was not aiming for measurement accuracy, just wanted to verify the compilation/GC aspects. Laptops are good enough for that.

WTF Profilers? Why you lie?

There's a big difference in how JVisualVM/YourKit work and how JMC work, in particular:
  • JVisualVM/YourKit: treat the JVM as a black box and profile via the JVMTI interfaces and bytecode injection. Work on all JVMs.
  • Java Mission Control: use internal JVM counters/reporting APIs only available to the Oracle JVM (so can't profile OpenJDK/Zing/IBM-J9 etc)
So why should this get in the way of escape analysis? Searching through the OpenJDK source code we can spot the following:
void C2Compiler::compile_method(ciEnv* env, ciMethod* target, int entry_bci) {
  assert(is_initialized(), "Compiler thread must be initialized");

  bool subsume_loads = SubsumeLoads;
  bool do_escape_analysis = DoEscapeAnalysis &&!env->jvmti_can_access_local_variables();
This explains a similar issue, but not this one. The above code is relevant for debuggers and other tools relying on stepping through code and examining variable values.
This issue is the result of some instrumentation trapping the reference, at least for JVisualVM. We can look at the assembly to be certain, this is the iterator non-allocation before we connect JVisualVM to the process:
Note how the iterator is initialized with no allocation actually taking place (note also how line 4 annotation is a big fat lie, this is actually getting modCount from the list). Method is recompiled to same when we attach the agent. And here's what happens when we turn on memory profiling: The iterator is now allocated (see 'new' keyword at line 18 above).We can see that the instrumenting agent added a traceObjAlloc call into the iterator constructor (line 52 above). JVisualVM is open source, so we can dig into the code of the above trace method and the instrumentation code, I leave it as an exercise to the reader. If I was a better man I might have a go at fixing it, maybe another day.

Summary

  • EscapeAnalysis works, at least for some trivial cases. It is not as powerful as we'd like it, and code that is not hot enough will not enjoy it, but for hot code it will happen. I'd be happier if the flags for tracking when it happens were not debug only.
  • Profilers can kill EscapeAnalysis. In the example above we have very little code to look at, so it is easy to cross examine. In the real world you'd be profiling a much larger codebase, looking for allocation elimination opportunities, I suggest you have a healthy doubt in your profiler.
  • Java Mission Control is my current favourite profiler for the Oracle JVM. It's a shame the underlying APIs are not made public and standard so that tool makers can rely on them for all JVMs. Perhaps one day at least part of these APIs will be made public.
Thanks Peter Lawrey for reviewing this post!