Tuesday 25 February 2014

Unsafe Pointer Chasing: Running With Scissors

Love running? Love scissors? I know just the thing for you! Following on from recent discussion on the Mechanical Sympathy mailing list I see an anti pattern worth correcting in the way people use Unsafe. I say correcting as I doubt people are going to stop, so they might as well be made aware of the pitfalls. This pattern boils down to a classic concurrency bug:

Q: "But... I not be doing no concurrency or nuffin' guv"
A: Using Unsafe to gain a view of on-heap addresses is concurrent access by definition.

Unsafe address: What is it good for?

Absolutely nothing! sayitagain-huh! I exaggerate, if it was good for nothing it would not be there, let's look at the friggin manual:
As we can see the behaviour is only defined if we use the methods together, and by that I mean that get/putAddress are only useful when used with an address that is within a block of memory allocated by allocateMemory. Now undefined is an important word here. It means it might work some of the time... or it might not... or it might crash your VM. Let's think about this.

Q: What type of addresses are produced by allocateMemory?
A: Off-Heap memory addresses -> unmanaged memory, not touched by GC or any other JVM processes

The off-heap addresses are stable from the VM point of view. It has no intention of running around changing them, once allocated they are all yours to manage and if you cut your fingers in the process or not is completely in your control, this is why the behaviour is defined. On-Heap addresses on the other hand are a different story.

Playing With Fire: Converting An Object Ref to An Address

So imagine you just had to know the actual memory address of a given instance... perhaps you just can't resist a good dig under the hood, or maybe you are concerned about memory layout... Here's how you'd go about it:
Now... you'll notice the object ref needs a bit of cuddling to turn into an address. Did I come up with such devilishly clever code myself? No... I will divulge a pro-tip here:
If you are going to scratch around the underbelly of the JVM, learn from as close to the JVM as you can -> from the JDK classes, or failing that, from an OpenJDK project like JOL (another Shipilev production)
In fact, the above code could be re-written to:
Now that we have the address what can we do with it? Could we use it to copy the object? maybe we could read or modify the object state? NO! we can but admire it's numerical beauty and muse on the temperamental values waiting at the other end of that address. The value at the other end of this address may have already been moved by GC...

Key Point: On-Heap Addresses Are NOT Stable

Consider the fact that at any time your code may be paused and the whole heap can be moved around... any address value you had which pointed to the heap is now pointing to a location holding data which may be trashed/outdated/wrong and using that data will lead to a funky result indeed. Also consider that this applies to class metadata or any other internal accounting managed by the JVM.
If you are keen to use Unsafe in the heap, use object references, not addresses. I would urge you not to mix the 2 together (i.e. have object references to off-heap memory) as that can easily lead to a very confused GC trying to chase references into the unknown and crashing your VM.

Case Study: SizeOf an Object (Don't do this)

This dazzling fit of hackery cropped up first (to my knowledge) here on the HighScalability blog:
This is some sweet macheta swinging action :-). The dude who wrote this is not suggesting it is safe, and only claims it is correct on a 32bit VM. And indeed, it can work and passes cursory examination. The author also states correctly that this will not work for arrays and that with some corrections this can be made to work for 64 bit JVMs as well. I'm not going to try and fix it for 64 bit JVMs, though most of the work is already done in the JOL code above. The one flaw in this code that cannot be reliably fixed is that it relies on the native Klass address (line 6) to remain valid long enough for it to chase the pointer through to read the layout helper (line 8). Spot the similarity to the volatile bug above?
This same post demonstrates how to forge references from on-heap objects to off-heap 'objects' which in effect let you cast a pointer to a native reference to an object. It goes on to state that is a BAD IDEA, and indeed it can easily crash your VM when GC comes a knocking (but it might not, I didn't try).

Case Study: Shallow Off-Heap Object Copy (Don't do this)

Consider the following method of making an off-heap copy of an object (from here, Mishadof's blog):
We see the above is using the exact same method for computing size as demonstrated above. It's getting the on-heap object address (limited correctness, see addresses discussion above) than copying the object off-heap and reading it back as a new object copy... Calling the Unsafe.copyMemory(srcAddress, destAddress, length) is inviting the same concurrency bug discussed above. A similar method is demonstrated in the HighScalability post, but  there the copy method used is Unsafe.copyMemory(srcRef, srcOffset, destRef, destOffset, length). This is important as the reference using method is not exposed to the same concurrency issue.
Both are playing with fire ofcourse by converting off-heap memory to objects. Imagine this scenario:
  • a copy of object A is made which refers to another object B, the copy is presented as object C
  • object A is de-referenced leading to A and B being collected in the next GC cycle
  • object C is still storing a stale reference to B which is no managed by the VM
What will happen if we read that stale reference? I've seen the VM crash in similar cases, but it might just give you back some garbage values, or let you silently corrupt some other instance state... oh, the fun you will have chasing that bugger down...

Apologies

I don't mean to present either of the above post authors as fools, they are certainly clever and have presented interesting findings for their readers to contemplate without pretending their readers should run along and build on their samples. I have personally commented on some of the code on Mishadof's post and admit my comments were incomplete in identifying the issues discussed above. If anything I aim to highlight that this hidden concurrency aspect can catch out even the clever.
Finally, I would be a hypocrite if I told people not to use Unsafe, I end up using it myself for all sorts of things. But as Mr. Maker keeps telling us "Be careful, because scissors are sharp!"

2 comments:

  1. a short demonstration with code - http://fasterjava.blogspot.com.au/2014/03/chasing-pointers-with-scissors-or.html

    ReplyDelete

Note: only a member of this blog may post a comment.