Unsafe.{get,put}-X-Unaligned performance

Tue Mar 10 19:02:05 UTC 2015

I've been measuring the performance after this patch, and as you
might expect it's always much better with UseUnalignedAccesses.

However, we can sometimes get performance regressions, albeit in some
fairly contrived cases.

I have a test which repeatedly loads a {long,int,short} at some random
offset in a ByteBuffer, XORs some random value into it, and stores the
result back in the same place.  This ByteBuffer is 1 KB long, so it
fits comfortably in L1 cache.
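
For anyone who wants to reproduce something similar, here is a minimal
sketch of that kind of workload, long case only.  The class name, the
seed handling and the use of a heap ByteBuffer with absolute
getLong/putLong are assumptions made for illustration; this is not the
actual test, which goes through the Unsafe unaligned accessors.

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical workload sketch; names are illustrative, not from the real test.
final class XorWorkloadSketch {
    // Repeatedly load a long at a random offset, XOR a random value
    // into it, and store the result back in the same place.
    static void run(ByteBuffer buf, long seed, int iterations) {
        Random r = new Random(seed);
        // Highest valid starting offset for an 8-byte access, plus one.
        int limit = buf.capacity() - Long.BYTES + 1;
        for (int i = 0; i < iterations; i++) {
            int off = r.nextInt(limit);          // any alignment is possible
            long v = buf.getLong(off) ^ r.nextLong();
            buf.putLong(off, v);
        }
    }
}
```

Because every step is an XOR of a fixed mask into the buffer, running
the same seed twice puts the buffer back in its initial state, which
makes for a cheap sanity check.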

The old algorithm always loads and stores a long as 8 bytes.

The new algorithm does an N-way branch, always loading and storing
subwords according to their natural alignment.  So, if the address is
random and the size is long it will access 8 bytes 50% of the time
(odd addresses), 4 shorts 25% of the time (address = 2 or 6 mod 8),
2 ints 12.5% of the time (4 mod 8), and 1 long 12.5% of the time
(0 mod 8).  So, for every random load/store we have a 4-way branch.
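
To make the shape of that branch concrete, here is a rough sketch of
the load side for a long.  It is not the code from the patch: the
class and method names are made up, it assumes little-endian byte
order, and it treats the buffer offset as if it were the address
(i.e. it assumes the buffer base is 8-byte aligned).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; names are illustrative, not from the patch.
final class AlignedSplitSketch {
    // How many naturally aligned accesses an 8-byte value at addr needs:
    // 1 long, 2 ints, 4 shorts or 8 bytes, chosen from the low address bits.
    static int pieces(long addr) {
        if ((addr & 7) == 0) return 1;
        if ((addr & 3) == 0) return 2;
        if ((addr & 1) == 0) return 4;
        return 8;
    }

    // Load a little-endian long at off using only naturally aligned pieces.
    // Assumes buf.order() is LITTLE_ENDIAN and the base is 8-byte aligned.
    static long getLongSplit(ByteBuffer buf, int off) {
        switch (pieces(off)) {           // the 4-way branch
            case 1:
                return buf.getLong(off);
            case 2:
                return (buf.getInt(off) & 0xFFFFFFFFL)
                     | ((long) buf.getInt(off + 4) << 32);
            case 4: {
                long v = 0;
                for (int i = 3; i >= 0; i--)   // highest short first
                    v = (v << 16) | (buf.getShort(off + 2 * i) & 0xFFFFL);
                return v;
            }
            default: {
                long v = 0;
                for (int i = 7; i >= 0; i--)   // highest byte first
                    v = (v << 8) | (buf.get(off + i) & 0xFFL);
                return v;
            }
        }
    }
}
```

Over the eight possible values of the low three address bits,
pieces() returns 8 four times, 4 twice, and 2 and 1 once each, which
is where the 50% / 25% / 12.5% / 12.5% split comes from.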

The new algorithm is slightly slower because of branch misprediction.

old:  2.17 IPC, 0.08% branch-misses, 91,965,281,215 cycles
new:  1.23 IPC, 6.11% branch-misses, 99,925,255,682 cycles

...but it executes fewer instructions so we're only talking about
a slowdown of roughly 9% in cycles.  I think this is the worst case
(or something close to the worst case) for the new algorithm.
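
For reference, the instruction counts are not in the figures above but
can be estimated as IPC times cycles.  A small sketch of that
arithmetic follows; the class and method names are made up, and the
instruction counts are only as precise as the rounded IPC values.

```java
// Hypothetical helper for the arithmetic; names are illustrative.
final class PerfCountersSketch {
    // Estimated instructions retired = IPC * cycles.
    static double instructions(double ipc, long cycles) {
        return ipc * cycles;
    }

    // Relative change from oldV to newV, in percent.
    static double percentChange(double oldV, double newV) {
        return (newV - oldV) / oldV * 100.0;
    }

    public static void main(String[] args) {
        long oldCycles = 91_965_281_215L;
        long newCycles = 99_925_255_682L;
        double oldInsns = instructions(2.17, oldCycles);
        double newInsns = instructions(1.23, newCycles);
        System.out.printf("cycles:       %+.1f%%%n",
                percentChange(oldCycles, newCycles));
        System.out.printf("instructions: %+.1f%%%n",
                percentChange(oldInsns, newInsns));
    }
}
```

With the numbers above this works out to somewhat under 9% more
cycles and a bit over 38% fewer instructions.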

So, I think we're OK performance-wise.

John: I'm waiting for an answer to my question here before I submit
a webrev for approval.

http://mail.openjdk.java.net/pipermail/panama-dev/2015-March/000099.html

Andrew.