Superword - Aligning arrays
John Rose
John.Rose at Sun.COM
Wed Feb 4 21:40:45 PST 2009
On Feb 4, 2009, at 8:47 PM, James Walsh wrote:
> Let's say the heap is at 0x1000. I have a 5 float array that I want
> strongly aligned. I create a dummy 20 byte array leaving the heap
> at 0x1014. I create the 5 float array with a 12 byte header. The
> data in the array oop is aligned on the 16byte boundary for SIMD and
> the whole oop is a nice round 32 bytes which is MinObjAlignment
> aligned. Unfortunately the heap is now at 0x1034 which will assert
> on the next allocation.
Oh, I was forgetting the effect of aligning the array base; thanks for
the clear example.
This gets back to our exchange on Monday about option #3, which is
allowing array bases to be (relatively) unaligned, and just requiring
SIMD code (incl. optimized loops) to know about the oddity.
It's easier than fixing arrays so that their bases are more strongly
aligned. I think that would require lots of changes in lots of
places, more places than the relatively simple hack of inserting dummy
objects.
Let me try some examples with a new float or double array, where you
want the first possible element (0 or 1) to land at 0 mod 16.
Since the JVM alignment is always 8 but your vector alignment is 16,
your decision is always between doing nothing and inserting an 8-byte
dummy object (new Object() would do it nicely). On a 64-bit JVM,
there is no such thing as an 8-byte object, so you are stuck with
inserting a 24-byte dummy object (new Object[0] works great).
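The decision can be sketched as a small calculation. This is only an illustration, not HotSpot code: dummy_bytes, aligned_element_address and their parameters are hypothetical names, the heap top is assumed to be 8-byte aligned already, and the filler size is assumed to be 8 mod 16 (true of both the 8-byte and the 24-byte dummy mentioned above).

```cpp
#include <cstdint>

// Hypothetical helper, not HotSpot code. All sizes are in bytes.
// hwm:        current heap high-water mark (assumed 8-byte aligned)
// vec_offset: distance from the start of the new array object to the
//             element that should land on a 16-byte boundary
// min_dummy:  smallest filler object available; it must be 8 mod 16
//             (8 on a 32-bit JVM, 24 on a 64-bit JVM, per the text above)
constexpr std::uint64_t dummy_bytes(std::uint64_t hwm,
                                    std::uint64_t vec_offset,
                                    std::uint64_t min_dummy) {
  // Either the element is already aligned, or one filler flips the parity.
  return ((hwm + vec_offset) % 16 == 0) ? 0 : min_dummy;
}

// Address of the aligned element once any dummy has been placed.
constexpr std::uint64_t aligned_element_address(std::uint64_t hwm,
                                                std::uint64_t vec_offset,
                                                std::uint64_t min_dummy) {
  return hwm + dummy_bytes(hwm, vec_offset, min_dummy) + vec_offset;
}
```

For a 32-bit float array the element to align is a[1] at offset 16, so dummy_bytes(0x1008, 16, 8) asks for an 8-byte filler and dummy_bytes(0x1000, 16, 8) asks for none.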
Case #0F: heap HWM is 0 mod 16
0x1000: array mark, klass
0x1008: array length, array[0] (fixed odd leading element)
0x1010: array[1..2]
0x1018: array[3..4]
Case #1F: heap HWM is 8 mod 16 (need a dummy)
0x1008: dummy mark, klass
0x1010: array mark, klass
0x1018: array length, array[0] (fixed odd leading element)
0x1020: array[1..2]
0x1028: array[3..4]
In both cases, a[1..4] is a strongly aligned f4 vector, and a[0] is a
fixed odd leading element. (Option #3.)
Although it is annoying for loops to deal with the odd leading
element, the logic almost certainly folds into more general logic
which deals with vector operations that start at non-zero offsets from
the array base.
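As a rough sketch of that more general logic, a loop can be split into a scalar prologue, an aligned vector body, and a scalar epilogue. The names below (LoopSplit, first_aligned_index, split_loop) are made up for illustration, and the code assumes the address of element 0 is a multiple of the element size; it is not how the superword optimizer actually does it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: where the aligned 16-byte vector body of a loop
// over n elements begins and ends. Indices in [0, vec_begin) and
// [vec_end, n) are handled one element at a time.
struct LoopSplit {
  std::size_t vec_begin;  // first index on a 16-byte boundary (clamped to n)
  std::size_t vec_end;    // one past the last element of the last full vector
};

constexpr std::size_t min_size(std::size_t a, std::size_t b) {
  return a < b ? a : b;
}

// base_addr: address of element 0, assumed to be a multiple of elem_size.
// elem_size: 4 for float, 8 for double.
constexpr std::size_t first_aligned_index(std::uint64_t base_addr,
                                          std::size_t elem_size) {
  return static_cast<std::size_t>((16 - base_addr % 16) % 16) / elem_size;
}

// lanes is the number of elements per 16-byte vector.
constexpr LoopSplit make_split(std::size_t begin, std::size_t n,
                               std::size_t lanes) {
  return LoopSplit{begin, begin + (n - begin) / lanes * lanes};
}

constexpr LoopSplit split_loop(std::uint64_t base_addr, std::size_t elem_size,
                               std::size_t n) {
  return make_split(min_size(first_aligned_index(base_addr, elem_size), n),
                    n, 16 / elem_size);
}
```

With the float array of case #0F (element 0 at 0x100C, five elements), split_loop(0x100C, 4, 5) yields vec_begin 1 and vec_end 5: a[0] is peeled off and a[1..4] forms one aligned vector.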
In the case of a double (or long) array, there is no odd leading
element, because the header is padded out to 16 bytes:
Case #0D: heap HWM is 0 mod 16
0x1000: array mark, klass
0x1008: array length, padding
0x1010: array[0]
0x1018: array[1]
Case #1D: heap HWM is 8 mod 16 (need a dummy)
0x1008: dummy mark, klass
0x1010: array mark, klass
0x1018: array length, padding
0x1020: array[0]
0x1028: array[1]
In the case of a 64-bit JVM with compressed oops, the array header is
16 bytes and the base is always 64-bit aligned:
Case #0F/C: heap HWM is 0 mod 16
0x1000: array mark (64 bits)
0x1008: array klass, length
0x1010: array[0..1]
0x1018: array[2..3]
Case #1F/C: heap HWM is 8 mod 16 (need a dummy)
0x1008: dummy mark, klass
0x1010: array mark (64 bits)
0x1018: array klass, length
0x1020: array[0..1]
0x1028: array[2..3]
In the case of a 64-bit JVM with full-sized oops, the array header
is 24 bytes and the base is always 64-bit aligned.
Case #0F/X: heap HWM is 0 mod 16 (need a dummy)
0x1000: dummy mark
0x1008: dummy klass
0x1010: dummy body (e.g., array of length zero)
0x1018: array mark
0x1020: array klass
0x1028: array length, padding
0x1030: array[0..1]
0x1038: array[2..3]
Case #1F/X: heap HWM is 8 mod 16
0x1008: array mark
0x1010: array klass
0x1018: array length, padding
0x1020: array[0..1]
0x1028: array[2..3]
See arrayOopDesc::base_offset_in_bytes for all the grody details about
array layout.
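For quick reference, the base offsets implied by the layouts above can be reproduced with a toy model. This is a simplification that happens to match the listed cases, not a transcription of arrayOopDesc::base_offset_in_bytes; model_base_offset and its parameters are hypothetical, and the 4-byte length field is read off the layouts above.

```cpp
#include <cstddef>

// Simplified model of the layouts listed above; NOT the actual HotSpot code.
constexpr std::size_t round_up(std::size_t x, std::size_t align) {
  return (x + align - 1) / align * align;
}

constexpr std::size_t max_size(std::size_t a, std::size_t b) {
  return a > b ? a : b;
}

// mark_size, klass_size: header field sizes for the JVM flavor in question.
// word_size: 4 on a 32-bit JVM, 8 on a 64-bit JVM.
// elem_size: 4 for float, 8 for double.
// The 4-byte length field follows mark and klass; the element base is then
// rounded up to the larger of the word size and the element size.
constexpr std::size_t model_base_offset(std::size_t mark_size,
                                        std::size_t klass_size,
                                        std::size_t word_size,
                                        std::size_t elem_size) {
  return round_up(mark_size + klass_size + 4, max_size(word_size, elem_size));
}
```

Under this model a 32-bit float array has its base at offset 12 and a 32-bit double array at 16; with compressed oops the base is at 16, and with full-sized oops at 24.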
It is worthwhile scrounging up a one-word, 8-byte pseudo-object for
case #0F/X, if such a thing already exists. There might already be one
somewhere in the GC code.
-- John