RFR: 8139457: Relax alignment of array elements [v65]

Wed Feb 21 20:19:07 UTC 2024

On Wed, 21 Feb 2024 11:36:34 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details.
>> 
>> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements.
>> 
>> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
>> 
>> Testing:
>>  - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390)
>>  - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390)
>>  - [x] tier1 (x86_64, x86_32, aarch64, riscv)
>>  - [x] tier2 (x86_64, aarch64, riscv)
>>  - [x] tier3 (x86_64, riscv)
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update copyright years
>  - Remove duplicate code

These changes look correct. Tried to look around the code for indirect assumptions surrounding the `klass_offset` and the `length_offset` w.r.t. the `base_offset`. It seem to work out. (But some of this implicit assumptions might want to be cleaned up in the future.)

I only looked at changes since my last comments, seems like my nits from then have been fixed.

There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.

I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)

I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) `header_size_in_bytes`. (Similarly to how it is done for the `instanceOop` where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).

This seems preferable than adding these extra alignment shims in-between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)

Maybe something similar for copying and cloning. But there are already so much shims and patching code surrounding copying and cloning. E.g. in ZGC (depending on UseCompressClassPointers) we have to sub 1 from the length node in C2 to apply the proper barriers. This is required because when C2 takes out the base offset it does not know the element type, so it starts the clone from the word aligned (up) length offset (which may or may not start in the header). It would be a larger project to overhaul, but we have seen a couple of bugs related to this.

Some things that I think we should always try and strive for is when introducing new code (that talks about heap memory sizes/offsets):
 * Any named property/variable ending in `_size_in_bytes`/`_offset_in_bytes` is not required to be word aligned.
 * Any named property/variable ending in just `_size`/`_offset` must be in words. And their name should not lie. i.e. `aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize;` instead of `header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize;` unless `header_size * HeapWordSize == header_size_in_bytes()` is invariant.
Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes for `ByteSize` and `WordSize`. (We already have these in the code base but they use 32-bit int and only `ByteSize` sees any use). The idea is then that you can just use `_size`/`_offset` suffixes with the correct types.

<details>
<summary>Some thoughts on future naming</summary>
This is just one way that things could be named.

Note: `padding` and `klass_gap` may be 0 here.

arrayOop: After This PR
<pre>
  Layout:
|<---------------header-------------->|<---------------payload---------------->|
|<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->|
  Names:
|<-header_size_in_bytes-->|
|<-------base_offset_in_bytes-------->|
|<----------------------object_size/object_size_in_bytes---------------------->|
</pre>

The `***` just means that the alignment will end up somewhere here.
`base_offset_in_bytes + elements_size_in_bytes == aligned_base_offset + aligned_element_size` is invariant.
arrayOop: Potential Future
<pre>
  Layout:
|<--------header--------->|<---------------------payload---------------------->|
|<-mark/klass->|<-length->|<-padding->|<---------elements--------->|<-padding->|
  Names:
|<-header_size_in_bytes-->|
|<-------base_offset_in_bytes-------->|<--elements_size_in_bytes-->|
|<--aligned_header_size---****************>| // Word Aligned (UP) May include elements
|<--aligned_base_offset---****************>| // Word Aligned (UP) May include elements
                                      |<****-aligned_element_size->| // Word Aligned (Down)
|<----------------------object_size/object_size_in_bytes---------------------->|
|<-Header Initialization--***************>|
                                      |<****-Body Initialization-->|
</pre>

instanceOop: Current
<pre>
  Layout:
|<--------header-------->|<-header/body->|<---------------body---------------->|
|<------mark/klass------>|<--klass_gap-->|
                         |<------------------fields/padding------------------->|
  Names:
|<---header_size/header_size_in_bytes--->|
|<-base_offset_in_bytes->|
|<----------------------object_size/object_size_in_bytes---------------------->|
</pre>

Not sure what a good ambiguous name for the field and/or padding should be.
The `***` has a similar role.
instanceOop: Potential Future
<pre>
  Layout:
|<--------header-------->|<-----------------------body------------------------>|
|<------mark/klass------>|<------------------fields/padding------------------->|
                         |<--klass_gap-->|
   Names:
|<-header_size_in_bytes->|
|<-base_offset_in_bytes->|<------------------X_size_in_bytes------------------>|
|<----------------------object_size/object_size_in_bytes---------------------->|
|<--aligned_header_size---**************>| // Word Aligned (UP) May include elements
|<--aligned_base_offset---**************>| // Word Aligned (UP) May include elements
                         |<***************-----------aligned_X_size----------->|
|<-Header Initialization--**************>|
                         |<***************--------Body Initialization--------->|
</pre>
</details>

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/11044#pullrequestreview-1894323275