RFR: 8139457: Relax alignment of array elements [v65]

Thu Feb 22 13:19:04 UTC 2024

On Wed, 21 Feb 2024 20:15:55 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> These changes look correct. Tried to look around the code for indirect assumptions surrounding the `klass_offset` and the `length_offset` w.r.t. the `base_offset`. It seem to work out. (But some of this implicit assumptions might want to be cleaned up in the future.)
> 
> I only looked at changes since my last comments, seems like my nits from then have been fixed.
> 
> There are preexisting issues with regards to naming (and their meaning) which this change exacerbates.
> 
> I do not believe this should be a blocker (done in future RFEs). However the more code movement there is with regards to this, the more I feel this needs to be overhauled. (Lilliput shakes this up even more.)

Yes, there is still much to do. I just don't want to overload this PR with clutter.

> I know @albertnetymk already touched on this but some thoughts on the unclear boundaries between the header and the data. My feeling is that the most pragmatic solution would be to have the header initialization always initialize up to the word aligned (up) `header_size_in_bytes`. (Similarly to how it is done for the `instanceOop` where the klass gap gets initialized with the header, even if it may be data.) And have the body initialization do the rest (word aligned to word aligned clear).
> 
> This seems preferable than adding these extra alignment shims in-between the header and body/payload/data initialization. (I also tried moving the alignment fix into the body initialization, but it seems a little bit messier in the implementation.)

This seems indeed the most pragmatic way to deal with it. In the (very) long run, I hope we can settle on a single fixed object layout (e.g. 4-byte header, 4-byte length, then payload for arrays and 4-byte headers, then payload for objects), until then we need to live with the shims. Aligning initialization on word-boundaries makes sense, though.

> Maybe something similar for copying and cloning.

Ugh, yes. But not in this PR ;-)

> Some things that I think we should always try and strive for is when introducing new code (that talks about heap memory sizes/offsets):
> 
> * Any named property/variable ending in `_size_in_bytes`/`_offset_in_bytes` is not required to be word aligned.
> * Any named property/variable ending in just `_size`/`_offset` must be in words. And their name should not lie. i.e. `aligned_header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize;` instead of `header_size = align_up(header_size_in_bytes(),HeapWordSize) / HeapWordSize;` unless `header_size * HeapWordSize == header_size_in_bytes()` is invariant.
>   Again a pragmatic choice, as it seems to be the predominant choice in hotspot.
>   An absolutely wonderful change would be to add (and use/enforce) typed size_t enum classes for `ByteSize` and `WordSize`. (We already have these in the code base but they use 32-bit int and only `ByteSize` sees any use). The idea is then that you can just use `_size`/`_offset` suffixes with the correct types.

Agree with all of that.

> Some thoughts on future naming

Agree on that, too.

Also, another thing that should be cleaned-up/gotten right is arrayOopDesc::max_array_length() (use sane return type, not int32_t, fix maximum lengths to be less pessimistic, esp on 32bit platforms). But not here, and perhaps not until we're done with the various planned layout changes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1959435914