RFR: 8139457: Array bases are aligned at HeapWord granularity [v38]

Mon Jul 3 18:00:27 UTC 2023

On Mon, 3 Jul 2023 13:24:39 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details.
>> 
>> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements.
>> 
>> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
>> 
>> Testing:
>>  - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390)
>>  - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390)
>>  - [x] tier1 (x86_64, x86_32, aarch64, riscv)
>>  - [x] tier2 (x86_64, aarch64, riscv)
>>  - [x] tier3 (x86_64, riscv)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Build fixes

Another round of reviews:

src/hotspot/cpu/arm/c1_MacroAssembler_arm.hpp line 50:

> 48: 
> 49:   void allocate_object(Register obj, Register tmp1, Register tmp2, Register tmp3,
> 50:                        int header_size_in_bytes, int object_size,

I don't see the related change in `C1_MacroAssembler::allocate_object` definition. At very least, it affects the assert that compares with `object_size` (in words?).

src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp line 383:

> 381:   // Zero first 4 bytes, if start offset is not word aligned.
> 382:   if (!is_aligned(base_offset_in_bytes, BytesPerWord)) {
> 383:     assert(is_aligned(base_offset_in_bytes, BytesPerInt), "weird alignment");

"must be 4-byte aligned"?

src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 201:

> 199:   int start_offset_in_bytes = hdr_size_in_bytes;
> 200:   if (!is_aligned(start_offset_in_bytes, BytesPerWord)) {
> 201:     assert(is_aligned(start_offset_in_bytes, BytesPerInt), "must be 32-bit-aligned");

"must be 4-byte aligned"?

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 315:

> 313:   // fields will be inconsistent. This could cause a refinement
> 314:   // thread to calculate the object size incorrectly.
> 315:   Copy::fill_to_bytes(new_obj, oopDesc::base_offset_in_bytes(), 0);

Since this is a humongous allocation anyway, can we accept `fill_to_words` that would write 4 bytes past the object header, touching the object data? Keeps the code in the similar shape then.

src/hotspot/share/gc/parallel/mutableNUMASpace.cpp line 27:

> 25: #include "precompiled.hpp"
> 26: #include "gc/parallel/mutableNUMASpace.hpp"
> 27: #include "gc/shared/collectedHeap.inline.hpp"

Is this still needed? There no other changes in this compilation unit.

src/hotspot/share/gc/shared/collectedHeap.cpp line 412:

> 410:   int payload_offset = arrayOopDesc::base_offset_in_bytes(T_INT);
> 411:   if (!is_aligned(payload_offset, HeapWordSize)) {
> 412:     assert(is_aligned(payload_offset, BytesPerInt), "base offset must be 32-bit-aligned");

"must be 4-byte aligned"?

src/hotspot/share/gc/shared/collectedHeap.cpp line 444:

> 442: 
> 443:   const size_t payload_size_bytes = words * HeapWordSize - arrayOopDesc::base_offset_in_bytes(T_INT);
> 444:   assert(is_aligned(payload_size_bytes, BytesPerInt), "must be int aligned");

"must be 4-byte aligned"?

src/hotspot/share/gc/shared/jvmFlagConstraintsGC.cpp line 343:

> 341:     JVMFlag::printError(verbose,
> 342:                         "MinTLABSize (" SIZE_FORMAT " bytes) must be "
> 343:                         "less than or equal to ergonomic TLAB maximum (" SIZE_FORMAT " words)\n",

Wait, let's not mix "bytes" and "words" in the same message. Users do not readily know what is the HeapWordSize on the machine, or even how bytes and words might be related.

Is there a reason to even change these? Does the original code overflow?

src/hotspot/share/gc/shared/memAllocator.cpp line 378:

> 376:   assert(mem != nullptr, "cannot initialize null object");
> 377:   assert(_word_size * HeapWordSize >= hdr_size_bytes, "unexpected object size");
> 378:   Copy::fill_to_bytes((char*)mem + hdr_size_bytes, _word_size * HeapWordSize - hdr_size_bytes);

Performance-sensitive path, let's keep `Copy::fill_to_aligned_words` and compensate for misalignment with a single store, as we do everywhere else.

src/hotspot/share/gc/shared/memAllocator.cpp line 403:

> 401:   assert(_length >= 0, "length should be non-negative");
> 402:   if (_do_zero) {
> 403:     ArrayKlass* array_klass = ArrayKlass::cast(_klass);

Do you need this `array_klass`? If this is for asserts, then it should be done as assert.

src/hotspot/share/gc/shared/memAllocator.hpp line 81:

> 79:   HeapWord* mem_allocate(Allocation& allocation) const;
> 80: 
> 81:   MemRegion obj_memory_range(oop obj) const {

I don't see any usages of `obj_memory_range` anywhere in current sources. I think it was removed by [JDK-8171221](https://bugs.openjdk.org/browse/JDK-8171221). I think I'll just remove it in a separate PR: [JDK-8311249](https://bugs.openjdk.org/browse/JDK-8311249).

src/hotspot/share/gc/shared/threadLocalAllocBuffer.inline.hpp line 44:

> 42:     // successful thread-local allocation
> 43: #ifdef ASSERT
> 44:     Copy::fill_to_words(obj, size, badHeapWordVal);

So, this is safe after CMS removal, or for some other reason? I think this and other two removals near `Skip mangling the space corresponding to the object header` comment should be done in a separate PR to avoid cluttering this one.

src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 1998:

> 1996:               //     this area, and we still need to initialize it
> 1997:               if (DEBUG_ONLY(true ||) !ZeroTLAB) {
> 1998:                 size_t hdr_size = instanceOopDesc::base_offset_in_bytes();

For Zero case, I think we can just `Copy::fill_to_words` the entire object: the overhead of dealing with header math is likely comparable to the overhead of additional header writes and the benefits the word-sized clearing brings. This can probably be separated in another PR that deals with zapping better.

src/hotspot/share/oops/arrayOop.cpp line 32:

> 30: // overflow. We also need to make sure that this will not overflow a size_t on
> 31: // 32 bit platforms when we convert it to a byte size.
> 32: int32_t arrayOopDesc::max_array_length(BasicType type) {

Looks like used from some hot paths. If you don't want it in header, can it be in `.inline.hpp` instead?

src/hotspot/share/oops/arrayOop.hpp line 76:

> 74:   }
> 75: 
> 76:   // Header size computation.

Any reasons why this is moved? Visibility change? Would it be uglier to do in place?

src/hotspot/share/opto/memnode.cpp line 2098:

> 2096:       // test to (klass != Serializable && klass != Cloneable).
> 2097:       assert(Opcode() == Op_LoadI, "must load an int from _layout_helper");
> 2098:       jint min_size = Klass::instance_layout_helper(checked_cast<jint>(heap_word_size(oopDesc::base_offset_in_bytes())), false);

This is not the first time we do the whole `checked_cast<jint>(heap_word_size(oopDesc::base_offset_in_bytes())` dance. Maybe it should be a helper method straight in `oopDesc` then?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1289:

> 1287:     // zero out fields (but not the stack)
> 1288:     const size_t hs = oopDesc::base_offset_in_bytes();
> 1289:     Copy::fill_to_bytes((char*)mem + hs, vmClasses::StackChunk_klass()->size_helper() * HeapWordSize - hs);

Likely performance-sensitive path. Can we do the same "adjust if only aligned by 4 bytes, then do aligned words copy", as in other places?

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Array.java line 68:

> 66:         return !VM.getVM().isCompressedOopsEnabled();
> 67:       }
> 68:     }

I think `isCompressedOopsEnabled` already does the right thing, so:

Suggestion:

    if (type == BasicType.T_OBJECT || type == BasicType.T_ARRAY) {
      return !VM.getVM().isCompressedOopsEnabled();
    }

test/hotspot/gtest/oops/test_objArrayOop.cpp line 42:

> 40:     { 256,        false,  true,     32 }, // 20 byte header, 4 byte oops
> 41:     { 256,        true,   false,    32 }, // 16 byte header, 8 byte oops
> 42:     { 256,        true,   true,     32 }, // 16 byte header, 4 byte oops

Misses the comment ", 256-byte align"? Maybe we should at least add 16-byte alignment to cover the most frequent `ObjectAlignmentInBytes` override.

test/hotspot/jtreg/gtest/ArrayTest.java line 2:

> 1: /*
> 2:  * Copyright (c) 2022 Amazon.com Inc. or its affiliates. All rights reserved.

Here and in other files, copyright header is not in our usual form. :)

test/hotspot/jtreg/gtest/ArrayTest.java line 25:

> 23:  */
> 24: 
> 25: /*

The file should probably be called `ArrayTests.java` to match another new `ObjArrayTests.java`?

test/hotspot/jtreg/gtest/ArrayTest.java line 34:

> 32:  * @modules java.base/jdk.internal.misc
> 33:  *          java.xml
> 34:  * @run main/native GTestWrapper --gtest_filter=arrayOop::base_offset -XX:+UseCompressedClassPointers -XX:+UseCompressedOops

In all test configs here: why `arrayOop::base_offset`, and not  just `arrayOop`? This would miss running new test cases. Note that `ObjArrayTests.java` does class-only filter already.

-------------

PR Review: https://git.openjdk.org/jdk/pull/11044#pullrequestreview-1511336722
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1250985281
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1250994093
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1250998877
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251003812
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251002799
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251008534
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251010657
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251142484
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251143425
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251144713
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251147524
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251130202
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251132354
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251158897
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251156823
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251137566
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251139426
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251027722
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251018122
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251014984
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251014320
PR Review Comment: https://git.openjdk.org/jdk/pull/11044#discussion_r1251013282