RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v42]
Volodymyr Paprotski
duke at openjdk.org
Tue Oct 15 22:44:35 UTC 2024
On Tue, 15 Oct 2024 10:47:55 GMT, Roman Kennke <rkennke at openjdk.org> wrote:
>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix aarch64.ad
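As a side note on the quoted description, the 8TB limit for full GC forwarding can be checked with a little arithmetic. The sketch below assumes that the 40 bits hold an offset counted in 8-byte heap words (this is my reading of "similar to compressed-oops", not something the description states), and the names `kForwardingBits`, `kHeapWordBytes` and `max_encodable_heap_bytes` are made up for illustration.

```cpp
#include <cstdint>

// Number of bits available for the forwarding offset, per the PR description.
constexpr std::uint64_t kForwardingBits = 40;

// Assumption: the offset is counted in 8-byte heap words, not in bytes.
constexpr std::uint64_t kHeapWordBytes = 8;

// Largest heap size (in bytes) whose word offsets fit into kForwardingBits.
constexpr std::uint64_t max_encodable_heap_bytes() {
  return (std::uint64_t{1} << kForwardingBits) * kHeapWordBytes;
}

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TiB.
static_assert(max_encodable_heap_bytes() == (std::uint64_t{1} << 43),
              "40 bits of word offsets should cover 8 TiB");
```

Under that assumption, 40 bits of byte offsets would only cover 1TB, so the word granularity is what yields the 8TB figure.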
Finished reviewing `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp` line by line, comparing the old snippets that were merged into the new function. Looks good to me; every (new) case is handled.
I only have some minor comments about comments.
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 414:
> 412: // to the valid haystack bytes on the stack.
> 413: {
> 414: const Register haystack = rbx;
Keep `rax` as `index` for clarity? Although it is really only used as a temp.
const Register index = rax;
const Register haystack = rbx;
copy_to_stack(haystack, haystack_len, false, index, XMM_TMP1, _masm);
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1568:
> 1566: assert((COPIED_HAYSTACK_STACK_SIZE == 64), "Must be 64!");
> 1567:
> 1568: // Copy incoming haystack onto stack
The old comment was slightly more precise; move it here, i.e.
`// Copy incoming haystack onto stack (haystack <= 32 bytes)`
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1634:
> 1632:
> 1633:
> 1634: // Copy the small (< 32 byte) haystack to the stack. Allows for vector reads without page fault
Just to be pedantic, it's `(<= 32)`: this function also handles the 32-byte case.
- line 401:
__ cmpq(haystack_len, 0x20);
__ ja(L_bigSwitchTop);
- though line 293 (`highly_optimized_short_cases`) only seems to route 16-byte cases here:
`__ cmpq(haystack_len_p, isU ? 8 : 16);`
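To spell out the boundary behind the `(<= 32)` remark, here is a minimal model of the `cmpq`/`ja` pair at line 401. It is only a sketch: `Path` and `dispatch` are hypothetical names, and it assumes `haystack_len` is a byte count, as the wording of the comment suggests.

```cpp
#include <cstdint>

// Hypothetical model of:
//   __ cmpq(haystack_len, 0x20);
//   __ ja(L_bigSwitchTop);
// `ja` is the unsigned "jump if above", so the branch is taken only when
// haystack_len > 0x20; a length of exactly 32 falls through.
enum class Path { SmallHaystackOnStack, BigSwitchTop };

constexpr Path dispatch(std::uint64_t haystack_len) {
  return haystack_len > 0x20 ? Path::BigSwitchTop
                             : Path::SmallHaystackOnStack;
}

// The fall-through (copy-to-stack) path therefore covers lengths <= 32.
static_assert(dispatch(31) == Path::SmallHaystackOnStack, "31 stays on the small path");
static_assert(dispatch(32) == Path::SmallHaystackOnStack, "32 stays on the small path");
static_assert(dispatch(33) == Path::BigSwitchTop, "33 goes to the big switch");
```

If the branch were `jae` instead of `ja`, the 32-byte case would go to `L_bigSwitchTop` and the `(< 32 byte)` wording in the comment would be exact.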
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1659:
> 1657: Label L_moreThan8, L_moreThan16, L_moreThan24, L_adjustHaystack;
> 1658:
> 1659: assert(arrayOopDesc::base_offset_in_bytes(isU ? T_CHAR : T_BYTE) >= 8,
If we had to also optimize for header-size 16, it might be possible to remove one jump here. Looks correct for either size.
-------------
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2370735887
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802041876
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802044880
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802088545
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802073195
More information about the hotspot-gc-dev mailing list