RFR: 8334060: Implementation of Late Barrier Expansion for G1

Tue Jun 18 18:40:13 UTC 2024

On Tue, 18 Jun 2024 10:44:17 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 328:
> 
>> 326:   __ jcc(is_single_region, done);
>> 327: 
>> 328:   Assembler::Condition is_new_val_null = generate_new_val_null_test(masm, new_val);
> 
> I actually kind of like the previous style that (almost) all "asm" is at the same level -- the newly introduced helper functions hinder the flow, IMO.

Note that if we want to optimize the barrier code layout (see the [JEP description](https://openjdk.org/jeps/475), *Candidate optimizations* sub-section), splitting the assembly of each barrier in at least two blocks is necessary, since we need to separate the inline from the out-of-line (barrier stub) code. And since the assembly code has to be split into multiple functions anyway, I think it makes sense to group the code by logical blocks (different barrier tests, queue insertion, etc.), as proposed in this changeset. This also improves code reuse, e.g. the same `generate_queue_insertion` implementation is used for the pre- and post-barriers.
If you still think there is value in grouping together the blocks that can be grouped together (e.g. `generate_single_region_test` + `generate_new_val_null_test` + `generate_card_young_test`), I can prototype the refactoring and let the G1 maintainers decide which alternative is more readable/maintainable.

> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 479:
> 
>> 477: 
>> 478:   if (g1_can_remove_pre_barrier(kit, &kit->gvn(), adr, access.type(), adr_idx)) {
>> 479:     barriers ^= G1C2BarrierPre;
> 
> Is it possible to rewrite this method to remove `^=`? This method first construct both pre/post barriers and remove if needed. I wonder if the logic will be cleaner if we track if pre/post barrier can be removed using two `bool` and construct the actual barrier in the end using two `bool` -- this way, we can avoid mutating the `barriers` var.

Thanks for the suggestion, I will try it out.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644895421
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644896332