RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17]

Kim Barrett kbarrett at openjdk.org
Sat Sep 7 04:15:14 UTC 2024


On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>> 
>> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
>> 
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> 
>> ## Summary of the Changes
>> 
>> ### Platform-Independent Changes (`src/hotspot/share`)
>> 
>> These consist mainly of:
>> 
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>> 
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> 
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>> 
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> 
>> #### ADL Changes
>> 
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> 
>> #### `G1BarrierSetAssembler` Changes
>> 
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
> 
>   s390 port : late barrier expansion

I've reviewed the non-compiler GC changes.  I've looked over the compiler changes,
but can't claim to have reviewed them.  I've also reviewed the x64 changes, and
looked over the aarch64 changes.

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176:

> 174:   __ jcc(Assembler::zero, runtime);                           // jump to runtime if index == 0 (full buffer)
> 175:   // The buffer is not full, store value into it.
> 176:   __ subptr(temp, wordSize);                                  // temp := next index

Instead of 

  __ testptr(temp, temp);
  __ jcc(Assembler::zero, runtime);
  __ subptr(temp, wordSize);

it seems like this might be better

  __ subptr(temp, wordSize);
  __ jcc(Assembler::below, runtime);

I think the code in the PR matches what the early expansion generates, so I think a change here
can be deferred to a followup.

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354:

> 352:   __ bind(runtime);
> 353:   // save the live input values
> 354:   RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread));

I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here.
Also not sure why we're saving `thread` here for 32bit platforms.
Something to think about for the future.  Though maybe the 32bit case will be gone by then :)

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112:

> 110:           // The answer is that stores of different sizes can co-exist
> 111:           // in the same sequence of RawMem effects.  We sometimes initialize
> 112:           // a whole 'tile' of array elements with a single jint or jlong.)

I'm having trouble making sense of this comment.  I guess a jlong could be used to null-initialize two
32bit oops/narrowOops?  But that doesn't have anything to do with jints.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2287188386
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747741376
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747824868
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747898995


More information about the hotspot-compiler-dev mailing list