RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]

Fri Aug 16 21:05:51 UTC 2024

On Fri, 16 Aug 2024 13:03:51 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301:
>> 
>>> 299:                          RegSet::of($mem$$Register, $newval$$Register) /* preserve */);
>>> 300:     __ movq($tmp1$$Register, $newval$$Register);
>>> 301:     __ xchgq($newval$$Register, Address($mem$$Register, 0));
>> 
>> Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This avoids loading the old value twice.
>
> Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple, based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs that can be difficult to catch with regular testing. Because of this, I am hesitant to introduce this kind of optimization for atomic operation barriers. But I am happy to reconsider if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost.

Note that C2 already had such an optimization: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114
That said, it's probably not a big deal. Maybe I can try it on PPC64, which may be more sensitive to accesses on contended memory.
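
To make the two orderings concrete, here is a minimal single-threaded model in plain C++. It is only a sketch and not HotSpot code: `Obj`, `Model`, `satb_queue`, `barrier_loads`, `xchg_barrier_first` and `xchg_barrier_after` are hypothetical names, and the model ignores compressed oops, register preservation, and the no-safepoint condition that makes the reordering legal inside the MachNode.

```cpp
#include <atomic>
#include <vector>

// Hypothetical stand-in for a Java object; not a HotSpot type.
struct Obj { int id; };

struct Model {
  std::atomic<Obj*> field{nullptr};  // the reference field being exchanged
  std::vector<Obj*> satb_queue;      // models the thread-local SATB buffer
  bool marking_active = true;        // models "concurrent marking in progress"
  int barrier_loads = 0;             // extra loads of the field done only for the barrier

  // Record the previous value if marking is active and the value is non-null.
  void enqueue_if_marking(Obj* old_val) {
    if (marking_active && old_val != nullptr) {
      satb_queue.push_back(old_val);
    }
  }

  // Barrier-first ordering: the barrier loads the previous value itself,
  // then the exchange reads the same location a second time.
  Obj* xchg_barrier_first(Obj* new_val) {
    Obj* pre_val = field.load(std::memory_order_relaxed);
    ++barrier_loads;
    enqueue_if_marking(pre_val);
    return field.exchange(new_val);
  }

  // Exchange-first ordering: the value returned by the exchange is reused
  // as the barrier's previous value, so no separate load is needed.
  Obj* xchg_barrier_after(Obj* new_val) {
    Obj* old_val = field.exchange(new_val);
    enqueue_if_marking(old_val);
    return old_val;
  }
};
```

In this model both variants return the same old value and enqueue the same entry; the only difference is the extra load counted in `barrier_loads`. Whether saving that load is measurable next to the cost of the atomic exchange itself is exactly the open question above.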

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720347473