RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]

Roberto Castañeda Lozano rcastanedalo at openjdk.org
Fri Aug 16 13:06:55 UTC 2024


On Sun, 21 Jul 2024 08:27:52 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags
>
> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123:
> 
>> 121:     if ((barrier_data() & G1C2BarrierPost) != 0) {
>> 122:       __ movl($tmp2$$Register, $src$$Register);
>> 123:       if ((barrier_data() & G1C2BarrierPostNotNull) == 0) {
> 
> `decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. Also note that the latter supports specifying different src and dst registers which saves the extra move operation.

Thanks for the suggestion, Martin! I have prototyped the optimization [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `RemoveRedundantNullChecks`), and in my opinion its expected benefit does not justify the additional complexity, especially since the scope is limited (in my earlier experiments, most of the stores are implemented with `g1EncodePAndStoreN` rather than `g1StoreN`, plus the optimization only applies to a specific compressed OOPs mode). I have run a few general-purpose benchmarks using a non-zero base compressed oops mode and the optimization did not yield any statistically significant improvement, but please let me know if you have any specific benchmark/configuration in mind and I can re-check.

> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182:
> 
>> 180:                          $tmp2$$Register /* pre_val */,
>> 181:                          $tmp3$$Register /* tmp */,
>> 182:                          RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */);
> 
> The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required.

Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right?

> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301:
> 
>> 299:                          RegSet::of($mem$$Register, $newval$$Register) /* preserve */);
>> 300:     __ movq($tmp1$$Register, $newval$$Register);
>> 301:     __ xchgq($newval$$Register, Address($mem$$Register, 0));
> 
> Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice.

Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce this kind of optimizations for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719811953
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719812882
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719814312


More information about the hotspot-compiler-dev mailing list