RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]

Martin Doerr mdoerr at openjdk.org
Fri Aug 16 20:58:53 UTC 2024


On Fri, 16 Aug 2024 13:01:28 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123:
>> 
>>> 121:     if ((barrier_data() & G1C2BarrierPost) != 0) {
>>> 122:       __ movl($tmp2$$Register, $src$$Register);
>>> 123:       if ((barrier_data() & G1C2BarrierPostNotNull) == 0) {
>> 
>> `decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. Also note that the latter supports specifying different src and dst registers which saves the extra move operation.
>
> Thanks for the suggestion, Martin! I have prototyped the optimization [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `RemoveRedundantNullChecks`), and in my opinion its expected benefit does not justify the additional complexity, especially since the scope is limited (in my earlier experiments, most of the stores are implemented with `g1EncodePAndStoreN` rather than `g1StoreN`, plus the optimization only applies to a specific compressed OOPs mode). I have run a few general-purpose benchmarks using a non-zero base compressed oops mode and the optimization did not yield any statistically significant improvement, but please let me know if you have any specific benchmark/configuration in mind and I can re-check.

Thanks for trying! I think I should try it on PPC64. The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. It could be that x86 is less sensitive to such optimizations.

>> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182:
>> 
>>> 180:                          $tmp2$$Register /* pre_val */,
>>> 181:                          $tmp3$$Register /* tmp */,
>>> 182:                          RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */);
>> 
>> The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required.
>
> Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right?

Exactly.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720338672
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720340149


More information about the hotspot-compiler-dev mailing list