RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]

Mon Aug 26 07:26:08 UTC 2024

On Fri, 23 Aug 2024 13:28:03 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> OK, thanks. I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be.
>
> I have an experimental implementation for PPC64. I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`:
> https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476
> This has two advantages:
> - It reduces replicated code in the .ad file.
> - It makes the discussed optimization easy.
>
> Please take a look.

Great that you already have an experimental port! Thanks for the heads-up. I agree that with this approach the OOP decoding + null check fusion becomes less intrusive, but I still prefer the current decoupled implementation for x64 and aarch64 (even simpler, IMO). In the benchmarks I have run (admittedly, only on x64), I could not observe any positive effect from the fusion, whereas I found a slight regression in one case using zero-based OOP compression. I have not investigated further, but I wonder if hoisting the null check above the region-crossing test could have a negative impact on branch predictability.
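
To make the two orderings concrete, here is a minimal standalone C++ sketch of the filtering logic only. It is not the HotSpot code: the function names, the 4 MiB region size, and the 3-bit shift are assumptions chosen for illustration, and it models zero-based compressed OOPs, where a narrow value of 0 decodes to null. The card-marking slow path is left out entirely.

```cpp
#include <cstdint>

// Hypothetical constants, for illustration only.
constexpr unsigned kLogRegionBytes = 22;  // assumed 4 MiB heap regions
constexpr unsigned kOopShift = 3;         // assumed zero-based compression, 8-byte alignment

// Zero-based decoding: a plain shift, so narrow 0 maps to address 0 (null).
inline uintptr_t decode(uint32_t narrow) {
  return static_cast<uintptr_t>(narrow) << kOopShift;
}

// True if the stored-to address and the new value lie in different regions.
inline bool crosses_region(uintptr_t store_addr, uintptr_t new_val) {
  return ((store_addr ^ new_val) >> kLogRegionBytes) != 0;
}

// Decoupled variant: decode first, then region-crossing test, then null check.
bool needs_card_mark_decoupled(uintptr_t store_addr, uint32_t narrow_val) {
  uintptr_t new_val = decode(narrow_val);
  if (!crosses_region(store_addr, new_val)) return false;
  if (new_val == 0) return false;
  return true;
}

// Fused variant: the null check is done on the narrow value, as part of
// decoding, which hoists it above the region-crossing test.
bool needs_card_mark_fused(uintptr_t store_addr, uint32_t narrow_val) {
  if (narrow_val == 0) return false;
  uintptr_t new_val = decode(narrow_val);
  return crosses_region(store_addr, new_val);
}
```

Both variants return the same result for every input; they differ only in which branch is evaluated first, which is where a branch-predictability difference could come from.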

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730806021