RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]

Mon Aug 26 07:46:06 UTC 2024

On Mon, 26 Aug 2024 07:23:40 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> I have an experimental implementation for PPC64. I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`:
>> https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476
>> This has 2 advantages:
>> - Reduce replicated code in the .ad file.
>> - Make the discussed optimization easy. Please take a look.
>
> Great that you already have an experimental port! Thanks for the heads-up, I agree that the OOP decoding + null check fusion becomes less intrusive, but I still prefer the current decoupled implementation for x64 and aarch64 (even simpler, IMO). In the benchmarks I have run (admittedly, only on x64), I could not observe any positive effect, whereas I found a slight regression in one case using zero-based OOP compression. I have not investigated further, but I wonder if hoisting the null check above the region-crossing test could have a negative impact on branch predictability.

It can be implemented like this:

- If oop decoding requires a null check, redirect that null-check branch to jump over the barrier code.
- Otherwise, insert the null check after the region-crossing check.
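
The two cases above can be sketched as a small standalone C++ model of the control flow. This is not HotSpot code: `HeapModel`, `BarrierOutcome`, and `post_barrier_sketch` are hypothetical names, a non-zero `base` stands in for "decoding requires a null check", and the card filtering and enqueueing steps are elided.

```cpp
#include <cstdint>

// Hypothetical model of the relevant heap parameters (not HotSpot types).
struct HeapModel {
    std::uint64_t base;     // 0 models zero-based compressed oops
    unsigned oop_shift;     // compressed oop shift
    unsigned region_shift;  // log2 of the region size
};

enum class BarrierOutcome { SkippedNull, SkippedSameRegion, SlowPath };

// Sketch of the post-barrier filtering order for a narrow oop store.
inline BarrierOutcome post_barrier_sketch(const HeapModel& h,
                                          std::uint64_t store_addr,
                                          std::uint32_t narrow_new_val) {
    if (h.base != 0) {
        // Decoding needs a null check anyway: let that branch
        // jump over the whole barrier.
        if (narrow_new_val == 0) return BarrierOutcome::SkippedNull;
        std::uint64_t new_val =
            h.base + (static_cast<std::uint64_t>(narrow_new_val) << h.oop_shift);
        if (((store_addr ^ new_val) >> h.region_shift) == 0)
            return BarrierOutcome::SkippedSameRegion;
    } else {
        // Decoding is a plain shift: keep the region-crossing check first
        // and do the null check after it.
        std::uint64_t new_val =
            static_cast<std::uint64_t>(narrow_new_val) << h.oop_shift;
        if (((store_addr ^ new_val) >> h.region_shift) == 0)
            return BarrierOutcome::SkippedSameRegion;
        if (new_val == 0) return BarrierOutcome::SkippedNull;
    }
    // Card filtering, card dirtying, and enqueueing are elided here.
    return BarrierOutcome::SlowPath;
}
```

In this model the zero-based path keeps the same check order as the decoupled implementation, and the other path only reuses a branch that the decode already needs.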

Implemented this way, I don't see how it could have a negative effect. That said, the decision for x86 and aarch64 is yours. As you already mentioned, such optimizations can be added later if needed.

Did you also see my comment in the "resolved" discussion? https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730832653