RFR: 8372285: G1: Micro-optimize x86 barrier code
Thomas Schatzl
tschatzl at openjdk.org
Fri Nov 21 09:51:26 UTC 2025
On Fri, 21 Nov 2025 09:06:54 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> We know from [JDK-8372284](https://bugs.openjdk.org/browse/JDK-8372284) that G1 C2 stubs can take ~10% of total instructions. So minor optimizations in hand-written assembly pay off for code density. This PR does a little x86-specific polishing: `testptr` where possible, short forward branches where possible. I rewired some code to make it abundantly clear the branches in question are short. It also makes clear that lots of the affected methods are essentially fall-through.
>
> The patch is deliberately on the simpler side, so we can backport it to 25u if the need arises.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `tier1`
> - [ ] Linux x86_64 server fastdebug, `all`
Changes requested by tschatzl (Reviewer).
src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 131:
> 129: __ cmpptr(addr, count);
> 130: __ jcc(Assembler::belowEqual, loop);
> 131: __ jmpb(done);
Not related to this line, but `jcc` also has a `jccb` variant that could be used here (line 121); you already use it in other code. Since these short jumps take a signed displacement, they also work for backward jumps (e.g. the `__ jmp(next_card)` below), but maybe I'm overlooking something.
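To make the signed-displacement point concrete, here is a minimal standalone sketch (not HotSpot code) of the range check behind a short jump. It assumes the standard x86-64 encodings: `jmp rel8` and `jcc rel8` are 2 bytes, `jmp rel32` is 5 bytes, `jcc rel32` is 6 bytes, and the displacement is measured from the end of the jump instruction. The names `fits_short_jump` and `bytes_saved` are hypothetical helpers invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Encoded sizes in bytes of the x86-64 jump forms (assumed standard encodings).
constexpr int kJmpShortSize = 2;  // EB rel8
constexpr int kJmpLongSize  = 5;  // E9 rel32
constexpr int kJccShortSize = 2;  // 7x rel8
constexpr int kJccLongSize  = 6;  // 0F 8x rel32

// Hypothetical helper: returns true if a short jump whose first byte sits at
// code offset `from` can reach `target`. The rel8 displacement is relative to
// the end of the 2-byte instruction and is signed, so both forward and
// backward targets are reachable within [-128, 127].
constexpr bool fits_short_jump(int64_t from, int64_t target) {
  const int64_t disp = target - (from + kJmpShortSize);
  return disp >= INT8_MIN && disp <= INT8_MAX;
}

// Hypothetical helper: bytes saved by choosing the short form over the long one.
inline int bytes_saved(bool conditional) {
  const int saved = conditional ? kJccLongSize - kJccShortSize
                                : kJmpLongSize - kJmpShortSize;
  assert(saved > 0);  // the short form is always the smaller encoding
  return saved;
}
```

Under these assumptions, a backward branch to a loop head qualifies for the short form whenever the loop body plus the jump itself stays within 128 bytes, which saves 4 bytes per conditional jump and 3 per unconditional one.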
src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 209:
> 207: // Jump out if done, or fall-through to runtime
> 208: __ bind(L_null);
> 209: __ jmp(L_done);
Maybe add a comment saying that we use the long form because the distance to `L_done` is not known here, or something like that (I assume that's the reason for not using `jmpb`).
-------------
PR Review: https://git.openjdk.org/jdk/pull/28446#pullrequestreview-3491945514
PR Review Comment: https://git.openjdk.org/jdk/pull/28446#discussion_r2549146822
PR Review Comment: https://git.openjdk.org/jdk/pull/28446#discussion_r2549149717