RFR: 8372285: G1: Micro-optimize x86 barrier code

Fri Nov 21 09:56:57 UTC 2025

On Fri, 21 Nov 2025 09:45:54 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> We know from [JDK-8372284](https://bugs.openjdk.org/browse/JDK-8372284) that G1 C2 stubs can take ~10% of total instructions. So minor optimizations in hand-written assembly pay off for code density. This PR does a little x86-specific polishing: `testptr` where possible, short forward branches where possible. I rewired some code to make it abundantly clear the branches in question are short. It also makes clear that lots of the affected methods are essentially fall-through.
>> 
>> The patch is deliberately on the simpler side, so we can backport it to 25u if the need arises.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `tier1`
>>  - [ ]  Linux x86_64 server fastdebug, `all`
>
> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 131:
> 
>> 129:   __ cmpptr(addr, count);
>> 130:   __ jcc(Assembler::belowEqual, loop);
>> 131:   __ jmpb(done);
> 
> Not related to this line, but for `jcc` there is also a `jccb` variant that could be used (line 121); you actually used it in other code. Since these short jumps have a signed displacement, they can also be used for backward jumps (e.g. in the `__ jmp(next_card)` below), but maybe I'm overlooking something.

Backward jumps are actually shortened automatically, because the assembler already knows their offset and can tell when it is small. Only forward branches need this special (forward-looking) treatment: the assembler has no advance knowledge that the jump can be short, so we have to tell it. This is our SOP: rely on automatic shortening for backward branches, and shorten forward branches by hand where it is obvious they are short.
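
To illustrate the difference, here is a rough toy model of that sizing decision. It is only a sketch, not HotSpot's `Assembler`: `ToyAssembler`, its `Label`, and `nop(n)` are hypothetical names that track code size only. The instruction lengths are the usual x86 ones for an unconditional jump (2 bytes with an 8-bit displacement, 5 bytes with a 32-bit displacement).

```cpp
// Hypothetical toy model, not HotSpot code: it tracks offsets, not bytes.
struct Label {
  int pos = -1;                              // -1 means "not bound yet"
  bool is_bound() const { return pos >= 0; }
};

class ToyAssembler {
  int _offset = 0;                           // current code offset in bytes
  static const int short_len = 2;            // jmp rel8:  opcode + 1 byte
  static const int long_len  = 5;            // jmp rel32: opcode + 4 bytes
public:
  int offset() const { return _offset; }
  void bind(Label& l) { l.pos = _offset; }   // label now has a known position
  void nop(int n) { _offset += n; }          // stand-in for n bytes of code

  // Plain jump: returns the number of bytes it occupies.
  int jmp(const Label& target) {
    if (target.is_bound()) {
      // Backward jump: the displacement is known right now. It is measured
      // from the end of the (short) instruction to the target.
      int disp = target.pos - (_offset + short_len);
      if (disp >= -128 && disp <= 127) {
        _offset += short_len;                // fits in 8 bits: shorten
        return short_len;
      }
    }
    // Forward jump (target unknown) or far backward jump: reserve the long
    // form, because the size cannot change once later code is emitted.
    _offset += long_len;
    return long_len;
  }

  // Explicitly short jump: the caller promises the target is within range.
  // This toy does not verify that promise for forward targets.
  int jmpb(const Label& target) {
    (void)target;
    _offset += short_len;
    return short_len;
  }
};
```

In this model, a `jmp` to an already bound, nearby label takes 2 bytes, a `jmp` to a not-yet-bound label takes 5 bytes, and `jmpb` to the same not-yet-bound label takes 2 bytes. That 3-byte saving per forward branch is what the patch is after.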

Yes, I think we can shorten `__ jcc(Assembler::equal, is_clean_card);` too, let me try that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28446#discussion_r2549169909