RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v10]

Daniel Lundén dlunden at openjdk.org
Wed Nov 12 15:59:56 UTC 2025


On Wed, 12 Nov 2025 04:00:46 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>> 
>> With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand. In general, this saves additional spills for two-address instructions where the destination is also the first source operand and where the source live range extends beyond the current instruction.
>> 
>> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but because the register allocator currently selects the first available colour, the demotions are rare. This patch biases the allocation of an NDD definition to the first source operand, or to the second source operand for the commutative class of operations.
>> 
>> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
>> 
>> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> 
>> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint.
>>  
>> **Micro:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
>> 
>> 
>> **Baseline :-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
>> 
>> **With opt:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
>> 
>> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor cleanup
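
The biasing policy from the quoted description can be illustrated with a minimal sketch. This is not the HotSpot implementation: `select_colour`, `NO_COLOUR`, and the 32-bit availability mask are hypothetical names and simplifications chosen only for illustration. The sketch shows a bias acting as a hint: it is honoured when the preferred colour is free, and otherwise the selection falls back to the first available colour.

```cpp
#include <cstdint>

// Hypothetical sentinel: no colour could be chosen (the caller would spill).
constexpr int NO_COLOUR = -1;

// Hypothetical helper, not HotSpot code.
// 'available' has bit r set when register r is not taken by any
// already-coloured neighbouring live range.
// 'bias' is the colour of the preferred source operand, or NO_COLOUR.
int select_colour(std::uint32_t available, int bias = NO_COLOUR) {
  // Honour the hint only if the biased colour is actually free.
  if (bias >= 0 && bias < 32 && ((available >> bias) & 1u)) {
    return bias;
  }
  // Fallback: first available colour, as in the unbiased policy.
  for (int reg = 0; reg < 32; ++reg) {
    if ((available >> reg) & 1u) {
      return reg;
    }
  }
  return NO_COLOUR;
}
```

With registers 1 and 3 free (`available == 0xA`), an unbiased call picks register 1, while a bias towards register 3 picks register 3. A bias towards a register that is not free changes nothing, which is what distinguishes a hint from coalescing.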

> Hi @dlunde , improvements are gauged by inspecting the JIT code size. Every NDD instruction expects a 4-byte extended EVEX prefix. By demoting it to a REX/REX2 prefix, we save 2-3 bytes per instruction. For example, consider the following micro kernel: with this patch, almost every NDD instruction gets the benefit of register biasing, and thus the assembler layer demotes these to REX/REX2-prefixed instructions.
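
The 2-3 byte figure follows from the prefix sizes mentioned in this thread (EVEX 4 bytes, REX2 2 bytes, REX 1 byte). Below is a rough sketch of that arithmetic under a deliberately simplified model: register-only operands, demotion possible exactly when the destination equals the first source (or the second source for commutative operations), and REX2 needed whenever an APX extended register (16..31) is involved. The names `Prefix`, `ndd_prefix`, and `bytes_saved` are hypothetical and do not correspond to the assembler code.

```cpp
// Prefix kinds discussed in the thread.
enum class Prefix { EVEX, REX2, REX };

// Prefix sizes in bytes, as stated in the PR description.
inline int prefix_bytes(Prefix p) {
  switch (p) {
    case Prefix::EVEX: return 4;
    case Prefix::REX2: return 2;
    case Prefix::REX:  return 1;
  }
  return 0;
}

// Hypothetical, simplified demotion rule for dst = src1 op src2.
// Register numbers are 0..31; 16..31 are the APX extended GPRs.
inline Prefix ndd_prefix(int dst, int src1, int src2, bool commutative) {
  bool demotable = (dst == src1) || (commutative && dst == src2);
  if (!demotable) {
    return Prefix::EVEX;  // destination differs: NDD form is required
  }
  bool uses_extended = dst >= 16 || src1 >= 16 || src2 >= 16;
  return uses_extended ? Prefix::REX2 : Prefix::REX;
}

// Bytes saved relative to the EVEX-encoded NDD form.
inline int bytes_saved(int dst, int src1, int src2, bool commutative) {
  return prefix_bytes(Prefix::EVEX) -
         prefix_bytes(ndd_prefix(dst, src1, src2, commutative));
}
```

In this model, a biased allocation such as `dst == src1` on legacy registers saves 3 bytes, the same pattern on an extended register saves 2 bytes, and an unbiased allocation with three distinct registers saves nothing.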

I am convinced your patch provides improvements in many cases. What I'm worried about is regressions. Do I understand you correctly: the patch provides, in theory, strict code size improvements without any other disadvantages? My performance testing (DaCapo 23, SPECjbb 2005, and SPECjvm 2008) indicates no regressions, so that's good.

> I have shared the details on validation configuration above

Ah, sorry, I missed that. Looks reasonable.

Great that you moved much of the logic to the AD files. Looks much cleaner.

Finally, it looks like you only partially applied my suggested changes: https://github.com/dlunde/jdk/commit/d2b511804c757c89c5662028ea9e4a9dff43b641. Please consider also applying the rest (or let me know if you disagree with them). I'll rerun testing for sanity when you have applied the final changes!

src/hotspot/cpu/x86/x86.ad line 2646:

> 2644: 
> 2645:   // Returns true for MachNode corresponding to Intel APX NDD selection patterns which
> 2646:   // can be demoted to REX/REX2 encodings, for commutative operations with register

Suggestion:

  // can be demoted to REX/REX2 encodings. For commutative operations with register

-------------

Changes requested by dlunden (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3454188127
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2518869268


More information about the hotspot-compiler-dev mailing list