RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v6]
Jatin Bhateja
jbhateja at openjdk.org
Tue Oct 21 12:24:06 UTC 2025
On Tue, 21 Oct 2025 12:17:02 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>>
>> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
>>
>> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but because the register allocator currently selects the first available color, demotions are rare. This patch biases the allocation of an NDD definition towards the first source operand, or towards the second source operand for the commutative class of operations.
>>
>> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
>>
>> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>>
>> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint (16 of 136 bytes).
>>
>> **Micro:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
>>
>>
>> **Baseline :-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
>>
>> **With opt:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
>>
>> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016
> - Limiting register biasing to NDD specific demotable instructions
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016
> - Fix jtreg, one less spill
> - Updating as per reivew suggestions
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016
> - Some refactoring
> - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions
The current validation scheme is manual:
1) Revert https://github.com/openjdk/jdk/pull/27320, since SDE 9.58 does not yet support the APX_NCI_NDD_NF flag.
2) Changed the static register allocation order in x86_64.ad to always give preference to EGPRs R16-R31 during allocation.
3) Register allocation biasing facilitates demotion, which happens in the assembler layer.
4) Added debug messages in demotable assembler routines.
5) Inspected the assembler encoding with Intel xed64.
6) Ran the following tests with -XX:-UseSuperWord to exercise various NDD demotable instructions with Intel SDE 9.58.
- test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java
- test/hotspot/jtreg/compiler/c2/cr6340864/TestLongVect.java
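
To make step 3 concrete, here is a toy Python model of how biasing the definition towards a source operand enables demotion. It is only a sketch under simplifying assumptions, not the HotSpot allocator or assembler code: `select_color`, `can_demote` and `prefix_bytes_saved` are hypothetical names, registers are plain integers 0-31 with 16-31 standing for the EGPRs, and the demotion condition is reduced to "destination equals a source". The prefix sizes (4-byte EVEX, 1-byte REX, 2-byte REX2) are the ones quoted in the PR description.

```python
from typing import Optional, Set

EVEX_PREFIX_BYTES = 4   # prefix sizes as stated in the PR description
REX_PREFIX_BYTES = 1
REX2_PREFIX_BYTES = 2


def select_color(available: Set[int], neighbor_colors: Set[int],
                 bias: Optional[int] = None) -> int:
    """Pick a register (color) for a definition live range.

    Without a bias this mimics a first-available policy: the lowest
    numbered register not used by an interfering neighbor. With a bias,
    the hinted register wins whenever it is free.
    """
    free = available - neighbor_colors
    if not free:
        raise RuntimeError("no free color; the live range would have to spill")
    if bias is not None and bias in free:
        return bias
    return min(free)


def can_demote(dst: int, src1: int, src2: int, commutative: bool) -> bool:
    """Simplified demotion condition (an assumption of this sketch):
    the NDD form collapses to a two-address form when dst equals src1,
    or dst equals src2 for a commutative operation."""
    return dst == src1 or (commutative and dst == src2)


def prefix_bytes_saved(dst: int, src1: int, src2: int) -> int:
    """Prefix bytes saved by demoting EVEX to REX, or to REX2 if any
    operand is an EGPR (modelled here as register number >= 16)."""
    uses_egpr = any(r >= 16 for r in (dst, src1, src2))
    demoted = REX2_PREFIX_BYTES if uses_egpr else REX_PREFIX_BYTES
    return EVEX_PREFIX_BYTES - demoted
```

In this model, with registers 0 and 1 taken by interfering neighbors and the first source in register 5, the unbiased call `select_color(set(range(32)), {0, 1})` picks register 2, so `can_demote(2, 5, 7, False)` is false and the instruction keeps its EVEX prefix. Passing `bias=5` picks register 5 instead, the condition holds, and the model reports 3 prefix bytes saved for that instruction.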
**By limiting the scope of the fix to NDD-specific instructions, we have now mitigated any unwanted performance side effects on other backends and on the non-APX x86 backend.**
We do have existing tests in place for the functional correctness of NDD assembler instructions: https://github.com/openjdk/jdk/blob/master/test/hotspot/gtest/x86/x86-asmtest.py
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3426307551