RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v15]

Sat Nov 22 04:56:52 UTC 2025

On Sat, 22 Nov 2025 02:59:21 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>> 
>> With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand. In general, this saves additional spills for two-address instructions where the destination is also the first source operand and the source live range extends beyond the current instruction.
>> 
>> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but because the register allocator currently picks the first available colour, the demotions are rare. This patch biases the allocation of an NDD definition towards the first source operand, or towards the second source operand for the commutative class of operations.
>> 
>> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
>> 
>> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> 
>> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint.
>>  
>> **Micro:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
>> 
>> 
>> **Baseline :-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
>> 
>> **With opt:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
>> 
>> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated

src/hotspot/share/opto/chaitin.cpp line 1522:

> 1520:   uint copy_lrg = _lrg_map.find(lrg._copy_bias);
> 1521:   OptoReg::Name reg = select_bias_lrg_color(lrg, copy_lrg);
> 1522:   if (reg != OptoReg::Bad) {

Please, use `OptoReg::is_valid(reg)` here. I find it more readable.

Also, there's a repetitive pattern for `lrg._copy_bias` and `lrg._copy_bias2`. It would be nice to hide it behind a single `select_bias_lrg_color(_lrg_map, lrg)` call.
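
To illustrate the shape of such a combined call, here is a minimal standalone sketch. It does not use HotSpot types: `LiveRange`, `kBadReg`, the `std::bitset` mask and the simplified "biased live range is already coloured" rule are all assumptions made for the example, and the real `select_bias_lrg_color` in the patch may take different arguments and handle more cases.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for HotSpot's LRG and OptoReg::Name; not the real types.
constexpr int kBadReg = -1;

struct LiveRange {
  unsigned copy_bias = 0;   // index of the first bias live range, 0 means "none"
  unsigned copy_bias2 = 0;  // index of the second bias live range, 0 means "none"
  int reg = kBadReg;        // colour already assigned to this live range, if any
  std::bitset<32> mask;     // colours still available to this live range
};

inline bool is_valid(int reg) { return reg != kBadReg; }

// Tries the two biases in priority order and returns the first biased colour
// that is still available to `lrg`, or kBadReg if neither bias applies.
inline int select_bias_lrg_color(const std::vector<LiveRange>& lrgs,
                                 const LiveRange& lrg) {
  const std::array<unsigned, 2> biases = {lrg.copy_bias, lrg.copy_bias2};
  for (unsigned bias : biases) {
    if (bias == 0) {
      continue;  // no bias recorded in this slot
    }
    assert(bias < lrgs.size() && "bias must name an existing live range");
    const int reg = lrgs[bias].reg;
    if (is_valid(reg) && lrg.mask.test(static_cast<std::size_t>(reg))) {
      return reg;
    }
  }
  return kBadReg;
}
```

With this shape the caller needs only one `is_valid(...)` check, and the order in which the two biases are tried lives in a single place.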

src/hotspot/share/opto/chaitin.cpp line 1661:

> 1659:     }
> 1660: 
> 1661:     Node* def = lrg->_def;

I'm concerned about the approach chosen here. It iterates over all operands trying to find a candidate for biasing, irrespective of the shape of the Mach node.

Instead, I'd be much more comfortable with 2 operand probes at fixed positions (ideally, at indices 1 and 2). Any mismatches in Mach node shape should be reported. In other words, any failed operand probe on a Mach node marked with `Flag_ndd_demotable` or `Flag_ndd_demotable_commutative` should trigger an assert. (Corresponding AD instructions can be adjusted to fit the desired pattern.)
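
A rough standalone sketch of the fixed-position probing described above follows. `MachNodeSketch`, the flag constants and `probe` are hypothetical stand-ins rather than HotSpot APIs, and the sketch assumes that the commutative flag also implies a probe at index 1, which the patch may define differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bits standing in for Flag_ndd_demotable and
// Flag_ndd_demotable_commutative.
enum : std::uint32_t {
  kFlagNddDemotable = 1u << 0,
  kFlagNddDemotableCommutative = 1u << 1
};

// Hypothetical node shape: operand 0 is the definition, and each entry says
// whether the operand at that index is a plain register operand.
struct MachNodeSketch {
  std::uint32_t flags = 0;
  std::vector<bool> opnd_is_reg;
};

// A probe succeeds only if the operand exists and is a register operand.
inline bool probe(const MachNodeSketch& n, std::size_t idx) {
  return idx < n.opnd_is_reg.size() && n.opnd_is_reg[idx];
}

struct BiasCandidates {
  int first = -1;   // operand index for the first bias, -1 if none
  int second = -1;  // operand index for the second bias, -1 if none
};

// Probes operands 1 and 2 only. A failed probe on a flagged node is treated
// as a malformed instruction shape and trips an assert in debug builds.
inline BiasCandidates find_bias_candidates(const MachNodeSketch& n) {
  BiasCandidates c;
  if (n.flags & (kFlagNddDemotable | kFlagNddDemotableCommutative)) {
    const bool ok1 = probe(n, 1);
    assert(ok1 && "flagged node must have a register operand at index 1");
    if (ok1) c.first = 1;
  }
  if (n.flags & kFlagNddDemotableCommutative) {
    const bool ok2 = probe(n, 2);
    assert(ok2 && "commutative node must have a register operand at index 2");
    if (ok2) c.second = 2;
  }
  return c;
}
```

The point of this design is that a mismatch between the flag and the operand layout is caught loudly during development instead of silently producing no bias.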

src/hotspot/share/opto/chaitin.cpp line 1663:

> 1661:     Node* def = lrg->_def;
> 1662:     MachNode* mdef = lrg->is_singledef() && !lrg->_is_bound && def->is_Mach() ? def->as_Mach() : nullptr;
> 1663:     if (mdef != nullptr) {

Please, reshape it as follows:

if (lrg->is_singledef() && !lrg->_is_bound && def->is_Mach()) {
  MachNode* mdef = def->as_Mach();

src/hotspot/share/opto/chaitin.cpp line 1665:

> 1663:     if (mdef != nullptr) {
> 1664:       int i = 1;
> 1665:       uint lrg_def = _lrg_map.find(def);

The whole block can be guarded by `lrg->_copy_bias == 0` condition.

src/hotspot/share/opto/chaitin.cpp line 1667:

> 1665:       uint lrg_def = _lrg_map.find(def);
> 1666:       for (; i < mdef->num_opnds(); i++) {
> 1667:         if (Matcher::is_register_biasing_candidate(mdef, 1, i)) {

`_copy_bias` and `_copy_bias2` initialization code is mostly a duplication. Please, extract it into a helper function.
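
As an illustration only, the shared logic could look roughly like the helper below. `InterferenceSketch` and `try_set_copy_bias` are hypothetical names for a standalone example; the real code would work with `_ifg->test_edge_sq` and the `LRG` fields directly.

```cpp
#include <set>
#include <utility>

// Hypothetical stand-in for the interference graph: an undirected edge set.
struct InterferenceSketch {
  std::set<std::pair<unsigned, unsigned>> edges;

  // Normalizes an edge so that (a, b) and (b, a) map to the same key.
  static std::pair<unsigned, unsigned> key(unsigned a, unsigned b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
  }
  void add_edge(unsigned a, unsigned b) { edges.insert(key(a, b)); }
  bool test_edge(unsigned a, unsigned b) const {
    return edges.count(key(a, b)) != 0;
  }
};

// Records `candidate` in `bias_slot` only if the slot is still empty, the
// candidate names a real live range (non-zero), and the definition does not
// interfere with it. Returns true if the slot was set.
inline bool try_set_copy_bias(unsigned& bias_slot, unsigned lrg_def,
                              unsigned candidate,
                              const InterferenceSketch& ifg) {
  if (bias_slot != 0) return false;                      // already biased
  if (candidate == 0) return false;                      // no live range
  if (ifg.test_edge(lrg_def, candidate)) return false;   // would conflict
  bias_slot = candidate;
  return true;
}
```

Because the slot is passed in by reference, the same helper serves both `_copy_bias` and `_copy_bias2`, and the "is this slot still empty" check cannot accidentally test the wrong field.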

src/hotspot/share/opto/chaitin.cpp line 1681:

> 1679:       // For commutative operation, def allocation can also be
> 1680:       // biased towards LRG of second input's def.
> 1681:       for (; i < mdef->num_opnds(); i++) {

Same here (`lrg->_copy_bias2 == 0`).

src/hotspot/share/opto/chaitin.cpp line 1686:

> 1684:           if (in2 != nullptr) {
> 1685:             uint lrg_in2 = _lrg_map.find(in2);
> 1686:             if (lrg_in2 != 0 && lrg->_copy_bias == 0 && !_ifg->test_edge_sq(lrg_def, lrg_in2)) {

Do you have a typo here? (`s/_copy_bias/_copy_bias2/`)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552100990
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552064851
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552067802
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2551966801
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2552072823
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2551968813
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2551970868