RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v18]

Vladimir Ivanov vlivanov at openjdk.org
Mon Nov 24 20:57:25 UTC 2025


On Sun, 23 Nov 2025 03:15:17 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>> 
>> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
>> 
>> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations.
>> 
>> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm.  Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, domotion saves considerable JIT code size.
>> 
>> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> 
>> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint.
>>  
>> **Micro:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
>> 
>> 
>> **Baseline :-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
>> 
>> **With opt:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
>> 
>> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated.

Thanks, Jatin. Overall, looks good.

src/hotspot/cpu/x86/x86.ad line 2641:

> 2639:   }
> 2640: 
> 2641:   if (mdef->num_opnds() <= oper_index || mdef->operand_index(oper_index) < 0 ||

IMO all 4 checks (plus, `mdef->in(mdef->operand_index(oper_index)) != null`) can be considered as structural invariants for mach nodes marked as NDD demotable. So, I'd like to see an assert ensuring they don't fail for NDD-demotable nodes.

Speaking of the code shape, I suggest to restore previous shape with explicit checks before the switch and add asserts before returning negative result:

static boolean is_ndd_demotable() {
  return ((mdef->flags() & Node::PD::Flag_ndd_demotable) != 0) ||
         ((mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0));
}

bool Matcher::is_register_biasing_candidate(const MachNode* mdef, int oper_index) {
  if (mdef == nullptr) {
    return false;
  }

  if (mdef->num_opnds() <= oper_index || 
      mdef->operand_index(oper_index) < 0 ||
      mdef->in(mdef->operand_index(oper_index)) != nullptr) {
    assert(!is_ndd_demotable(mdef), "%s", mdef->Name());
    return false;
  }

  // Complex memory operand covers multiple incoming edges needed for
  // address computation, biasing def towards any address component will not
  // result into NDD demotion by assembler.
  if (mdef->operand_num_edges(oper_index) != 1) {
    assert(!is_ndd_demotable(mdef), "%s", mdef->Name());
    return false;
  }

  // Demotion candidate must be register mask compatible with definition.
  const RegMask& oper_mask = mdef->in_RegMask(mdef->operand_index(oper_index));
  if (!oper_mask.overlap(mdef->out_RegMask())) {
    assert(!is_ndd_demotable(mdef), "%s", mdef->Name());
    return false;
  }

  switch (oper_index) {
    // First operand of MachNode corresponding to Intel APX NDD selection
    // pattern can share its assigned register with definition operand if
    // their live ranges do not overlap. In such a scenario we can demote
    // it to legacy map0/map1 instruction by replacing its 4-byte extended
    // EVEX prefix with shorter REX/REX2 encoding. Demotion candidates
    // are decorated with a special flag by instruction selector.
    case 1: return is_ndd_demotable(mdef);

    // Definition operand of commutative operation can be biased towards second operand.
    case 2: return (mdef->flags() & Node::PD::Flag_ndd_demotable_commutative) != 0;

    // Current scheme only selects up to two biaising candidates
    default: assert(false, "unhandled operand index: %s", mdef->Name()); break; 
  }

  return false;
}

src/hotspot/share/opto/chaitin.cpp line 1475:

> 1473: 
> 1474: OptoReg::Name PhaseChaitin::select_bias_lrg_color(LRG &lrg) {
> 1475:   uint bias_lrg1_idx = lrg._copy_bias;

Do I get it right that `_copy_bias2 != 0` implies `_copy_bias != 0`? Can you enforce it? (Here and in `PhaseChaitin::bias_color` where `_copy_bias2` is initialized.)

src/hotspot/share/opto/chaitin.cpp line 1498:

> 1496:     // the chances of register sharing once the bias live range
> 1497:     // becomes the part of IFG.
> 1498:     uint bias_lrg = lrgs(bias_lrg1_idx).degrees_of_freedom() >

How does it work when `bias_lrg1_idx` or `bias_lrg2_idx` indices are 0? Original code had a guard against such condition.

src/hotspot/share/opto/chaitin.cpp line 1538:

> 1536: 
> 1537:   // Try biasing the color with non-interfering bias live range[s].
> 1538:   if (lrg._copy_bias != 0 || lrg._copy_bias2 != 0) {

IMO you can drop `(lrg._copy_bias != 0 || lrg._copy_bias2 != 0)` guard. Original code didn't check it and there are enough guard in `select_bias_lrg_color` to catch it.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3502091982
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557677038
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557617768
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557609213
PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2557605287


More information about the hotspot-compiler-dev mailing list