RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2 [v2]

Fri Sep 5 22:08:14 UTC 2025

On Thu, 4 Sep 2025 20:11:28 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - nomenclature change
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
>  - remove trailing whitespaces
>  - remove unused instructions
>  - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2

src/hotspot/cpu/x86/assembler_x86.cpp line 13125:

> 13123:   emit_arith(op1, op2, src1, src2, second_operand_demotable);
> 13124: }
> 13125: 

This could be written along the following lines:

void Assembler::emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register src1, Register src2, VexSimdPrefix pre, VexOpcode opc,
                                                      InstructionAttr *attributes, int op1, int op2, bool no_flags, bool use_prefixq, bool is_commutative) {
  bool demotable = is_demotable(no_flags, dst->encoding(), src1->encoding());
  if (!demotable && is_commutative) {
    if (is_demotable(no_flags, dst->encoding(), src2->encoding())) {
      demotable = true;
      // swap src1 and src2
      Register tmp = src1;
      src1 = src2;
      src2 = tmp;
    }
  }
  (void)emit_eevex_prefix_or_demote_ndd(src1->encoding(), dst->encoding(), src2->encoding(), pre, opc, attributes, no_flags, use_prefixq);
  emit_arith(op1, op2, src1, src2);
}

With this, neither emit_arith() nor emit_eevex_prefix_or_demote_ndd() needs an extra argument.
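
To make the operand swap easier to check in isolation, here is a minimal standalone sketch of the same decision logic. It is not HotSpot code: `Reg`, `DemoteResult`, `is_demotable_sketch` and `choose_operands` are hypothetical names, and the demotion rule used (flags not suppressed and dst equal to the first source) is an assumption made for illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a register: only the encoding matters here.
struct Reg { int enc; };

// Assumed rule for this sketch: the NDD form can be demoted only when
// flags are not suppressed and dst coincides with the first source.
inline bool is_demotable_sketch(bool no_flags, int dst_enc, int src1_enc) {
  return !no_flags && dst_enc == src1_enc;
}

// Outcome of the decision: whether demotion applies, and the operand
// order that the later emit steps should use.
struct DemoteResult {
  bool demotable;
  Reg src1;
  Reg src2;
};

inline DemoteResult choose_operands(Reg dst, Reg src1, Reg src2,
                                    bool no_flags, bool is_commutative) {
  // APX provides 32 general purpose registers, so encodings are 0..31.
  assert(dst.enc >= 0 && dst.enc < 32);

  bool demotable = is_demotable_sketch(no_flags, dst.enc, src1.enc);
  if (!demotable && is_commutative &&
      is_demotable_sketch(no_flags, dst.enc, src2.enc)) {
    // dst matches src2: swap the sources so dst matches src1 instead.
    demotable = true;
    std::swap(src1, src2);
  }
  return DemoteResult{demotable, src1, src2};
}
```

With dst = r18, src1 = r25, src2 = r18 and is_commutative set, the sketch reports the case as demotable with the sources swapped to (r18, r25). With is_commutative cleared, the same inputs are left in their original order and are not demotable.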

src/hotspot/cpu/x86/assembler_x86.hpp line 812:

> 810:   void emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register src1, Register src2, VexSimdPrefix pre, VexOpcode opc,
> 811:                                       InstructionAttr *attributes, int op1, int op2, bool no_flags = false, bool use_prefixq = false, bool is_commutative = false);
> 812: 

The attributes parameter could be replaced by an int size, with the attributes computed inside emit_eevex_prefix_or_demote_arith_ndd(). There would then be no need for use_prefixq as a separate parameter, since (size == EVEX_64bit) implies use_prefixq.
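
As a rough illustration of that idea, the sketch below derives both the attributes and the prefix choice from a single size argument. Everything in it is a stand-in: `SizeSketch`, `AttrSketch`, `use_prefixq_for` and `make_attributes` are hypothetical names, not the assembler's InstructionAttr or its EVEX_* constants.

```cpp
#include <cassert>

// Hypothetical operand-size constants standing in for the assembler's
// EVEX_32bit / EVEX_64bit values; the numeric values are illustrative.
enum SizeSketch { SKETCH_32bit, SKETCH_64bit };

// Hypothetical attribute record; the real InstructionAttr carries more state.
struct AttrSketch {
  bool wide;             // 64-bit operand size requested
  SizeSketch input_size; // size the attributes were built for
};

// The separate use_prefixq flag becomes a pure function of the size.
inline bool use_prefixq_for(SizeSketch size) {
  return size == SKETCH_64bit;
}

// Attributes are built inside the helper instead of being passed in.
inline AttrSketch make_attributes(SizeSketch size) {
  return AttrSketch{use_prefixq_for(size), size};
}
```

Callers would then pass only the size, which removes the possibility of supplying attributes and a use_prefixq value that disagree with each other.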

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2326128354
PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2325781623