RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27]

Sun May 18 01:43:05 UTC 2025

On Fri, 16 May 2025 17:42:18 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source.
>> 
>> For example:
>> 
>> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update x86-asmtest.py to enable demotion by default and make test generation optional

Hi @vamsi-parasa ,
Kindly accomodate some re-factoring suggestions, overall patch looks good to me.

Best Regards,
Jatin

src/hotspot/cpu/x86/assembler_x86.cpp line 1657:

> 1655: void Assembler::eandl(Register dst, Register src1, Address src2, bool no_flags) {
> 1656:   InstructionMark im(this);
> 1657:   evex_prefix_int8_operand(dst, src1, src2, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0x23, no_flags);

Nomenclature of routine is not very clear here, why do you mean by int8 operand, this is operating over 32bit word.

src/hotspot/cpu/x86/assembler_x86.cpp line 6809:

> 6807: 
> 6808: void Assembler::eshldl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) {
> 6809:   evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xA4, no_flags, true /* is_map1 */);

Here the opcode is set to 0xA4 which is correct for demotion case, in case we don't demote then opcode is 0x24.  This is only relevant to NDD flavours of shldl/shldq with imm8 shift values. I will suggest adding a comment here giving clear explanation as it not intiutive at first glance and manul clearly specify 0x24 as the opcode.

src/hotspot/cpu/x86/assembler_x86.cpp line 6827:

> 6825: 
> 6826: void Assembler::eshrdl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) {
> 6827:   evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAC, no_flags, true /* is_map1 */);

Suggestion:

  emit_eevex_or_demote(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAC, no_flags, true /* is_map1 */);

Above nomenclature looks more appropriate here.

src/hotspot/cpu/x86/assembler_x86.cpp line 6935:

> 6933:   InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 6934:   attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit);
> 6935:   evex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, &attributes, no_flags);

To comply with existing convention like below
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.hpp#L762

We should use eevex instead of evex as the prefix

src/hotspot/cpu/x86/assembler_x86.cpp line 12857:

> 12855: 
> 12856: void Assembler::evex_prefix_int8_operand(Register dst, Register src1, Address src2, VexSimdPrefix pre, VexOpcode opc,
> 12857:                                          int size, int b1, bool no_flags, bool is_map1) {

Suggestion:

                                         int size, int b1, bool no_flags, bool is_map1) {

Suggestion:

                                         int size, int opcode_byte, bool no_flags, bool is_map1) {

src/hotspot/cpu/x86/assembler_x86.cpp line 12942:

> 12940: }
> 12941: 
> 12942: void Assembler::evex_opcode_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, int8_t imm8, VexSimdPrefix pre, VexOpcode opc,

Only difference b/w this method and one below is that it accepts an immediate shift count and modifies the opcode if we domete the instruction to legacy / REX2 variant.

Demotion logic and rest of the logic is exaclty same.  Should we merge these into one and then based on the incoming opcode i.e. if its 0x24 or 0x2C we chonsider immediate shift and associated opcode pruning if demoted.

src/hotspot/cpu/x86/assembler_x86.cpp line 12948:

> 12946:     int encode = is_prefixq ? prefixq_and_encode(src_enc, dst_enc, is_map1) : prefix_and_encode(src_enc, dst_enc, is_map1);
> 12947:     emit_opcode_prefix_and_encoding((unsigned char)byte1, 0xC0, encode, imm8);
> 12948:   } else {

FTR, existing demotion w.r.t to first operand is safe for all kinds to instructions, for commutative instructions, add, mul, xor, and, or, max , min etc, we can check against the second operand by passing is_commutative flags from top level assembler instruction. I am ok to handle this as part of https://bugs.openjdk.org/browse/JDK-8354348

src/hotspot/cpu/x86/assembler_x86.cpp line 12958:

> 12956: void Assembler::evex_opcode_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc,
> 12957:                                               int size, int byte1, bool no_flags, bool is_map1) {
> 12958:   bool is_prefixq = (size == EVEX_64bit);

Nit pick, on line https://github.com/openjdk/jdk/pull/24431/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aR12944 
you are using a conditional operator to select b/w true and false., Lets follow one convention.

src/hotspot/cpu/x86/assembler_x86.cpp line 12962:

> 12960:     if (size == EVEX_16bit) {
> 12961:       emit_int8(0x66);
> 12962:     }

I cannot find a caller that passes EVEX_16bit for the size argument.

src/hotspot/cpu/x86/assembler_x86.cpp line 12973:

> 12971: }
> 12972: 
> 12973: void Assembler::evex_opcode_prefix_and_encode_swap(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc,

Can we also not unify this one with evex_opcode_prefix_and_encode by passing additional swap argument

src/hotspot/cpu/x86/assembler_x86.cpp line 12991:

> 12989: 
> 12990: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc,
> 12991:                                           InstructionAttr *attributes, bool no_flags, bool use_prefixq) {

evex_prefix_and_encode_ndd => emit_eevex_prefix_or_demote_ndd

Naming suggestion.

src/hotspot/cpu/x86/assembler_x86.cpp line 13002:

> 13000: }
> 13001: 
> 13002: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc,

Suggestion:

int Assembler::emit_eevex_prefix_or_demote_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc,

Nameing suggetion

src/hotspot/cpu/x86/assembler_x86.cpp line 13024:

> 13022: }
> 13023: 
> 13024: void Assembler::evex_prefix_arith(Register dst, Register nds, int32_t imm32, VexSimdPrefix pre, VexOpcode opc,

Suggestion:

void Assembler::emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register nds, int32_t imm32, VexSimdPrefix pre, VexOpcode opc,

-------------

PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2847341916
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094104970
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094296385
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094297820
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2093533374
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094299275
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094299023
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2093497917
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2093540188
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094102193
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094300060
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094300860
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094300471
PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094301333