RFR: 8283232: x86: Improve vector broadcast operations [v12]
Quan Anh Mai
duke at openjdk.org
Fri Jul 29 13:48:13 UTC 2022
On Fri, 29 Jul 2022 05:24:19 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>> unnecessary TEMP dst
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1651:
>
>> 1649: case 32: vmovdqu(dst, src); break;
>> 1650: case 64: evmovdqul(dst, src, Assembler::AVX_512bit); break;
>> 1651: default: ShouldNotReachHere();
>
> No change in this file; maybe you can remove it from the change set.
Since I added the method `C2_MacroAssembler::load_constant_vector` near here anyway, I think this style change can be kept.
> src/hotspot/cpu/x86/x86.ad line 4159:
>
>> 4157:
>> 4158: instruct vReplS_reg(vec dst, rRegI src) %{
>> 4159: predicate(UseAVX >= 2);
>
> Can be folded with the pattern below by pushing the predicate into the encoding block.
Aligning the predicates of the reg and the mem versions allows the adlc parser to recognise their relationship, so that during register allocation a reg operation with a spilt operand can be substituted with its corresponding mem node. You can see in the generated code that the reg node has specific methods such as `cisc_operand` and `cisc_version`.
> src/hotspot/cpu/x86/x86.ad line 4253:
>
>> 4251: int vlen_enc = vector_length_encoding(this);
>> 4252: if (VM_Version::supports_avx()) {
>> 4253: __ vbroadcastss($dst$$XMMRegister, addr, vlen_enc);
>
> Emitting vbroadcastss for all vector sizes for Replicate[B/S/I] may result in a domain switch-over penalty; it can be limited to replications of <=16 bytes, and above that we can emit VPBROADCASTD.
Got it
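
For illustration only, the size-based split suggested above could be sketched roughly as follows. The helper `pick_int_broadcast` and its byte-size parameter are hypothetical names and are not part of HotSpot or of this PR; the sketch only models which mnemonic would be chosen, it does not emit any code.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not HotSpot code):
// choose the broadcast mnemonic for an integral Replicate of a 32-bit value,
// given the vector size in bytes.
std::string pick_int_broadcast(int vlen_in_bytes) {
  // Up to 16 bytes: keep vbroadcastss, which only requires AVX.
  if (vlen_in_bytes <= 16) {
    return "vbroadcastss";
  }
  // Wider vectors: use the integer-domain vpbroadcastd, an AVX2 instruction,
  // to avoid a possible domain switch-over penalty.
  return "vpbroadcastd";
}
```

Under these assumptions, `pick_int_broadcast(16)` selects `vbroadcastss`, while `pick_int_broadcast(32)` and `pick_int_broadcast(64)` select `vpbroadcastd`.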
> src/hotspot/cpu/x86/x86.ad line 4261:
>
>> 4259: __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister);
>> 4260: }
>> 4261: }
>
> Please move into a new macro-assembly routine.
Done
-------------
PR: https://git.openjdk.org/jdk/pull/7832