[15] RFR (M): 8239008: C2: Simplify Replicate support for sub-word types on x86
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Feb 14 23:04:19 UTC 2020
>> instruct ReplB_mem(vec dst, memory mem) %{
>>
>> The original assert checked for AVX > 2, which is AVX-512; you replaced
>> it with an avx == 2 predicate. As I understand it, in AVX2 vpbroadcastb
>> can replicate only up to 256 bits.
>>
>> I did not look at the rest of the code, but it seems AVX operates only
>> up to 256 bits in general.
>
> For vpbroadcastb in particular, there are 5 possible encodings [1]: 2
> VEX-encoded and 3 EVEX-encoded.
>
> Depending on whether AVX512BW is present or not,
> Assembler::vpbroadcastw() chooses between VEX- or EVEX-encoded variants
> [2] [3].
>
> Also, 512-bit byte vectors are enabled only if AVX512BW is present.
>
>      AVX2 only                     AVX512BW+AVX512VL
>      AVX512F w/o AVX512BW
> xmm: VEX.128.66.0F38.W0 78 /r      EVEX.128.66.0F38.W0 78 /r
> ymm: VEX.256.66.0F38.W0 78 /r      EVEX.256.66.0F38.W0 78 /r
> zmm: N/A                           EVEX.512.66.0F38.W0 78 /r
>
>
> So, the only open question is whether
> Assembler::vpbroadcastw()/Assembler::vex_prefix() are smart enough to
> use VEX encoding when BW is present, but VL is not.
>
>      +AVX512BW -AVX512VL
> xmm: VEX.128.66.0F38.W0 78 /r
> ymm: VEX.256.66.0F38.W0 78 /r
> zmm: EVEX.512.66.0F38.W0 78 /r
Assembler::vex_prefix() does cover that case [1].
The existing register masks also properly handle the legacy (VEX-encoded) cases:

operand vecX() %{
  constraint(ALLOC_IN_RC(vectorx_reg_vlbwdq));

reg_class_dynamic vectorx_reg_vlbwdq(vectorx_reg_evex,
    vectorx_reg_legacy, %{ VM_Version::supports_avx512vlbwdq() %} );

operand vecY() %{
  constraint(ALLOC_IN_RC(vectory_reg_vlbwdq));

reg_class_dynamic vectory_reg_vlbwdq(vectory_reg_evex,
    vectory_reg_legacy, %{ VM_Version::supports_avx512vlbwdq() %} );

operand vecZ() %{
  constraint(ALLOC_IN_RC(vectorz_reg));

reg_class_dynamic vectorz_reg (vectorz_reg_evex, vectorz_reg_legacy,
    %{ VM_Version::supports_evex() %} );

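
To make the three encoding tables above concrete, here is a minimal C++ sketch of the selection they describe for VPBROADCASTB. It is a standalone model only, not the logic in Assembler::vex_prefix(): the Features struct, the Encoding enum and choose_encoding() are hypothetical names introduced for illustration, and the sketch assumes AVX2 is present whenever any AVX-512 feature is.

```cpp
#include <cassert>

// Hypothetical feature flags (illustrative; not HotSpot's VM_Version API).
struct Features {
    bool avx2;      // assumed true whenever any AVX-512 feature is set
    bool avx512bw;
    bool avx512vl;
};

enum class Encoding { VEX, EVEX, NotAvailable };

// Models the tables in this thread for VPBROADCASTB.
// 'bits' is the destination vector length: 128 (xmm), 256 (ymm) or 512 (zmm).
Encoding choose_encoding(const Features& f, int bits) {
    if (bits == 512) {
        // The zmm form exists only as an EVEX encoding and requires AVX512BW.
        return f.avx512bw ? Encoding::EVEX : Encoding::NotAvailable;
    }
    if (bits == 128 || bits == 256) {
        // EVEX.128/EVEX.256 require both AVX512BW and AVX512VL.
        if (f.avx512bw && f.avx512vl) {
            return Encoding::EVEX;
        }
        // Otherwise fall back to the AVX2 (VEX) encodings, which also
        // covers the hypothetical +AVX512BW -AVX512VL configuration.
        if (f.avx2) {
            return Encoding::VEX;
        }
    }
    return Encoding::NotAvailable;
}
```

For the +AVX512BW -AVX512VL row, choose_encoding({true, true, false}, 256) yields Encoding::VEX, while the same features with 512 bits yield Encoding::EVEX.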
Best regards,
Vladimir Ivanov
[1]
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7893
> I'll double-check what happens in that case. Unfortunately, no such
> hardware exists: KNL doesn't support BW and VL at all, while SKL (and
> later) supports both.
>
>> I would like to see compiler/codegen/Test*Vect.java tests which verify
>> these instructions in different CPU configurations.
>
> Can you elaborate, please? compiler/codegen/Test*Vect.java and
> compiler/c2/cr6340864/Test*Vect.java tests do exercise Repl* AD
> instructions and I saw them catching bugs while working on the patch.
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> VEX.128.66.0F38.W0 78 /r
> VPBROADCASTB xmm1, xmm2/m8
> AVX2
>
> VEX.256.66.0F38.W0 78 /r
> VPBROADCASTB ymm1, xmm2/m8
> AVX2
>
> EVEX.128.66.0F38.W0 78 /r
> VPBROADCASTB xmm1{k1}{z}, xmm2/m8
> AVX512VL AVX512BW
>
> EVEX.256.66.0F38.W0 78 /r
> VPBROADCASTB ymm1{k1}{z}, xmm2/m8
> AVX512VL AVX512BW
>
> EVEX.512.66.0F38.W0 78 /r
> VPBROADCASTB zmm1{k1}{z}, xmm2/m8
> AVX512BW
>
> [2]
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7881
>
>
> [3]
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7078
>
>
>> On 2/13/20 7:55 AM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8239008/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8239008
>>>
>>> Simplify Replicate support for sub-word types on x86 based on the
>>> following observations:
>>> * 512-bit vectors of sub-word element types are supported only on
>>> AVX512BW-capable hardware [1];
>>> * VBROADCASTS[SD]/VPBROADCAST[BWDQ] have been available since AVX/AVX2.
>>>
>>> Also, fixed the asserts in VBROADCASTS[SD] according to the manual:
>>> * reg-to-reg variants are part of AVX2 (while mem-to-reg variants
>>> were introduced in AVX);
>>> * VBROADCASTSD doesn't have a 128-bit variant.
>>>
>>> Testing: hs-precheckin-comp,hs-tier1,hs-tier2
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1]
>>> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1524
>>>
>>>
>>> const int Matcher::vector_width_in_bytes(BasicType bt) {
>>> ...
>>> if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR))
>>> size = (VM_Version::supports_avx512bw()) ? 64 : 32;
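
The quoted clause can be read in isolation as follows. This is a minimal standalone C++ sketch that models only the two lines quoted above; the elided part of Matcher::vector_width_in_bytes() is not reproduced, so the incoming size is taken as a parameter. ElemType, is_subword() and cap_subword_width() are hypothetical names, not HotSpot identifiers.

```cpp
#include <cassert>

// Hypothetical stand-in for the BasicType values relevant here.
enum class ElemType { Byte, Short, Char, Int, Long, Float, Double };

// Sub-word element types, mirroring the T_BYTE / T_SHORT / T_CHAR check.
bool is_subword(ElemType bt) {
    return bt == ElemType::Byte || bt == ElemType::Short || bt == ElemType::Char;
}

// Applies only the quoted clause: with UseAVX > 2, sub-word vectors are
// 64 bytes wide if AVX512BW is supported and 32 bytes wide otherwise.
// 'size' is whatever width the (elided) earlier code computed.
int cap_subword_width(int size, int use_avx, bool supports_avx512bw, ElemType bt) {
    if (use_avx > 2 && is_subword(bt)) {
        size = supports_avx512bw ? 64 : 32;
    }
    return size;
}
```

Under these assumptions, cap_subword_width(64, 3, false, ElemType::Byte) returns 32, which is why 512-bit byte vectors never reach the matcher without AVX512BW.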