[15] RFR (M): 8239008: C2: Simplify Replicate support for sub-word types on x86
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Feb 14 22:37:44 UTC 2020
> instruct ReplB_mem(vec dst, memory mem) %{
>
> The original assert was checking for AVX > 2, which is AVX-512; you replaced
> it with an AVX == 2 predicate. As I understand it, in AVX2 vpbroadcastb can
> replicate only up to 256 bits.
>
> I did not look at the rest of the code, but it seems AVX operates only up
> to 256 bits in general.
For vpbroadcastb in particular, there are 5 possible encodings [1]: 2
VEX-encoded and 3 EVEX-encoded.
Depending on whether AVX512BW is present or not,
Assembler::vpbroadcastw() chooses between VEX- and EVEX-encoded variants
[2] [3].
Also, 512-bit byte vectors are enabled only if AVX512BW is present.
       AVX2 only /                 AVX512BW+AVX512VL
       AVX512F w/o AVX512BW
xmm:   VEX.128.66.0F38.W0 78 /r    EVEX.128.66.0F38.W0 78 /r
ymm:   VEX.256.66.0F38.W0 78 /r    EVEX.256.66.0F38.W0 78 /r
zmm:   N/A                         EVEX.512.66.0F38.W0 78 /r
So, the only open question is whether
Assembler::vpbroadcastw()/Assembler::vex_prefix() are smart enough to
use VEX encoding when BW is present, but VL is not.
       +AVX512BW -AVX512VL
xmm:   VEX.128.66.0F38.W0 78 /r
ymm:   VEX.256.66.0F38.W0 78 /r
zmm:   EVEX.512.66.0F38.W0 78 /r
I'll double-check what happens in such a case. Unfortunately, no such
hardware exists: KNL supports neither BW nor VL, while SKL (and later)
supports both.
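The encoding choice described by the two tables above can be modeled as a small decision function. This is a minimal sketch, assuming an idealized rule: EVEX.128/EVEX.256 only when both AVX512BW and AVX512VL are available, the VEX forms (AVX2) otherwise, and EVEX.512 whenever AVX512BW is present. The names `Encoding`, `CpuFeatures` and `vpbroadcastb_encoding` are hypothetical; this is not HotSpot's `Assembler::vpbroadcastw()`/`vex_prefix()` logic, which may also depend on other factors such as the registers involved.

```cpp
#include <cassert>

// Hypothetical model of the VPBROADCASTB (0F38 78 /r) encoding choice;
// not HotSpot's Assembler code.
enum class Encoding { VEX, EVEX, Unsupported };

struct CpuFeatures {
  bool avx2;
  bool avx512bw;
  bool avx512vl;
};

// vector_bits: 128 (xmm), 256 (ymm) or 512 (zmm).
Encoding vpbroadcastb_encoding(int vector_bits, const CpuFeatures& f) {
  assert(vector_bits == 128 || vector_bits == 256 || vector_bits == 512);
  if (vector_bits == 512) {
    // The EVEX.512 form needs AVX512BW only; there is no VEX.512 form.
    return f.avx512bw ? Encoding::EVEX : Encoding::Unsupported;
  }
  // EVEX.128/EVEX.256 forms need both AVX512BW and AVX512VL.
  if (f.avx512bw && f.avx512vl) {
    return Encoding::EVEX;
  }
  // Otherwise fall back to VEX.128/VEX.256, which need AVX2. This branch
  // also covers the +AVX512BW -AVX512VL configuration.
  return f.avx2 ? Encoding::VEX : Encoding::Unsupported;
}
```

Under this rule, a +AVX512BW -AVX512VL CPU gets VEX for xmm/ymm and EVEX for zmm, which is the behavior the open question asks the assembler to provide.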
> I would like to see tests like compiler/codegen/Test*Vect.java which verify
> these instructions in different CPU configurations.
Can you elaborate, please? The compiler/codegen/Test*Vect.java and
compiler/c2/cr6340864/Test*Vect.java tests do exercise the Repl* AD
instructions, and I saw them catch bugs while working on the patch.
Best regards,
Vladimir Ivanov
[1]
VEX.128.66.0F38.W0 78 /r     VPBROADCASTB xmm1, xmm2/m8           AVX2
VEX.256.66.0F38.W0 78 /r     VPBROADCASTB ymm1, xmm2/m8           AVX2
EVEX.128.66.0F38.W0 78 /r    VPBROADCASTB xmm1{k1}{z}, xmm2/m8    AVX512VL AVX512BW
EVEX.256.66.0F38.W0 78 /r    VPBROADCASTB ymm1{k1}{z}, xmm2/m8    AVX512VL AVX512BW
EVEX.512.66.0F38.W0 78 /r    VPBROADCASTB zmm1{k1}{z}, xmm2/m8    AVX512BW
[2] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7881
[3] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7078
> On 2/13/20 7:55 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8239008/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8239008
>>
>> Simplify Replicate support for sub-word types on x86 based on the
>> following observations:
>> * 512-bit vectors of sub-word element types are supported only on
>> AVX512BW-capable hardware [1];
>> * VBROADCASTS[SD]/VPBROADCAST[BWDQ] are available since AVX/AVX2.
>>
>> Also, fixed asserts in VBROADCASTS[SD] according to the manual:
>> * reg-to-reg variants are part of AVX2 (while mem-to-reg are
>> introduced in AVX);
>> * VBROADCASTSD doesn't have a 128-bit variant.
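The availability rules behind the assert fixes listed above can be summarized as a lookup of the minimum AVX level per VEX-encoded broadcast form. This is a minimal sketch, assuming a UseAVX-style level (1 = AVX, 2 = AVX2) passed in as `use_avx`; the names `Broadcast`, `min_avx_level` and `broadcast_supported` are hypothetical, and EVEX (AVX-512) forms are deliberately left out.

```cpp
#include <cassert>

// Hypothetical model of VEX-encoded broadcast availability; not HotSpot code.
enum class Broadcast { VBROADCASTSS, VBROADCASTSD, VPBROADCAST_BWDQ };

// Minimum AVX level required (1 = AVX, 2 = AVX2), or 0 if no VEX form exists.
int min_avx_level(Broadcast insn, int vector_bits, bool src_is_reg) {
  assert(vector_bits == 128 || vector_bits == 256);
  switch (insn) {
    case Broadcast::VBROADCASTSD:
      // VBROADCASTSD has no 128-bit variant.
      if (vector_bits == 128) return 0;
      // mem-to-reg since AVX, reg-to-reg since AVX2.
      return src_is_reg ? 2 : 1;
    case Broadcast::VBROADCASTSS:
      // mem-to-reg since AVX, reg-to-reg since AVX2.
      return src_is_reg ? 2 : 1;
    case Broadcast::VPBROADCAST_BWDQ:
      // All VPBROADCAST[BWDQ] forms were introduced in AVX2.
      return 2;
  }
  return 0;
}

// True if the form exists and use_avx is high enough for it.
bool broadcast_supported(Broadcast insn, int vector_bits, bool src_is_reg,
                         int use_avx) {
  int required = min_avx_level(insn, vector_bits, src_is_reg);
  return required != 0 && use_avx >= required;
}
```

Keeping the minimum level in one place makes it easy to see why an assert keyed only on "AVX" is too weak for the reg-to-reg forms.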
>>
>> Testing: hs-precheckin-comp,hs-tier1,hs-tier2
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1524
>>
>>
>> const int Matcher::vector_width_in_bytes(BasicType bt) {
>> ...
>> if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR))
>> size = (VM_Version::supports_avx512bw()) ? 64 : 32;
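The sub-word rule quoted from Matcher::vector_width_in_bytes() above can be isolated as a standalone function. This is a minimal sketch of only the visible lines: the width computed by the elided earlier code is taken as the parameter `size` rather than reconstructed, and the `BasicType` enum here is a hypothetical stand-in that reuses HotSpot's constant names. `use_avx` and `supports_avx512bw` stand in for `UseAVX` and `VM_Version::supports_avx512bw()`.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's BasicType (only the cases used here).
enum BasicType { T_BYTE, T_SHORT, T_CHAR, T_INT, T_LONG, T_FLOAT, T_DOUBLE };

// Applies only the quoted sub-word override; 'size' is whatever width the
// elided preceding logic produced.
int apply_subword_rule(BasicType bt, int size, int use_avx,
                       bool supports_avx512bw) {
  if (use_avx > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR)) {
    // Without AVX512BW, sub-word vectors are capped at 32 bytes (256 bits).
    size = supports_avx512bw ? 64 : 32;
  }
  return size;
}
```

For example, with `use_avx == 3` on a CPU without AVX512BW, a 64-byte width for T_BYTE drops to 32 bytes, while T_INT keeps its 64 bytes; this is why 512-bit sub-word Replicate nodes only appear on AVX512BW hardware.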
More information about the hotspot-compiler-dev
mailing list