[15] RFR (M): 8239008: C2: Simplify Replicate support for sub-word types on x86

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Feb 14 22:37:44 UTC 2020


>   instruct ReplB_mem(vec dst, memory mem) %{
> 
> The original assert checked for AVX > 2, which is AVX-512; you replaced it 
> with an AVX == 2 predicate. As I understand it, in AVX2 vpbroadcastb can 
> replicate only up to 256 bits.
> 
> I did not look at the rest of the code, but it seems AVX operates only up to 
> 256 bits in general.

For vpbroadcastb in particular, there are 5 possible encodings [1]: 2 
VEX-encoded and 3 EVEX-encoded.

Depending on whether AVX512BW is present or not, 
Assembler::vpbroadcastw() chooses between VEX- or EVEX-encoded variants 
[2] [3].

Also, 512-bit byte vectors are enabled only if AVX512BW is present.

         AVX2 only, or              AVX512BW+AVX512VL
         AVX512F w/o AVX512BW
    xmm: VEX.128.66.0F38.W0 78 /r   EVEX.128.66.0F38.W0 78 /r
    ymm: VEX.256.66.0F38.W0 78 /r   EVEX.256.66.0F38.W0 78 /r
    zmm: N/A                        EVEX.512.66.0F38.W0 78 /r


So, the only open question is whether 
Assembler::vpbroadcastw()/Assembler::vex_prefix() are smart enough to 
use VEX encoding when BW is present, but VL is not.

            +AVX512BW -AVX512VL
    xmm: VEX.128.66.0F38.W0 78 /r
    ymm: VEX.256.66.0F38.W0 78 /r
    zmm: EVEX.512.66.0F38.W0 78 /r

I'll double-check what happens in that case. Unfortunately, no such 
hardware exists: KNL doesn't support BW or VL at all, while SKL (and 
later) supports both.
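
To make the two tables concrete, here is a minimal standalone sketch of the 
selection they imply, derived only from the five forms listed in [1]. It is 
not HotSpot code and says nothing about what Assembler::vex_prefix() actually 
does today; all names (Cpu, Encoding, vpbroadcastb_encoding, check_tables) 
are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical model, not HotSpot code: which prefix may encode VPBROADCASTB.
enum class Encoding { None, VEX, EVEX };

// Only the CPU features the tables depend on.
struct Cpu {
  bool avx2;
  bool avx512bw;
  bool avx512vl;
};

// Encoding for a destination of `bits` width (128, 256 or 512), following
// the forms in [1]. EVEX is preferred when both prefixes are legal, as in
// the AVX512BW+AVX512VL column.
inline Encoding vpbroadcastb_encoding(const Cpu& cpu, int bits) {
  if (bits == 512) {
    // The zmm form needs AVX512BW only; there is no VEX alternative.
    return cpu.avx512bw ? Encoding::EVEX : Encoding::None;
  }
  if (bits != 128 && bits != 256) {
    return Encoding::None;  // not a vector width this instruction has
  }
  if (cpu.avx512bw && cpu.avx512vl) {
    return Encoding::EVEX;  // EVEX.128/EVEX.256 need both BW and VL
  }
  // Without VL (or without BW) only the AVX2 VEX forms remain.
  return cpu.avx2 ? Encoding::VEX : Encoding::None;
}

// Spot checks mirroring the rows of the two tables.
inline void check_tables() {
  const Cpu avx2_only{true, false, false};
  const Cpu bw_and_vl{true, true, true};
  const Cpu bw_no_vl{true, true, false};  // the hypothetical +BW -VL case

  assert(vpbroadcastb_encoding(avx2_only, 256) == Encoding::VEX);
  assert(vpbroadcastb_encoding(avx2_only, 512) == Encoding::None);
  assert(vpbroadcastb_encoding(bw_and_vl, 128) == Encoding::EVEX);
  assert(vpbroadcastb_encoding(bw_no_vl, 256) == Encoding::VEX);
  assert(vpbroadcastb_encoding(bw_no_vl, 512) == Encoding::EVEX);
}
```

The +BW -VL row is the interesting one: an encoder that switches to EVEX as 
soon as BW is present, without also checking VL, would pick an unsupported 
form for xmm/ymm on such a configuration.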

> I would like to see test compiler/codegen/Test*Vect.java which verify 
> these instructions in different CPU configurations.

Can you elaborate, please? compiler/codegen/Test*Vect.java and 
compiler/c2/cr6340864/Test*Vect.java tests do exercise Repl* AD 
instructions and I saw them catching bugs while working on the patch.

Best regards,
Vladimir Ivanov

[1]
VEX.128.66.0F38.W0 78 /r
VPBROADCASTB xmm1, xmm2/m8
AVX2

VEX.256.66.0F38.W0 78 /r
VPBROADCASTB ymm1, xmm2/m8
AVX2

EVEX.128.66.0F38.W0 78 /r
VPBROADCASTB xmm1{k1}{z}, xmm2/m8
AVX512VL AVX512BW

EVEX.256.66.0F38.W0 78 /r
VPBROADCASTB ymm1{k1}{z}, xmm2/m8
AVX512VL AVX512BW

EVEX.512.66.0F38.W0 78 /r
VPBROADCASTB zmm1{k1}{z}, xmm2/m8
AVX512BW

[2] 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7881

[3] 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/assembler_x86.cpp#l7078

> On 2/13/20 7:55 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8239008/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8239008
>>
>> Simplify Replicate support for sub-word types on x86 based on the 
>> following observations:
>>    * 512-bit vectors of sub-word element types are supported only on 
>> AVX512BW-capable hardware [1];
>>    * VBROADCASTS[SD]/VPBROADCAST[BWDQ] are available since AVX/AVX2.
>>
>> Also, fixed asserts in VBROADCASTS[SD] according to the manual:
>>    * reg-to-reg variants are part of AVX2 (while mem-to-reg are 
>> introduced in AVX);
>>    * VBROADCASTSD doesn't have 128-bit variant.
>>
>> Testing: hs-precheckin-comp,hs-tier1,hs-tier2
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] 
>> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86.ad#l1524 
>>
>>
>> const int Matcher::vector_width_in_bytes(BasicType bt) {
>> ...
>>    if (UseAVX > 2 && (bt == T_BYTE || bt == T_SHORT || bt == T_CHAR))
>>      size = (VM_Version::supports_avx512bw()) ? 64 : 32;

