RFR: 8320347: Emulate vblendvp[sd] on ECore [v5]

Sat Nov 25 00:40:09 UTC 2023

On Fri, 24 Nov 2023 17:23:28 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3601:
>> 
>>> 3599:     if (compute_mask) {
>>> 3600:       vpxor(scratch, scratch, scratch, vector_len);
>>> 3601:       vpcmpgtq(scratch, scratch, mask, vector_len);
>> 
>> I see assertion failures in the following tests with JAVA_OPTIONS=-XX:UseAVX=1 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -Xbatch
>> 
>> compiler/c2/cr6340864/TestDoubleVect.java
>> compiler/loopopts/superword/ReductionPerf.java
>> compiler/vectorization/TestSignumVector.java
>> compiler/vectorization/runner/BasicDoubleOpTest.java
>> 
>> AVX1 does not support integer vector operations wider than 16 bytes; please use a floating-point compare instruction.
>
> Hmm. Good catch!
> 
> Thinking about the AVX1 case some more: platforms where this `vblendvp*` emulation is needed have at least AVX2; otherwise `vblendvp*` is faster. I think it's better to disable this optimization entirely if AVX1 is required.
> 
> I would go even further and disable `EnableX86ECoreOpts` if `UseAVX==1`. Preference?

vblendvps/pd are supported on AVX1 targets. Since the patch is about emulating floating-point variable blends with an alternate instruction sequence, I think we should remove any impediment that prohibits its use on E-cores at the AVX1 level.
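
For readers following along, here is a minimal scalar sketch of what the emulation has to preserve. It models the lane semantics only and is not the HotSpot code: the helper names (`blend_reference`, `blend_emulated`, `bits_of`, `double_of`) are hypothetical, and the and/andn/or combining step is an assumption about the alternate sequence, since the quoted snippet shows only the mask computation (`vpxor` followed by `vpcmpgtq` against zero).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: one 256-bit vector of doubles, processed lane by lane.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;

// Reinterpret a double as its raw 64-bit pattern and back.
static std::uint64_t bits_of(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

static double double_of(std::uint64_t u) {
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}

// Reference semantics of a variable blend on doubles:
// per lane, take src2 if the sign bit of the mask lane is set, else src1.
Vec blend_reference(const Vec& src1, const Vec& src2, const Vec& mask) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    out[i] = (bits_of(mask[i]) >> 63) ? src2[i] : src1[i];
  }
  return out;
}

// Emulated form (assumed shape): widen the sign bit to a full-lane mask,
// then combine the sources with and / andn / or.
Vec blend_emulated(const Vec& src1, const Vec& src2, const Vec& mask) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    // Models "0 > mask" as a signed 64-bit compare, as in the quoted snippet.
    // Relies on two's-complement conversion (guaranteed since C++20).
    const bool negative = 0 > static_cast<std::int64_t>(bits_of(mask[i]));
    const std::uint64_t full = negative ? ~std::uint64_t{0} : std::uint64_t{0};
    const std::uint64_t picked =
        (full & bits_of(src2[i])) | (~full & bits_of(src1[i]));
    out[i] = double_of(picked);
  }
  return out;
}
```

The signed integer compare here is the 64-bit-lane step that the failing AVX1 runs trip over at 32-byte vector length. Note that a floating-point compare against zero is not bit-for-bit the same test: a mask lane of -0.0 has its sign bit set but does not compare less than 0.0, so any replacement would need to account for that case.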

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1404694438