RFR: 8320347: Emulate vblendvp[sd] on ECore [v5]

Fri Nov 24 17:26:08 UTC 2023

On Fri, 24 Nov 2023 06:52:33 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   test break fix
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3601:
> 
>> 3599:     if (compute_mask) {
>> 3600:       vpxor(scratch, scratch, scratch, vector_len);
>> 3601:       vpcmpgtq(scratch, scratch, mask, vector_len);
> 
> I see assertion failures in the following tests with JAVA_OPTIONS=`-XX:UseAVX=1 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -Xbatch`:
> 
> compiler/c2/cr6340864/TestDoubleVect.java
> compiler/loopopts/superword/ReductionPerf.java
> compiler/vectorization/TestSignumVector.java
> compiler/vectorization/runner/BasicDoubleOpTest.java
> 
> AVX1 does not support integral vectors above 16 bytes; please use a floating-point compare instruction.

Hmm. Good catch!

Thinking about the AVX1 case some more: platforms where this `vpblendvp*` emulation is needed have at least AVX2; otherwise, the native `vpblendvp*` is faster. I think it's better to disable this optimization entirely if AVX1 has to be used.
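For context, `vblendvpd` takes each 64-bit lane from its second source when the sign bit of the matching mask lane is set. The quoted `compute_mask` path rebuilds that selector explicitly: `vpxor` zeroes `scratch`, and `vpcmpgtq scratch, scratch, mask` sets a lane to all-ones exactly when `0 > mask` as a signed 64-bit integer, i.e. when its sign bit is set. It is also why AVX1 breaks: a 256-bit `vpcmpgtq` is an AVX2 instruction, while AVX1 only offers 128-bit integer operations. Below is a minimal scalar sketch of those semantics for a 256-bit vector (four doubles). The helper name `blendv_pd256` is hypothetical, and the and/andnot/or select at the end is an assumption, since the quoted hunk only shows the mask computation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a 256-bit vblendvpd (four 64-bit lanes), not HotSpot code:
// dst[i] = (sign bit of mask[i] set) ? src2[i] : src1[i]
void blendv_pd256(double dst[4], const double src1[4],
                  const double src2[4], const double mask[4]) {
  for (int i = 0; i < 4; ++i) {
    std::uint64_t a, b, m;
    // Reinterpret each double as its raw 64-bit pattern.
    std::memcpy(&a, &src1[i], sizeof a);
    std::memcpy(&b, &src2[i], sizeof b);
    std::memcpy(&m, &mask[i], sizeof m);

    // vpxor    scratch, scratch, scratch  -> scratch = 0
    // vpcmpgtq scratch, scratch, mask     -> all-ones iff 0 > (int64_t)mask
    std::uint64_t sel =
        (0 > static_cast<std::int64_t>(m)) ? ~std::uint64_t{0} : std::uint64_t{0};

    // Assumed select step: (src2 AND sel) OR (src1 AND NOT sel).
    std::uint64_t r = (b & sel) | (a & ~sel);
    std::memcpy(&dst[i], &r, sizeof r);
  }
}
```

Because the selector depends only on the sign bit, a mask lane of `-0.0` (or any NaN pattern with the sign bit set) selects `src2`, just as `vblendvpd` does.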

I would go even further and disable `EnableX86ECoreOpts` if `UseAVX==1`. Preference?
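The two options differ in scope: the first only turns off the `vblendvp[sd]` emulation under AVX1, while the second turns off every ECore-specific tweak. A minimal sketch of that difference follows, using hypothetical names (`Flags`, `emulate_blendv`, `apply_ecore_ergonomics`) that stand in for, and are not, HotSpot's real flag handling.

```cpp
#include <cassert>

// Hypothetical model of the two options; names are illustrative,
// NOT HotSpot's actual flag-ergonomics code.
struct Flags {
  int  use_avx;            // stands in for -XX:UseAVX=N
  bool enable_ecore_opts;  // stands in for -XX:+EnableX86ECoreOpts
};

// Option 1: keep the flag, but only emulate vblendvp[sd] when AVX2 is usable.
bool emulate_blendv(const Flags& f) {
  return f.enable_ecore_opts && f.use_avx >= 2;
}

// Option 2: switch EnableX86ECoreOpts off entirely when UseAVX == 1.
Flags apply_ecore_ergonomics(Flags f) {
  if (f.use_avx == 1) {
    f.enable_ecore_opts = false;
  }
  return f;
}
```

Option 2 is simpler to reason about, since no ECore-specific path can meet an AVX1-only configuration. Option 1 keeps any other ECore optimizations that do not depend on AVX2.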

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1404552951