RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64

Thu Dec 12 22:43:50 UTC 2019

Hi Vladimir,

Thanks for reviewing the code. I will make the changes you suggested and submit another webrev for review.
I will also post the performance gains seen with this change (with and without VBMI2).

The vector instructions in this code will be executed only once a threshold has been reached. I have taken care that the vector code is executed only on CPUs that support VBMI2.
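
As a rough illustration of the dispatch rule described above (this is not the webrev code), here is a minimal Java sketch. The class name, the method names, and the threshold value are all assumptions made for the example; the actual threshold is not stated in this thread. The scalar loop shows the double-precision shift that the vector instructions would accelerate: each output word is built from two adjacent input words.

```java
// Illustrative sketch only; all names and the threshold value are assumptions.
final class DoubleShiftSketch {
    // Hypothetical cutoff in 32-bit words; the real value is not given in this thread.
    static final int VECTOR_THRESHOLD = 16;

    // Scalar double-precision left shift over a big-endian int array, for 0 < n < 32.
    // Each output word takes the low (32 - n) bits of mag[i] and the top n bits
    // of mag[i + 1]. Bits shifted out of mag[0] are dropped in this sketch.
    static int[] shiftLeftScalar(int[] mag, int n) {
        int[] out = new int[mag.length];
        for (int i = 0; i < mag.length; i++) {
            int next = (i + 1 < mag.length) ? mag[i + 1] : 0;
            out[i] = (mag[i] << n) | (next >>> (32 - n));
        }
        return out;
    }

    // Dispatch rule: take the vector path only on VBMI2-capable CPUs
    // and only when the input has reached the threshold.
    static boolean useVectorPath(int numWords, boolean cpuHasVbmi2) {
        return cpuHasVbmi2 && numWords >= VECTOR_THRESHOLD;
    }
}
```

With this rule, short inputs and CPUs without VBMI2 always stay on the scalar loop.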

Thanks,
Smita

-----Original Message-----
From: Vladimir Kozlov <vladimir.kozlov at oracle.com> 
Sent: Wednesday, December 11, 2019 10:55 AM
To: Kamath, Smita <smita.kamath at intel.com>; 'hotspot compiler' <hotspot-compiler-dev at openjdk.java.net>; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
Subject: Re: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64

Hi Kamath,

First, a general question. What performance do you see when VBMI2 instructions are *not* used with your new code, compared with the code generated by C2?
What improvement do you see when VBMI2 is used? This is to understand whether we need only the VBMI2 version of the intrinsic.

Second, Sandhya recently pushed the 8235510 changes to roll back the AVX-512 code for CRC32 due to performance issues. Does your change have similar issues on some Intel CPUs? Should it be excluded on such CPUs?

Third, I would suggest waiting until after we fork JDK 14 before pushing these changes. I think it may be too late for 14 because we would need to test this, including performance testing.

In assembler_x86.cpp, use supports_vbmi2() instead of UseVBMI2 in the assert.
For that to work, clear the CPU_VBMI2 bit in vm_version_x86.cpp#l687 when UseAVX < 3 (below AVX-512). In combination with that, you can also use supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in stubGenerator_x86_64.cpp.

I don't think we need a separate UseVBMI2 flag; it could be controlled by the UseAVX flag. We don't have flags for VNNI or other AVX-512 instruction subsets.

In vm_version_x86.cpp, you need to add more %s to the print statement for the new output.

You need to add @requires vm.compiler2.enabled to the new test's header so that it runs only with C2.
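
A minimal sketch of what such a test header could look like, assuming a hypothetical test class named ShiftSketchTest (the real test in the webrev may be named and structured differently). The body is only a placeholder check that BigInteger shifts agree with power-of-two multiplication and division for non-negative values; the iteration count and bit sizes are arbitrary choices for the example.

```java
/*
 * @test
 * @bug 8167065
 * @summary Sketch: BigInteger shifts must agree with power-of-two arithmetic
 * @requires vm.compiler2.enabled
 * @run main ShiftSketchTest
 */
import java.math.BigInteger;
import java.util.Random;

// Hypothetical test class; the name and contents are assumptions for illustration.
public class ShiftSketchTest {
    // For non-negative v: v << n == v * 2^n and v >> n == v / 2^n.
    static boolean shiftsAgree(BigInteger v, int n) {
        BigInteger pow = BigInteger.valueOf(2).pow(n);
        return v.shiftLeft(n).equals(v.multiply(pow))
            && v.shiftRight(n).equals(v.divide(pow));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < 20000; i++) {
            BigInteger v = new BigInteger(4096, rnd); // uniformly random, non-negative
            int n = 1 + rnd.nextInt(256);
            if (!shiftsAgree(v, n)) {
                throw new AssertionError("shift mismatch for n=" + n);
            }
        }
    }
}
```

The @requires line is what keeps jtreg from selecting the test when C2 is not available; the rest of the header is ordinary jtreg boilerplate.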

You need to add the new intrinsics to Graal's test so that it ignores them:

http://hg.openjdk.java.net/jdk/jdk/file/d188996ea355/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l416

Thanks,
Vladimir

On 12/10/19 5:41 PM, Kamath, Smita wrote:
> Hi,
> 
> 
> As per the Intel Architecture Instruction Set Reference [1], VBMI2 operations will be supported in a future Intel ISA. I would like to contribute optimizations for BigInteger shift operations using AVX512+VBMI2 instructions. This optimization is enabled for the x86_64 architecture.
> 
> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8167065
> 
> Link to webrev : 
> http://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev00/
> 
> 
> 
> I ran the jtreg test suite with the algorithm on Intel SDE [2] to confirm that the encoding and semantics are correctly implemented.
> 
> 
> [1] https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (vpshrdv -> Vol. 2C 5-477 and vpshldv -> Vol. 2C 5-471)
> 
> [2] https://software.intel.com/en-us/articles/intel-software-development-emulator
> 
> 
> Regards,
> 
> Smita Kamath
>