RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Civlin, Jan jan.civlin at intel.com
Wed Apr 20 10:11:52 UTC 2016


Vladimir,

Please look at the updated patch at
http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/

I removed the definitions of the unused [v]movdqa(), vpsrldq(), and vpslldq().

k256_W is actually a table twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
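The k256 to k256_W layout described above can be sketched in plain C++ as a table transformation. This is an illustration only, not the HotSpot stub code: make_k256_w and k256_head are hypothetical names, only the first 8 of the 64 SHA-256 round constants (FIPS 180-4) are listed, and the comment about 128-bit lanes is an assumption about why each row is duplicated.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// First 8 of the 64 SHA-256 round constants (FIPS 180-4, section 4.2.2).
// A real table lists all 64; 8 are enough to show the layout.
constexpr std::array<std::uint32_t, 8> k256_head = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5};

// Hypothetical helper: build a k256_W-style table by emitting each 16-byte row
// (4 constants) of a k256-style table twice. Assumption: the duplication lets a
// 256-bit (YMM) load see the same 4 constants in both 128-bit lanes.
template <std::size_t N>
std::array<std::uint32_t, 2 * N> make_k256_w(const std::array<std::uint32_t, N>& k) {
    static_assert(N % 4 == 0, "table must consist of whole 4-constant rows");
    std::array<std::uint32_t, 2 * N> w{};
    for (std::size_t row = 0; row < N / 4; ++row) {
        for (std::size_t i = 0; i < 4; ++i) {
            w[8 * row + i]     = k[4 * row + i];  // low 128-bit lane
            w[8 * row + 4 + i] = k[4 * row + i];  // high 128-bit lane (copy)
        }
    }
    return w;
}
```

For example, make_k256_w(k256_head) yields 16 values: constants 0..3, then 0..3 again, then 4..7, then 4..7 again.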

The patch was tested in three build configurations (slowdebug, release, and fastdebug) on 64-bit Windows and Linux.

Thank you,

J

[jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 28.756324129 seconds
TestSHA throughput = 356.09558280340946 MB/s

[jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 28.912701124 seconds
TestSHA throughput = 354.1696071938408 MB/s

[jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 29.339789962 seconds
TestSHA throughput = 349.01408678325697 MB/s
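The source of TestSHA.jar is not part of this thread, so the throughput formula below is inferred from the printed numbers rather than taken from the tool: it reproduces all three "TestSHA throughput" lines if MB means 10^6 bytes and the 20000 warmup iterations are excluded from the timed total. throughput_mb_per_s is a hypothetical name.

```cpp
#include <cstdint>

// Inferred (not taken from TestSHA's source): total bytes hashed over the timed
// iterations, divided by the runtime, reported in decimal megabytes
// (1 MB = 10^6 bytes). Warmup iterations are assumed not to be counted.
double throughput_mb_per_s(std::int64_t msg_size_bytes, std::int64_t iters,
                           double runtime_seconds) {
    const double total_bytes = static_cast<double>(msg_size_bytes) *
                               static_cast<double>(iters);
    return total_bytes / runtime_seconds / 1e6;
}
```

For the release run, 1024 bytes × 10,000,000 iterations / 28.756324129 s ≈ 356.10 MB/s, matching the printed value. The slowdebug run is about 2% slower than the release run.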


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Monday, April 18, 2016 5:09 PM
To: Civlin, Jan; hotspot compiler
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Hi Jan,

The patch was generated on Windows and has ^M (carriage return) characters at the end of lines, so I can't apply it to our sources.
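The ^M characters are the carriage returns of Windows CRLF line endings. A minimal sketch of normalizing patch text to LF-only endings before posting it; crlf_to_lf is a hypothetical helper, not part of any OpenJDK tooling.

```cpp
#include <cstddef>
#include <string>

// Convert Windows (CRLF) line endings to Unix (LF) by dropping each '\r'
// that immediately precedes a '\n'. A lone '\r' is left unchanged.
std::string crlf_to_lf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // skip the CR of a CRLF pair
        }
        out.push_back(text[i]);
    }
    return out;
}
```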

I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.

Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.

_k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:

StubRoutines::x86::_k256_W_adr = generate_k256_W();

What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.

Thanks,
Vladimir

On 4/18/16 2:44 PM, Civlin, Jan wrote:
> === Correction in the subject line ===
>
> We would like to contribute the SHA256 AVX2 intrinsic.
>
> This intrinsic is for x86 CPUs with AVX2 when supports_sha() is not available. It is 64-bit (LP64) code only.
>
> The patch delivers a 2x performance gain in both latency and throughput, both measured on an average message.
>
> Contributor: Jan Civlin.
>
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>
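The original message positions the new stub as an AVX2 path for 64-bit x86 when supports_sha() is unavailable. A minimal sketch of that selection order, assuming the existing SHA-extension stub keeps priority when it is available; CpuFeatures, Sha256Impl, and choose_sha256_impl are hypothetical names and do not reflect HotSpot's VM_Version API.

```cpp
// Hypothetical CPU feature flags (not HotSpot's VM_Version API).
struct CpuFeatures {
    bool sha;   // SHA extensions present (what supports_sha() reports)
    bool avx2;  // AVX2 present
    bool lp64;  // 64-bit build
};

enum class Sha256Impl { ShaExtensions, Avx2, None };

// Assumed selection order: prefer the SHA-extension stub, fall back to the
// AVX2 stub on 64-bit builds, otherwise use no intrinsic (the Java code runs).
Sha256Impl choose_sha256_impl(const CpuFeatures& cpu) {
    if (cpu.sha) return Sha256Impl::ShaExtensions;
    if (cpu.avx2 && cpu.lp64) return Sha256Impl::Avx2;
    return Sha256Impl::None;
}
```

With this ordering, a CPU like the Haswell-EP machine in the benchmark runs above, which has AVX2 but no SHA extensions, would take the AVX2 path on a 64-bit build.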


More information about the hotspot-compiler-dev mailing list