RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Civlin, Jan jan.civlin at intel.com
Thu Apr 21 19:58:18 UTC 2016


I know how to run jtreg.
I tested the patch with TestSHA.jar, though admittedly that was a standalone run rather than a jtreg run.

$ /cygdrive/E/Java/sha-041116/build/windows-x86_64-normal-server-fastdebug/jdk/bin/java.exe -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:UseSSE=2 -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
provider = SUN
algorithm = SHA-256
msgSize = 1024 bytes
offset = 0
iters = 10000000
warmupIters = 20000
hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
TestSHA runtime = 33.507065719 seconds
TestSHA throughput = 305.60718404516876 MB/s


jcivlin at JCIVLIN-DESK /cygdrive/C/Java/hotspot/test/civlin/TestSHA/TestSHA/dist
$

I'll take a look at what is wrong with jtreg.
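
As a sanity check on the figures above, the reported throughput appears consistent with total bytes hashed divided by runtime, taking 1 MB as 10^6 bytes. The sketch below is only an assumption about how TestSHA derives the number; the function name `throughput_mb_per_s` is hypothetical and is not part of TestSHA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: throughput in MB/s (1 MB = 10^6 bytes, an assumption)
// for `iters` messages of `msg_size_bytes` bytes hashed in `seconds`.
double throughput_mb_per_s(std::uint64_t msg_size_bytes,
                           std::uint64_t iters,
                           double seconds) {
  assert(seconds > 0.0);  // a zero or negative runtime makes no sense
  const double total_bytes =
      static_cast<double>(msg_size_bytes) * static_cast<double>(iters);
  return total_bytes / 1e6 / seconds;
}
```

Calling it with msgSize = 1024, iters = 10000000 and the 33.507065719 second runtime gives roughly 305.6 MB/s, in line with the output shown.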


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, April 21, 2016 12:31 PM
To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available)

Good. But testing found that the jtreg SHA tests failed because they don't expect SHA2 to be supported:

"Expected message not found: 'SHA instructions are not available on this CPU'"

hotspot/test/compiler/intrinsics/sha/

Do you know how to run jtreg tests? Maybe Sandhya or Michael can help.

Thanks,
Vladimir

On 4/21/16 11:15 AM, Civlin, Jan wrote:
> Vladimir,
>
> I corrected the assertion guards in the added instructions, and also the guard for the sha-avx2 function itself.
> Please look at
>
> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/
>
> Thank you,
>
> J
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 20, 2016 3:51 PM
> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler 
> <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no 
> supports_sha() available)
>
> Testing continues, but it has already found the following problem when running tests with -XX:UseSSE=2:
>
> #  Internal Error (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), pid=52652, tid=3587
> #  Error: assert(VM_Version::supports_ssse3()) failed
>
> V  [libjvm.dylib+0x4193d7]  report_vm_error(char const*, int, char const*, char const*, ...)+0xcd
> V  [libjvm.dylib+0x1eedd2]  Assembler::vpshufb(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e
> V  [libjvm.dylib+0x87c237]  MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297
> V  [libjvm.dylib+0xa4dc47]  StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b
>
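
The failure in the trace above suggests the stub was being generated even though the effective feature set (after -XX:UseSSE=2) no longer included SSSE3. A minimal sketch of the kind of guard involved follows. Everything here is a hypothetical model, not HotSpot code: the `CpuFeatures` struct, the function names, and the assumption that a UseSSE level below 3 disables SSSE3 and AVX2 are all introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical model of the features the JIT is allowed to use.
struct CpuFeatures {
  bool ssse3;
  bool avx2;
};

// Assumption for illustration: a UseSSE level below 3 masks out SSSE3
// and everything layered on top of it, regardless of the hardware.
CpuFeatures apply_use_sse(CpuFeatures hw, int use_sse) {
  if (use_sse < 3) {
    hw.ssse3 = false;
    hw.avx2 = false;
  }
  return hw;
}

// The guard: only generate the stub if every instruction it emits is allowed.
bool can_generate_sha256_avx2(const CpuFeatures& f) {
  return f.ssse3 && f.avx2;
}

// Stand-in for an assembler routine; mirrors the assert seen in the trace.
void emit_vpshufb(const CpuFeatures& f) {
  assert(f.ssse3 && "vpshufb requires SSSE3");
}

// Returns true if the stub was generated, false if the guard rejected it.
bool generate_sha256_stub(const CpuFeatures& f) {
  if (!can_generate_sha256_avx2(f)) {
    return false;  // fall back to the non-intrinsic path
  }
  emit_vpshufb(f);
  return true;
}
```

With the guard in place, a run modelled as `apply_use_sse(hw, 2)` skips stub generation instead of reaching the assembler assert.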
>
> Vladimir
>
> On 4/20/16 1:13 PM, Civlin, Jan wrote:
>> Thank you, Vladimir.
>>
>> I guess it was a warning.
>> I usually keep a comma on the last line of an enum so that I do not need to change existing lines when I add a new one.
>>
>>
>> Section 6.7.2.2 of C99 lists the syntax as:
>>
>> enum-specifier:
>>       enum identifier(opt) { enumerator-list }
>>       enum identifier(opt) { enumerator-list , }
>>       enum identifier
>> enumerator-list:
>>       enumerator
>>       enumerator-list , enumerator
>> enumerator:
>>       enumeration-constant
>>       enumeration-constant = constant-expression
>>
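
For context on the grammar quoted above: C99 and C++11 accept a trailing comma in an enumerator list, but C++98/03 does not, so a compiler building in an older C++ mode may warn or reject it. A small sketch of the portable form follows; the enumerator names and values are hypothetical stand-ins for the `_RSP` / `_RSP_SIZE` style constants in the patch.

```cpp
// Illustrative stack-frame layout (names and sizes are hypothetical).
enum FrameLayout {
  XFER_OFFSET = 0,
  XFER_SIZE   = 512,
  RSP_OFFSET  = XFER_OFFSET + XFER_SIZE,
  RSP_SIZE    = 8,
  STACK_SIZE  = RSP_OFFSET + RSP_SIZE  // last enumerator: no trailing comma
};
```

Omitting the final comma costs a one-line diff when an enumerator is appended later, but it compiles under every C and C++ dialect.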
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 20, 2016 1:04 PM
>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler 
>> <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> One thing caught during the build is the ',' on the last line of the enum:
>>
>> +  STACK_SIZE = _RSP      + _RSP_SIZE,
>> +};
>>
>> The compiler complains about it, so I removed it in my local repo.
>>
>> Vladimir
>>
>> On 4/20/16 12:07 PM, Civlin, Jan wrote:
>>> Thank you!
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, April 20, 2016 11:38 AM
>>> To: Civlin, Jan <jan.civlin at intel.com>; hotspot compiler 
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>> supports_sha() available)
>>>
>>> Looks good to me. I submitted testing on all platforms before integrating.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>>>> Vladimir,
>>>>
>>>> Please look at the updated patch at 
>>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>>>
>>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq().
>>>>
>>>> k256_W is actually a table twice the size of k256: each line of k256 is repeated twice. As you suggested, I changed the code to generate k256_W from k256.
>>>>
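
The relationship between the two tables described above can be sketched as follows. This is not the code from the webrev; `make_k256_w` is a hypothetical helper, and it assumes that a "line" means one group of four 32-bit constants (128 bits), which is how the review comment describes the repetition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build the doubled table by emitting each group of
// four 32-bit round constants twice in a row.
std::vector<std::uint32_t> make_k256_w(const std::vector<std::uint32_t>& k256) {
  assert(k256.size() % 4 == 0);  // the table must consist of whole 4-value lines
  std::vector<std::uint32_t> w;
  w.reserve(k256.size() * 2);
  for (std::size_t row = 0; row < k256.size(); row += 4) {
    for (int copy = 0; copy < 2; ++copy) {
      // Append the same four constants for each of the two copies.
      w.insert(w.end(), k256.begin() + row, k256.begin() + row + 4);
    }
  }
  return w;
}
```

For the full 64-entry SHA-256 constant table this yields 128 entries, so the doubled table never has to be maintained by hand.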
>>>> The patch was tested in three configurations (slowdebug, release and fastdebug) on 64-bit Windows and Linux.
>>>>
>>>> Thank you,
>>>>
>>>> J
>>>>
>>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>>> provider = SUN
>>>> algorithm = SHA-256
>>>> msgSize = 1024 bytes
>>>> offset = 0
>>>> iters = 10000000
>>>> warmupIters = 20000
>>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>>> TestSHA runtime = 28.756324129 seconds
>>>> TestSHA throughput = 356.09558280340946 MB/s
>>>>
>>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>>> provider = SUN
>>>> algorithm = SHA-256
>>>> msgSize = 1024 bytes
>>>> offset = 0
>>>> iters = 10000000
>>>> warmupIters = 20000
>>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>>> TestSHA runtime = 28.912701124 seconds
>>>> TestSHA throughput = 354.1696071938408 MB/s
>>>>
>>>> [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>>>> provider = SUN
>>>> algorithm = SHA-256
>>>> msgSize = 1024 bytes
>>>> offset = 0
>>>> iters = 10000000
>>>> warmupIters = 20000
>>>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>>>> TestSHA runtime = 29.339789962 seconds
>>>> TestSHA throughput = 349.01408678325697 MB/s
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Monday, April 18, 2016 5:09 PM
>>>> To: Civlin, Jan; hotspot compiler
>>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>>>> supports_sha() available)
>>>>
>>>> Hi Jan,
>>>>
>>>> The patch was generated on Windows and has ^M at the ends of lines, so I can't apply it to our sources.
>>>>
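
The ^M characters mentioned above are carriage returns from Windows CRLF line endings. A minimal sketch of normalizing them to LF before generating a patch follows; `crlf_to_lf` is a hypothetical helper, and in practice a standard tool such as dos2unix does the same job.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drop the '\r' of every "\r\n" pair, leaving lone
// carriage returns untouched.
std::string crlf_to_lf(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool is_crlf =
        text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n';
    if (is_crlf) {
      continue;  // skip the '\r'; the following '\n' is copied next iteration
    }
    out.push_back(text[i]);
  }
  return out;
}
```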
>>>> I don't see any use of the new [v]movdqa(), vpsrldq(), vpslldq() instructions.
>>>>
>>>> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>>>>
>>>> _k256_W[] is the same as _k256[] with each group of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>>>
>>>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>>>
>>>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>>>> == Correction in the subject line ===
>>>>>
>>>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>>>
>>>>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only.
>>>>>
>>>>> The patch delivers a 2x performance gain in both latency and throughput, measured on an average message.
>>>>>
>>>>> Contributor: Jan Civlin.
>>>>>
>>>>>
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495
>>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/
>>>>>

