RFR(S) 8200067: Vector Carry-less Multiplication support
Rukmannagari, Shravya
shravya.rukmannagari at intel.com
Tue Mar 27 00:43:18 UTC 2018
Hi Vladimir,
I have made the suggested changes. Please let me know if you have any questions or comments.
http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/
Thanks,
Shravya.
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Monday, March 26, 2018 1:36 PM
To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Cc: Kamath, Smita <smita.kamath at intel.com>
Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support
I was talking about the following change, since you need the new check only when vpclmulqdq is supported:
+ if (VM_Version::supports_vpclmulqdq()) {
+ Label Parallel_loop, L_No_Parallel;
+
+ cmpl(len, 8);
+ jccb(Assembler::less, L_No_Parallel);
+
+ movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32));
+ evmovdquq(xmm1, Address(buf, 0), Assembler::AVX_512bit);
+ movdl(xmm5, crc);
+ evpxorq(xmm1, xmm1, xmm5, Assembler::AVX_512bit);
+ addptr(buf, 64);
+ subl(len, 7);
+ evshufi64x2(xmm0, xmm0, xmm0, 0x00, Assembler::AVX_512bit); //propagate the mask from 128 bits to 512 bits
+
+ BIND(Parallel_loop);
+ fold_128bit_crc32_avx512(xmm1, xmm0, xmm5, buf, 0);
+ addptr(buf, 64);
+ subl(len, 4);
+ jcc(Assembler::greater, Parallel_loop);
+
+ vextracti64x2(xmm2, xmm1, 0x01);
+ vextracti64x2(xmm3, xmm1, 0x02);
+ vextracti64x2(xmm4, xmm1, 0x03);
+ jmp(L_fold_512b);
+
+ BIND(L_No_Parallel);
+ }
Please update the webrev. I will start testing with my change and let you know the results.
Thanks,
Vladimir
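The nesting requested above can be sketched outside HotSpot. This is a minimal model, not the code from the webrev: EmitLog and emit_crc32_fold are hypothetical names, and the emitted strings are placeholders for the real assembler calls. It only illustrates that the feature test runs when the stub is generated, so the size check and the xmm0 mask load are emitted only when vpclmulqdq is available.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the HotSpot MacroAssembler: it only records
// the names of the instructions that would be emitted into the stub.
struct EmitLog {
  std::vector<std::string> ops;
  void emit(const std::string& op) { ops.push_back(op); }
};

// The feature test is an ordinary C++ `if` evaluated while the stub is
// generated, so on a CPU without vpclmulqdq none of the guarded
// instructions (size check, mask load, 512-bit loop) reach the stub.
void emit_crc32_fold(EmitLog& masm, bool supports_vpclmulqdq) {
  if (supports_vpclmulqdq) {
    masm.emit("cmpl len, 8");               // size check, only needed here
    masm.emit("jccb less, L_No_Parallel");
    masm.emit("movdqu xmm0, mask");         // xmm0 initialization, now guarded
    masm.emit("512-bit folding loop");      // placeholder for the loop body
    masm.emit("jmp L_fold_512b");
    masm.emit("L_No_Parallel:");
  }
  masm.emit("existing 128-bit folding path"); // placeholder for unchanged code
}
```

Calling emit_crc32_fold with supports_vpclmulqdq set to false records only the placeholder for the existing path, which is the effect of moving both the check and the initialization inside the guard.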
On 3/26/18 11:51 AM, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> Thanks a lot for reviewing it. I have made the suggested changes. Please find the latest changes below and let me know if you have any questions or comments.
> http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/
>
> Thanks,
> Shravya.
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, March 23, 2018 2:47 PM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; hotspot
> compiler <hotspot-compiler-dev at openjdk.java.net>
> Cc: Kamath, Smita <smita.kamath at intel.com>
> Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support
>
> Hi Shravya,
>
> macroAssembler_x86.cpp:
>
> Why did you place the xmm0 initialization before the size check?
>
> + movdqu(xmm0,
> + ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32));
>
> I think initialization and the check should be inside code guarded by supports_vpclmulqdq().
>
> L_Parallel is not used - no jump to it.
>
> Thanks,
> Vladimir
>
> On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote:
>> Hi everyone,
>>
>> According to the "Intel Architecture Instruction Set Extensions and Future Features Programming Reference"
>> manual [1], the vector carry-less multiplication (vpclmulqdq) instruction
>> will be supported in a future Intel ISA. I have updated the CRC32
>> algorithm to take advantage of this instruction. I have tested with
>> Intel SDE [2] to confirm that the encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments.
>>
>> http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/
>>
>> Thanks,
>>
>> Shravya.
>>
>> [1]
>> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
>>
>> [2]
>> https://software.intel.com/en-us/articles/intel-software-development-emulator
>>
>> [3] https://bugs.openjdk.java.net/browse/JDK-8200067
>>
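For reference, the operation that the original message builds on can be written in portable C++. This is a bit-by-bit software sketch of a 64 x 64 -> 128 bit carry-less multiplication, the operation that pclmulqdq performs on one pair of 64-bit operands and that vpclmulqdq performs independently in each 128-bit lane of a wider vector. The names U128 and clmul64 are assumptions made for this illustration; the code is not taken from the webrev and is far slower than the instruction.

```cpp
#include <cstdint>

// 128-bit result split into two 64-bit halves.
struct U128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Carry-less multiplication: like schoolbook multiplication, but the
// partial products are combined with XOR instead of addition, so no
// carries propagate (polynomial multiplication over GF(2)).
U128 clmul64(std::uint64_t a, std::uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1u) {
      r.lo ^= a << i;            // low 64 bits of (a << i)
      if (i != 0) {
        r.hi ^= a >> (64 - i);   // bits of (a << i) that spill past bit 63
      }
    }
  }
  return r;
}
```

As a small check, clmul64(3, 3) yields lo = 5 and hi = 0, because (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2, whereas ordinary multiplication would give 9.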