RFR(S) 8200067: Vector Carry-less Multiplication support

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Mar 26 20:36:27 UTC 2018


I was talking about the following change, since you need the new check only when
vpclmulqdq is supported:

+  if (VM_Version::supports_vpclmulqdq()) {
+    Label Parallel_loop, L_No_Parallel;
+
+    cmpl(len, 8);
+    jccb(Assembler::less, L_No_Parallel);
+
+    movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32));
+    evmovdquq(xmm1, Address(buf, 0), Assembler::AVX_512bit);
+    movdl(xmm5, crc);
+    evpxorq(xmm1, xmm1, xmm5, Assembler::AVX_512bit);
+    addptr(buf, 64);
+    subl(len, 7);
+    evshufi64x2(xmm0, xmm0, xmm0, 0x00, Assembler::AVX_512bit); // propagate the mask from 128 bits to 512 bits
+
+    BIND(Parallel_loop);
+    fold_128bit_crc32_avx512(xmm1, xmm0, xmm5, buf, 0);
+    addptr(buf, 64);
+    subl(len, 4);
+    jcc(Assembler::greater, Parallel_loop);
+
+    vextracti64x2(xmm2, xmm1, 0x01);
+    vextracti64x2(xmm3, xmm1, 0x02);
+    vextracti64x2(xmm4, xmm1, 0x03);
+    jmp(L_fold_512b);
+
+    BIND(L_No_Parallel);
+  }
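
In the diff above, the parallel loop keeps four 128-bit partial remainders side by side in one 512-bit register. Presumably each fold_128bit_crc32_avx512 call folds every lane forward across the next 64 bytes of input, after which vextracti64x2 copies lanes 1-3 of xmm1 into xmm2-xmm4 for the existing code at L_fold_512b. The arithmetic behind one such fold can be modeled in scalar C++. This is only a sketch under stated assumptions: it uses the plain (non-reflected) CRC-32 polynomial with no initial value or final XOR, it computes its fold constants at run time instead of reading HotSpot's bit-reflected crc_by128_masks table, and the names U128, clmul64, xpow_mod_p, message_mod_p and fold128 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of 128-bit CRC folding; not HotSpot's code or constants.
// Bit i of each word is the coefficient of x^i (non-reflected convention).
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// CRC-32 generator polynomial P(x), degree 32, including the x^32 term.
const uint64_t kPoly = 0x104C11DB7ull;

// v * x^s as a 128-bit polynomial, for 0 <= s < 128 (bits past x^127 are dropped).
U128 shl128(uint64_t v, int s) {
  if (s == 0) return {v, 0};
  if (s < 64) return {v << s, v >> (64 - s)};
  return {0, v << (s - 64)};
}

// Carry-less product of two 64-bit polynomials: partial products are XORed.
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      U128 t = shl128(a, i);
      r.lo ^= t.lo;
      r.hi ^= t.hi;
    }
  }
  return r;
}

// x^n mod P(x), computed by repeated multiplication by x.
uint64_t xpow_mod_p(int n) {
  uint64_t r = 1;
  for (int i = 0; i < n; i++) {
    r <<= 1;
    if (r & (1ull << 32)) r ^= kPoly;  // clear the x^32 term
  }
  return r;
}

// Reference: remainder mod P(x) of the message formed by concatenating the
// blocks, highest-degree bit first (Horner's rule, one bit at a time).
uint64_t message_mod_p(const U128* blocks, int count) {
  uint64_t r = 0;
  for (int k = 0; k < count; k++) {
    for (int bit = 127; bit >= 0; bit--) {
      uint64_t b = bit >= 64 ? (blocks[k].hi >> (bit - 64)) & 1
                             : (blocks[k].lo >> bit) & 1;
      r = (r << 1) | b;
      if (r & (1ull << 32)) r ^= kPoly;
    }
  }
  return r;
}

// One fold step:
//   acc * x^distance + next
//     = acc.hi * x^(distance + 64) + acc.lo * x^distance + next
//    == acc.hi * k_hi + acc.lo * k_lo + next   (mod P)
// Both constants have degree < 32, so each product fits in 96 bits and the
// folded value still fits in 128 bits.
U128 fold128(U128 acc, U128 next, int distance) {
  U128 a = clmul64(acc.hi, xpow_mod_p(distance + 64));
  U128 b = clmul64(acc.lo, xpow_mod_p(distance));
  return {a.lo ^ b.lo ^ next.lo, a.hi ^ b.hi ^ next.hi};
}
```

With distance = 128 the step corresponds to a single stream folding onto the adjacent 16 bytes. With distance = 512 it corresponds to one lane of a 4-lane, 64-byte stride: fold128(acc, next, 512) leaves the same remainder mod P(x) as acc followed by three zero blocks and then next.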

Please update the webrev. I will start testing with my change and let you
know the results.

Thanks,
Vladimir

On 3/26/18 11:51 AM, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> Thanks a lot for reviewing it. I have made the suggested changes. Please find the latest changes below and let me know if you have any questions or comments.
> http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/
> 
> Thanks,
> Shravya.
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, March 23, 2018 2:47 PM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Cc: Kamath, Smita <smita.kamath at intel.com>
> Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support
> 
> Hi Shravya,
> 
> macroAssembler_x86.cpp:
> 
> Why did you place the xmm0 initialization before the size check?
> 
> +   movdqu(xmm0,
> + ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32));
> 
> I think the initialization and the check should be inside the code guarded by supports_vpclmulqdq().
> 
> L_Parallel is not used; there is no jump to it.
> 
> Thanks,
> Vladimir
> 
> On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote:
>> Hi everyone,
>>
>> As per the "Intel Architecture Instruction Set Extensions and Future Features Programming Reference"
>> manual [1], the vector carry-less multiplication instruction (vpclmulqdq)
>> will be supported in a future Intel ISA. I have updated the CRC32
>> algorithm to take advantage of this instruction. I have tested with
>> Intel SDE [2] to confirm that the encoding and semantics are correctly implemented. Please take a look and let me know if you have any questions or comments.
>>
>> http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/
>>
>> Thanks,
>>
>> Shravya.
>>
>> [1]
>> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
>>
>> [2]
>> https://software.intel.com/en-us/articles/intel-software-development-emulator
>>
>> [3] https://bugs.openjdk.java.net/browse/JDK-8200067
>>
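
The vpclmulqdq instruction named in the original proposal is the vector form of PCLMULQDQ: in each 128-bit lane it carry-lessly multiplies one 64-bit quadword from each source into a 128-bit product, with imm8 bit 0 choosing the quadword of the first source and imm8 bit 4 choosing the quadword of the second. A portable model of those semantics might look like the following. It is a sketch for illustration only; U128, clmul64, pclmul_lane, Zmm and vpclmulqdq512 are hypothetical names, not HotSpot or compiler intrinsic APIs, and the bit loop favors clarity over speed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the instruction semantics; not an intrinsic or HotSpot API.
// A 128-bit value as {low quadword, high quadword}.
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Carry-less (GF(2) polynomial) product of two 64-bit values: partial
// products are combined with XOR instead of addition, so no carries propagate.
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) {
        r.hi ^= a >> (64 - i);  // bits shifted out of the low quadword
      }
    }
  }
  return r;
}

// One 128-bit lane: imm8 bit 0 selects the quadword of src1,
// imm8 bit 4 selects the quadword of src2 (0 = low, 1 = high).
U128 pclmul_lane(U128 src1, U128 src2, uint8_t imm8) {
  uint64_t a = (imm8 & 0x01) ? src1.hi : src1.lo;
  uint64_t b = (imm8 & 0x10) ? src2.hi : src2.lo;
  return clmul64(a, b);
}

// The 512-bit form applies the same operation independently to each of the
// four 128-bit lanes.
using Zmm = std::array<U128, 4>;

Zmm vpclmulqdq512(const Zmm& src1, const Zmm& src2, uint8_t imm8) {
  Zmm dst{};
  for (int lane = 0; lane < 4; lane++) {
    dst[lane] = pclmul_lane(src1[lane], src2[lane], imm8);
  }
  return dst;
}
```

Because the lanes are independent, one 512-bit instruction does the work of four 128-bit PCLMULQDQ operations, which is what makes folding four CRC streams per loop iteration attractive.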


More information about the hotspot-compiler-dev mailing list