RFR(S) 8200067: Vector Carry-less Multiplication support

Tue Mar 27 00:47:09 UTC 2018

Good.

Testing passed with these changes. I will push it.

Thanks,
Vladimir

On 3/26/18 5:43 PM, Rukmannagari, Shravya wrote:
> Hi Vladimir,
> I have made the suggested changes. Please let me know if you have any questions or comments.
> http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.02/
> 
> Thanks,
> Shravya.
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, March 26, 2018 1:36 PM
> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Cc: Kamath, Smita <smita.kamath at intel.com>
> Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support
> 
> I was talking about the following change, since you need the new check only when vpclmulqdq is supported:
> 
> +  if (VM_Version::supports_vpclmulqdq()) {
> +    Label Parallel_loop, L_No_Parallel;
> +
> +    cmpl(len, 8);
> +    jccb(Assembler::less, L_No_Parallel);
> +
> +    movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32));
> +    evmovdquq(xmm1, Address(buf, 0), Assembler::AVX_512bit);
> +    movdl(xmm5, crc);
> +    evpxorq(xmm1, xmm1, xmm5, Assembler::AVX_512bit);
> +    addptr(buf, 64);
> +    subl(len, 7);
> +    evshufi64x2(xmm0, xmm0, xmm0, 0x00, Assembler::AVX_512bit); //propagate the mask from 128 bits to 512 bits
> +
> +    BIND(Parallel_loop);
> +    fold_128bit_crc32_avx512(xmm1, xmm0, xmm5, buf, 0);
> +    addptr(buf, 64);
> +    subl(len, 4);
> +    jcc(Assembler::greater, Parallel_loop);
> +
> +    vextracti64x2(xmm2, xmm1, 0x01);
> +    vextracti64x2(xmm3, xmm1, 0x02);
> +    vextracti64x2(xmm4, xmm1, 0x03);
> +    jmp(L_fold_512b);
> +
> +    BIND(L_No_Parallel);
> +  }
> 
> Please update the webrev. I will start testing with my change and let you know the results.
> 
> Thanks,
> Vladimir
> 
> On 3/26/18 11:51 AM, Rukmannagari, Shravya wrote:
>> Hi Vladimir,
>> Thanks a lot for reviewing it. I have made the suggested changes. Please find the latest changes below and let me know if you have any questions or comments.
>> http://cr.openjdk.java.net/~srukmannagar/ICL_crc32/webrev.01/
>>
>> Thanks,
>> Shravya.
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Friday, March 23, 2018 2:47 PM
>> To: Rukmannagari, Shravya <shravya.rukmannagari at intel.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Cc: Kamath, Smita <smita.kamath at intel.com>
>> Subject: Re: RFR(S) 8200067: Vector Carry-less Multiplication support
>>
>> Hi Shravya,
>>
>> macroAssembler_x86.cpp:
>>
>> Why did you place the xmm0 initialization before the size check?
>>
>> +   movdqu(xmm0, ExternalAddress(StubRoutines::x86::crc_by128_masks_addr() + 32));
>>
>> I think the initialization and the check should be inside the code guarded by supports_vpclmulqdq().
>>
>> L_Parallel is not used: nothing jumps to it.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/22/18 12:11 PM, Rukmannagari, Shravya wrote:
>>> Hi everyone,
>>>
>>> As per the "Intel Architecture Instruction Set Extensions and Future Features Programming Reference"
>>> manual [1], the vector carry-less multiplication (vpclmulqdq) instruction
>>> will be supported in a future Intel ISA. I have updated the CRC32
>>> algorithm to take advantage of this instruction. I have tested with
>>> Intel SDE [2] to confirm that the encoding and semantics are correctly
>>> implemented. Please take a look and let me know if you have any
>>> questions or comments.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/ICL_crc32/webrev.00/
>>>
>>> Thanks,
>>>
>>> Shravya.
>>>
>>> [1]
>>> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
>>>
>>> [2]
>>> https://software.intel.com/en-us/articles/intel-software-development-emulator
>>>
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8200067
>>>
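
For reference, the operation that vpclmulqdq applies to a pair of 64-bit operands can be sketched in portable C++. This is only an illustration of the arithmetic (multiplication of polynomials over GF(2), i.e. shift-and-XOR with no carries), not the code in the webrev. The function name clmul64 is hypothetical and does not appear in HotSpot.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of HotSpot): carry-less multiply of two
// 64-bit values, treating each as a polynomial over GF(2).
// Returns {high 64 bits, low 64 bits} of the 128-bit product.
std::pair<uint64_t, uint64_t> clmul64(uint64_t a, uint64_t b) {
    uint64_t hi = 0;
    uint64_t lo = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1) {
            // Add (XOR) a shifted left by i into the 128-bit accumulator.
            lo ^= a << i;
            if (i != 0) {
                // Bits shifted out of the low half land in the high half.
                // The i == 0 case is skipped to avoid a 64-bit shift.
                hi ^= a >> (64 - i);
            }
        }
    }
    return {hi, lo};
}
```

For example, clmul64(3, 3) yields a low half of 5, because (x + 1)^2 = x^2 + 1 when the cross terms cancel under XOR. The CRC32 folding steps rely on this kind of product, with the hardware instruction computing it for 64-bit operands selected from each 128-bit lane.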