RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Fri Sep 14 18:39:28 UTC 2018

Looks good to me. I will start testing and let you know results.

Thanks,
Vladimir

On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
> 
> Please find below the updated webrev with all your comments incorporated:
> 
> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
> 
> I have run the jtreg compiler tests on SKX and KNL which have two different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
> 
> Best Regards,
> Sandhya
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, September 11, 2018 8:54 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> 
> Thank you, Sandhya
> 
> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them.
> 
> Vladimir
> 
> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Thanks a lot for the detailed review. I really appreciate your feedback.
>> Please see my response in your email below marked with (Sandhya >>>). Looking forward to your advice.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, September 11, 2018 5:11 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Thank you.
>>
>> I want to discuss next issue:
>>
>>    > You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>    >>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>
>> This is what I thought. You increase registers pressure this way which may cause spills on stack.
>> Also we don't check that register could be the same as result you may get unneeded moves.
>>
>> I would advice add memory moves at least.
>>
>> Sandhya >>>  I had added those rules initially and removed them in the final patch. I noticed that the register allocator uses the memory rules (e.g. LoadF) to initialize the idealreg2reg mask (matcher.cpp). I would like the register allocator to get all the possible register on an architecture for idealreg2reg mask. I wondered that multiple instruct rules in .ad file for LoadF from memory might cause problems.  I would have to have higher cost for loading into restricted register set like vlReg. Then I decided that the register allocator can handle this in much better way than me adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>     MachNode *spillCP = match_tree(new LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>> #endif
>>     MachNode *spillI  = match_tree(new LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>     MachNode *spillL  = match_tree(new LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered, LoadNode::DependsO
>> nlyOnTest, false));
>>     MachNode *spillF  = match_tree(new LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>     MachNode *spillD  = match_tree(new LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>     MachNode *spillP  = match_tree(new LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>     ....
>>     idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>
>> An other question. You use movflt() and movdbl() which use either movap[s|d] and movs[s|d] instructions:
>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l164
>> Are these instructions work when avx512vl is not available? I see for vectors you use
>> vpxor+vinserti* combination.
>>
>> Sandhya >>> Yes the scalar floating point instructions are available with AVX512 encoding when avx512vl is not available. That is why you would see not just movflt, movdbl but all the other scalar operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions.
>>
>> Last question. I notice next UseAVX check in vectors spills code in x86.ad:
>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>
>> Should it be (UseAVX < 3)?
>>
>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Thanks a lot for your review and feedback. Please see my response in your email below. I will send an updated webrev incorporating your feedback.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Monday, September 10, 2018 6:09 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>
>>> Very nice. Thank you, Sandhya.
>>>
>>> I would like to see more meaningful naming in .ad files - instead of rreg* and ovec* to have something like vlReg* and legVec*.
>>>
>>>>>> Yes, accepted.
>>>
>>> New load_from_* and load_to_* instructions in .ad files should be renamed to next and collocate with other Move*_reg_reg* instructions:
>>>
>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst,
>>> vlRegF src)
>>>>>> Yes, accepted.
>>>
>>> You did not added instructions to load these registers from memory (and stack). What happens in such cases when you need to load or store?
>>>>>> Let us take an example, e.g. for loading into rregF. First it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>
>>> Also please explain why these registers are used when UseAVX == 0?:
>>>
>>> +instruct absD_reg(rregD dst) %{
>>>        predicate((UseSSE>=2) && (UseAVX == 0));
>>>
>>> we switch off evex so regular regD (only legacy register in this case) should work too:
>>>      661   if (UseAVX < 3) {
>>>      662     _features &= ~CPU_AVX512F;
>>>
>>>>>> Yes, accepted. It could be regD here.
>>>
>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some):
>>>
>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
>>> +vectors_reg_legacy, %{
>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>> VM_Version::supports_avx512dq() &&
>>> VM_Version::supports_avx512vl() %} );
>>>
>>>>>> Yes, accepted.
>>>
>>> I would suggest to test these changes on different machines (non-avx512 and avx512) and with different UseAVX values.
>>>
>>>>>> Will do.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>> Recently there have been couple of high priority issues with regards
>>>> to high bank of XMM register
>>>> (XMM16-XMM31) usage by C2:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>
>>>> Please find below a patch which attempts to clean up the XMM register handling by using register groups.
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
>>>>
>>>> The patch provides a restricted set of registers to the match rules
>>>> in the ad file based on the underlying architecture.
>>>>
>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>>
>>>> By removing the special handling, the patch reduces the overall code size by about 1800 lines of code.
>>>>
>>>> Your review and feedback is very welcome.
>>>>
>>>> Best Regards,
>>>>
>>>> Sandhya
>>>>