RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Fri Sep 14 20:27:29 UTC 2018
Thanks Vladimir, the below should fix this issue:
------------------------------
--- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
+++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
@@ -233,22 +233,6 @@
_xmm_regs[13] = xmm13;
_xmm_regs[14] = xmm14;
_xmm_regs[15] = xmm15;
- _xmm_regs[16] = xmm16;
- _xmm_regs[17] = xmm17;
- _xmm_regs[18] = xmm18;
- _xmm_regs[19] = xmm19;
- _xmm_regs[20] = xmm20;
- _xmm_regs[21] = xmm21;
- _xmm_regs[22] = xmm22;
- _xmm_regs[23] = xmm23;
- _xmm_regs[24] = xmm24;
- _xmm_regs[25] = xmm25;
- _xmm_regs[26] = xmm26;
- _xmm_regs[27] = xmm27;
- _xmm_regs[28] = xmm28;
- _xmm_regs[29] = xmm29;
- _xmm_regs[30] = xmm30;
- _xmm_regs[31] = xmm31;
#endif // _LP64
for (int i = 0; i < 8; i++) {
---------------------------------
I think the gcc version on my desktop is older so didn’t catch this.
The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation from you before changing it back to 3.
Best Regards,
Sandhya
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, September 14, 2018 12:13 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
I got build failure:
workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
jib > _xmm_regs[16] = xmm16;
I also noticed that we don't have RFE for this work. I filed:
https://bugs.openjdk.java.net/browse/JDK-8209735
You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago). I added next change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
- product(intx, UseAVX, 2, \
+ product(intx, UseAVX, 3, \
Thanks,
Vladimir
On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
> Looks good to me. I will start testing and let you know results.
>
> Thanks,
> Vladimir
>
> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>> Hi Vladimir,
>>
>> Please find below the updated webrev with all your comments incorporated:
>>
>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>
>> I have run the jtreg compiler tests on SKX and KNL which have two
>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the three platforms.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, September 11, 2018 8:54 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>> hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> Thank you, Sandhya
>>
>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them.
>>
>> Vladimir
>>
>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>> Hi Vladimir,
>>>
>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>> Please see my response in your email below marked with (Sandhya
>>> >>>). Looking forward to your advice.
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>> instruction
>>>
>>> Thank you.
>>>
>>> I want to discuss next issue:
>>>
>>> > You did not added instructions to load these registers from
>>> memory (and stack). What happens in such cases when you need to load or store?
>>> >>>> Let us take an example, e.g. for loading into rregF. First
>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>
>>> This is what I thought. You increase registers pressure this way which may cause spills on stack.
>>> Also we don't check that register could be the same as result you may get unneeded moves.
>>>
>>> I would advice add memory moves at least.
>>>
>>> Sandhya >>> I had added those rules initially and removed them in
>>> the final patch. I noticed that the register allocator uses the
>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask
>>> (matcher.cpp). I would like the register allocator to get all the
>>> possible register on an architecture for idealreg2reg mask. I
>>> wondered that multiple instruct rules in .ad file for LoadF from
>>> memory might cause problems. I would have to have higher cost for
>>> loading into restricted register set like vlReg. Then I decided that
>>> the register allocator can handle this in much better way than me
>>> adding rules to load from memory. This is with the background that the regF is always all the available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS. Let me know you thoughts on this and if I should still add the rules to load from memory into vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>> MachNode *spillCP = match_tree(new
>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>> #endif
>>> MachNode *spillI = match_tree(new
>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>> MachNode *spillL = match_tree(new
>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
>>> LoadNode::DependsO nlyOnTest, false));
>>> MachNode *spillF = match_tree(new
>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>> MachNode *spillD = match_tree(new
>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>> MachNode *spillP = match_tree(new
>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>> ....
>>> idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>
>>> An other question. You use movflt() and movdbl() which use either
>>> movap[s|d] and movs[s|d]
>>> instructions:
>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu
>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when
>>> avx512vl is not available? I see for vectors you use
>>> vpxor+vinserti* combination.
>>>
>>> Sandhya >>> Yes the scalar floating point instructions are available
>>> with AVX512 encoding when avx512vl is not available. That is why you
>>> would see not just movflt, movdbl but all the other scalar
>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are AVX512F instructions.
>>>
>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad:
>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>
>>> Should it be (UseAVX < 3)?
>>>
>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>> Hi Vladimir,
>>>>
>>>> Thanks a lot for your review and feedback. Please see my response
>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>> instruction
>>>>
>>>> Very nice. Thank you, Sandhya.
>>>>
>>>> I would like to see more meaningful naming in .ad files - instead
>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>>>>
>>>>>>> Yes, accepted.
>>>>
>>>> New load_from_* and load_to_* instructions in .ad files should be
>>>> renamed to next and collocate with other Move*_reg_reg* instructions:
>>>>
>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst,
>>>> vlRegF src)
>>>>>>> Yes, accepted.
>>>>
>>>> You did not added instructions to load these registers from memory
>>>> (and stack). What happens in such cases when you need to load or store?
>>>>>>> Let us take an example, e.g. for loading into rregF. First it
>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>>
>>>> Also please explain why these registers are used when UseAVX == 0?:
>>>>
>>>> +instruct absD_reg(rregD dst) %{
>>>> predicate((UseSSE>=2) && (UseAVX == 0));
>>>>
>>>> we switch off evex so regular regD (only legacy register in this case) should work too:
>>>> 661 if (UseAVX < 3) {
>>>> 662 _features &= ~CPU_AVX512F;
>>>>
>>>>>>> Yes, accepted. It could be regD here.
>>>>
>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have some):
>>>>
>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
>>>> +vectors_reg_legacy, %{
>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>>> VM_Version::supports_avx512dq() &&
>>>> VM_Version::supports_avx512vl() %} );
>>>>
>>>>>>> Yes, accepted.
>>>>
>>>> I would suggest to test these changes on different machines
>>>> (non-avx512 and avx512) and with different UseAVX values.
>>>>
>>>>>>> Will do.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>> Recently there have been couple of high priority issues with
>>>>> regards to high bank of XMM register
>>>>> (XMM16-XMM31) usage by C2:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>
>>>>> Please find below a patch which attempts to clean up the XMM
>>>>> register handling by using register groups.
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
>>>>>
>>>>> The patch provides a restricted set of registers to the match
>>>>> rules in the ad file based on the underlying architecture.
>>>>>
>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>>>
>>>>> By removing the special handling, the patch reduces the overall
>>>>> code size by about 1800 lines of code.
>>>>>
>>>>> Your review and feedback is very welcome.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Sandhya
>>>>>
More information about the hotspot-compiler-dev
mailing list