RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
Vladimir Kozlov
vladimir.kozlov at oracle.com
Sat Sep 15 00:22:41 UTC 2018
I gave incorrect link to RFE. Here is correct:
https://bugs.openjdk.java.net/browse/JDK-8210764
Vladimir
On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
> Build failure I got on MacOS and Windows. Linux passed for some reason so I am not surprise you did
> not noticed.
>
> Anyway. I tested with these changes and got next failures on avx1 machines. I am planning to run on
> avx512 too.
>
> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or '-XX:CompileThreshold=100
> -XX:-TieredCompilation' on CPU with AVX1 only
>
> # SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
> # Problematic frame:
> # V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20
>
> Current CompileTask:
> C2: 154 5 java.lang.String::equals (65 bytes)
>
> Stack: [0x00007f3b10044000,0x00007f3b10145000], sp=0x00007f3b1013fe70, free space=1007k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V [libjvm.so+0x46f0f0] MachNode::ideal_reg() const+0x20
> V [libjvm.so+0x882a72] PhaseChaitin::gather_lrg_masks(bool)+0x872
> V [libjvm.so+0xd82235] PhaseCFG::global_code_motion()+0xfc5
> V [libjvm.so+0xd824b1] PhaseCFG::do_global_code_motion()+0x51
> V [libjvm.so+0xa2c26c] Compile::Code_Gen()+0x24c
> V [libjvm.so+0xa2ff82] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool,
> DirectiveSet*)+0xe42
>
> ------------------------------------------------------------------------------------------------
> 2. compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java with '-Xcomp'
> # Internal Error (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016, tid=22073
> # assert(false) failed: cannot spill interval that is used in first instruction (possible reason:
> no register found)
>
> Current CompileTask:
> C1: 854767 13391 3 org.sunflow.math.Matrix4::multiply (692 bytes)
>
> Stack: [0x00007f23b9d82000,0x00007f23b9e83000], sp=0x00007f23b9e7f9d0, free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V [libjvm.so+0x1882202] VMError::report_and_die(int, char const*, char const*, __va_list_tag*,
> Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x562
> V [libjvm.so+0x1882d2f] VMError::report_and_die(Thread*, void*, char const*, int, char const*,
> char const*, __va_list_tag*)+0x2f
> V [libjvm.so+0xb0bea0] report_vm_error(char const*, int, char const*, char const*, ...)+0x100
> V [libjvm.so+0x7e0410] LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
> V [libjvm.so+0x7e0a20] LinearScanWalker::activate_current()+0x280
> V [libjvm.so+0x7e0c7d] IntervalWalker::walk_to(int) [clone .constprop.299]+0x9d
> V [libjvm.so+0x7e1078] LinearScan::allocate_registers()+0x338
> V [libjvm.so+0x7e2135] LinearScan::do_linear_scan()+0x155
> V [libjvm.so+0x70a6bb] Compilation::emit_lir()+0x99b
> V [libjvm.so+0x70caff] Compilation::compile_java_method()+0x42f
> V [libjvm.so+0x70d974] Compilation::compile_method()+0x1d4
> V [libjvm.so+0x70e547] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int,
> BufferBlob*, DirectiveSet*)+0x357
> V [libjvm.so+0x71073c] Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x14c
> V [libjvm.so+0xa3cf89] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
>
> Vladimir
>
> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
>>
>> Thanks Vladimir, the below should fix this issue:
>>
>> ------------------------------
>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.488379912 -0700
>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14 13:10:23.308379915 -0700
>> @@ -233,22 +233,6 @@
>> _xmm_regs[13] = xmm13;
>> _xmm_regs[14] = xmm14;
>> _xmm_regs[15] = xmm15;
>> - _xmm_regs[16] = xmm16;
>> - _xmm_regs[17] = xmm17;
>> - _xmm_regs[18] = xmm18;
>> - _xmm_regs[19] = xmm19;
>> - _xmm_regs[20] = xmm20;
>> - _xmm_regs[21] = xmm21;
>> - _xmm_regs[22] = xmm22;
>> - _xmm_regs[23] = xmm23;
>> - _xmm_regs[24] = xmm24;
>> - _xmm_regs[25] = xmm25;
>> - _xmm_regs[26] = xmm26;
>> - _xmm_regs[27] = xmm27;
>> - _xmm_regs[28] = xmm28;
>> - _xmm_regs[29] = xmm29;
>> - _xmm_regs[30] = xmm30;
>> - _xmm_regs[31] = xmm31;
>> #endif // _LP64
>>
>> for (int i = 0; i < 8; i++) {
>> ---------------------------------
>>
>> I think the gcc version on my desktop is older so didn’t catch this.
>>
>> The updated patch along with the above change and setting UseAVX as 3 is uploaded to:
>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
>>
>> FYI, I did notice that the default for UseAVX had been rolled back and wanted to get confirmation
>> from you before changing it back to 3.
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Friday, September 14, 2018 12:13 PM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>
>> I got build failure:
>>
>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error: array index 16 is past the
>> end of the array (which contains 16 elements) [-Werror,-Warray-bounds]
>> jib > _xmm_regs[16] = xmm16;
>>
>> I also noticed that we don't have RFE for this work. I filed:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>
>> You did not enabled avx512 by default (8209735 change was synced from jdk 11 into 12 2 weeks ago).
>> I added next change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
>>
>> - product(intx, UseAVX, 2, \
>> + product(intx, UseAVX, 3, \
>>
>> Thanks,
>> Vladimir
>>
>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
>>> Looks good to me. I will start testing and let you know results.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
>>>> Hi Vladimir,
>>>>
>>>> Please find below the updated webrev with all your comments incorporated:
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
>>>>
>>>> I have run the jtreg compiler tests on SKX and KNL which have two
>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also tested SPECjvm2008 on the
>>>> three platforms.
>>>>
>>>> Best Regards,
>>>> Sandhya
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, September 11, 2018 8:54 PM
>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>>>>
>>>> Thank you, Sandhya
>>>>
>>>> I am satisfied with your detailed answer for memory loads issues. Okay lets not add them.
>>>>
>>>> Vladimir
>>>>
>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Thanks a lot for the detailed review. I really appreciate your feedback.
>>>>> Please see my response in your email below marked with (Sandhya
>>>>>>>> ). Looking forward to your advice.
>>>>>
>>>>> Best Regards,
>>>>> Sandhya
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>> instruction
>>>>>
>>>>> Thank you.
>>>>>
>>>>> I want to discuss next issue:
>>>>>
>>>>> > You did not added instructions to load these registers from
>>>>> memory (and stack). What happens in such cases when you need to load or store?
>>>>> >>>> Let us take an example, e.g. for loading into rregF. First
>>>>> it gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>>>
>>>>> This is what I thought. You increase registers pressure this way which may cause spills on stack.
>>>>> Also we don't check that register could be the same as result you may get unneeded moves.
>>>>>
>>>>> I would advice add memory moves at least.
>>>>>
>>>>> Sandhya >>> I had added those rules initially and removed them in
>>>>> the final patch. I noticed that the register allocator uses the
>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask
>>>>> (matcher.cpp). I would like the register allocator to get all the
>>>>> possible register on an architecture for idealreg2reg mask. I
>>>>> wondered that multiple instruct rules in .ad file for LoadF from
>>>>> memory might cause problems. I would have to have higher cost for
>>>>> loading into restricted register set like vlReg. Then I decided that
>>>>> the register allocator can handle this in much better way than me
>>>>> adding rules to load from memory. This is with the background that the regF is always all the
>>>>> available registers and vlRegF is the restricted register set. Likewise for VecS and legVecS.
>>>>> Let me know you thoughts on this and if I should still add the rules to load from memory into
>>>>> vlReg and legVec. The specific code from matcher.cpp that I am referring to is:
>>>>> MachNode *spillCP = match_tree(new
>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>> #endif
>>>>> MachNode *spillI = match_tree(new
>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
>>>>> MachNode *spillL = match_tree(new
>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
>>>>> LoadNode::DependsO nlyOnTest, false));
>>>>> MachNode *spillF = match_tree(new
>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
>>>>> MachNode *spillD = match_tree(new
>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
>>>>> MachNode *spillP = match_tree(new
>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
>>>>> ....
>>>>> idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
>>>>>
>>>>> An other question. You use movflt() and movdbl() which use either
>>>>> movap[s|d] and movs[s|d]
>>>>> instructions:
>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu
>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when
>>>>> avx512vl is not available? I see for vectors you use
>>>>> vpxor+vinserti* combination.
>>>>>
>>>>> Sandhya >>> Yes the scalar floating point instructions are available
>>>>> with AVX512 encoding when avx512vl is not available. That is why you
>>>>> would see not just movflt, movdbl but all the other scalar
>>>>> operations like adds, addsd etc using the entire xmm range (xmm0-31). In other words they are
>>>>> AVX512F instructions.
>>>>>
>>>>> Last question. I notice next UseAVX check in vectors spills code in x86.ad:
>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
>>>>>
>>>>> Should it be (UseAVX < 3)?
>>>>>
>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the updated webrev.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Thanks a lot for your review and feedback. Please see my response
>>>>>> in your email below. I will send an updated webrev incorporating your feedback.
>>>>>>
>>>>>> Best Regards,
>>>>>> Sandhya
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>>> Sent: Monday, September 10, 2018 6:09 PM
>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
>>>>>> instruction
>>>>>>
>>>>>> Very nice. Thank you, Sandhya.
>>>>>>
>>>>>> I would like to see more meaningful naming in .ad files - instead
>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
>>>>>>
>>>>>>>>> Yes, accepted.
>>>>>>
>>>>>> New load_from_* and load_to_* instructions in .ad files should be
>>>>>> renamed to next and collocate with other Move*_reg_reg* instructions:
>>>>>>
>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst,
>>>>>> vlRegF src)
>>>>>>>>> Yes, accepted.
>>>>>>
>>>>>> You did not added instructions to load these registers from memory
>>>>>> (and stack). What happens in such cases when you need to load or store?
>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
>>>>>>>>> gets loaded from memory into regF and then register to register move to rregF and vice versa.
>>>>>>
>>>>>> Also please explain why these registers are used when UseAVX == 0?:
>>>>>>
>>>>>> +instruct absD_reg(rregD dst) %{
>>>>>> predicate((UseSSE>=2) && (UseAVX == 0));
>>>>>>
>>>>>> we switch off evex so regular regD (only legacy register in this case) should work too:
>>>>>> 661 if (UseAVX < 3) {
>>>>>> 662 _features &= ~CPU_AVX512F;
>>>>>>
>>>>>>>>> Yes, accepted. It could be regD here.
>>>>>>
>>>>>> Next checks could be combined by using new function in vm_version_x86.hpp (you already have
>>>>>> some):
>>>>>>
>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
>>>>>> +vectors_reg_legacy, %{
>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
>>>>>> VM_Version::supports_avx512dq() &&
>>>>>> VM_Version::supports_avx512vl() %} );
>>>>>>
>>>>>>>>> Yes, accepted.
>>>>>>
>>>>>> I would suggest to test these changes on different machines
>>>>>> (non-avx512 and avx512) and with different UseAVX values.
>>>>>>
>>>>>>>>> Will do.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
>>>>>>> Recently there have been couple of high priority issues with
>>>>>>> regards to high bank of XMM register
>>>>>>> (XMM16-XMM31) usage by C2:
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
>>>>>>>
>>>>>>> Please find below a patch which attempts to clean up the XMM
>>>>>>> register handling by using register groups.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
>>>>>>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
>>>>>>>
>>>>>>> The patch provides a restricted set of registers to the match
>>>>>>> rules in the ad file based on the underlying architecture.
>>>>>>>
>>>>>>> The aim is to remove special handling/workaround from macro assembler and assembler.
>>>>>>>
>>>>>>> By removing the special handling, the patch reduces the overall
>>>>>>> code size by about 1800 lines of code.
>>>>>>>
>>>>>>> Your review and feedback is very welcome.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Sandhya
>>>>>>>
More information about the hotspot-compiler-dev
mailing list