RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction

Tue Sep 18 04:28:50 UTC 2018

Hi Sandhya,

How are you invoking the test for NativeCallTest?

The way I would do it using jtreg would be something like this:

$ export BUILD_TYPE=release
$ export JDK_PATH=wherever you have your JDK

>From the test subfolder:
$ wherever-your-jtreg-is/bin/jtreg
-nativepath:$JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/support/test/hotspot/jtreg/native/lib
-jdk $JDK_PATH/build/linux-x86_64-normal-server-$BUILD_TYPE/images/jdk
hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java

Seems to pass for me.

But much easier is:
$ make run-test
TEST="test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java"

That seems to pass for me as well and is easier to use :)

For information, the make run-test documentation is here:
http://hg.openjdk.java.net/jdk9/jdk9/raw-file/tip/common/doc/testing.html

Let me know if that helps,
Jc

Note: For NativeCallTest.java and many others I get the following message
> in the corresponding .jtr file:
>                     "test result: Error. Use -nativepath to specify the
> location of native code"
>             Do I need to give any additional info to jtreg to get over
> this problem?
>
> Thanks a lot!
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, September 17, 2018 10:14 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
>
> I finished testing on avx512 machine.
> All passed except known (TestNaNVector.java) failures.
>
> Thanks,
> Vladimir
>
> On 9/14/18 5:22 PM, Vladimir Kozlov wrote:
> > I gave incorrect link to RFE. Here is correct:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8210764
> >
> > Vladimir
> >
> > On 9/14/18 3:49 PM, Vladimir Kozlov wrote:
> >> Build failure I got on MacOS and Windows. Linux passed for some reason
> so I am not surprise you did not noticed.
> >>
> >> Anyway. I tested with these changes and got next failures on avx1
> machines. I am planning to run on avx512 too.
> >>
> >> 1. compiler/vectorization/TestNaNVector.java with '-Xcomp' or
> '-XX:CompileThreshold=100 -XX:-TieredCompilation' on CPU
> >> with AVX1 only
> >>
> >> #  SIGSEGV (0xb) at pc=0x00007f3b146410f0, pid=13871, tid=13884
> >> # Problematic frame:
> >> # V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> >>
> >> Current CompileTask:
> >> C2:    154    5             java.lang.String::equals (65 bytes)
> >>
> >> Stack: [0x00007f3b10044000,0x00007f3b10145000],
> sp=0x00007f3b1013fe70,  free space=1007k
> >> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> >> V  [libjvm.so+0x46f0f0]  MachNode::ideal_reg() const+0x20
> >> V  [libjvm.so+0x882a72]  PhaseChaitin::gather_lrg_masks(bool)+0x872
> >> V  [libjvm.so+0xd82235]  PhaseCFG::global_code_motion()+0xfc5
> >> V  [libjvm.so+0xd824b1]  PhaseCFG::do_global_code_motion()+0x51
> >> V  [libjvm.so+0xa2c26c]  Compile::Code_Gen()+0x24c
> >> V  [libjvm.so+0xa2ff82]  Compile::Compile(ciEnv*, C2Compiler*,
> ciMethod*, int, bool, bool, bool, DirectiveSet*)+0xe42
> >>
> >>
> ------------------------------------------------------------------------------------------------
> >> 2.
> compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
> with '-Xcomp'
> >> #  Internal Error
> (workspace/open/src/hotspot/share/c1/c1_LinearScan.cpp:5646), pid=22016,
> tid=22073
> >> #  assert(false) failed: cannot spill interval that is used in first
> instruction (possible reason: no register found)
> >>
> >> Current CompileTask:
> >> C1: 854767 13391       3       org.sunflow.math.Matrix4::multiply (692
> bytes)
> >>
> >> Stack: [0x00007f23b9d82000,0x00007f23b9e83000],
> sp=0x00007f23b9e7f9d0,  free space=1014k
> >> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> >> V  [libjvm.so+0x1882202]  VMError::report_and_die(int, char const*,
> char const*, __va_list_tag*, Thread*, unsigned
> >> char*, void*, void*, char const*, int, unsigned long)+0x562
> >> V  [libjvm.so+0x1882d2f]  VMError::report_and_die(Thread*, void*, char
> const*, int, char const*, char const*,
> >> __va_list_tag*)+0x2f
> >> V  [libjvm.so+0xb0bea0]  report_vm_error(char const*, int, char const*,
> char const*, ...)+0x100
> >> V  [libjvm.so+0x7e0410]
> LinearScanWalker::alloc_locked_reg(Interval*)+0x3a0
> >> V  [libjvm.so+0x7e0a20]  LinearScanWalker::activate_current()+0x280
> >> V  [libjvm.so+0x7e0c7d]  IntervalWalker::walk_to(int) [clone
> .constprop.299]+0x9d
> >> V  [libjvm.so+0x7e1078]  LinearScan::allocate_registers()+0x338
> >> V  [libjvm.so+0x7e2135]  LinearScan::do_linear_scan()+0x155
> >> V  [libjvm.so+0x70a6bb]  Compilation::emit_lir()+0x99b
> >> V  [libjvm.so+0x70caff]  Compilation::compile_java_method()+0x42f
> >> V  [libjvm.so+0x70d974]  Compilation::compile_method()+0x1d4
> >> V  [libjvm.so+0x70e547]  Compilation::Compilation(AbstractCompiler*,
> ciEnv*, ciMethod*, int, BufferBlob*,
> >> DirectiveSet*)+0x357
> >> V  [libjvm.so+0x71073c]  Compiler::compile_method(ciEnv*, ciMethod*,
> int, DirectiveSet*)+0x14c
> >> V  [libjvm.so+0xa3cf89]
> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x409
> >>
> >> Vladimir
> >>
> >> On 9/14/18 1:27 PM, Viswanathan, Sandhya wrote:
> >>>
> >>> Thanks Vladimir, the below should fix this issue:
> >>>
> >>> ------------------------------
> >>> --- old/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14
> 13:10:23.488379912 -0700
> >>> +++ new/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp 2018-09-14
> 13:10:23.308379915 -0700
> >>> @@ -233,22 +233,6 @@
> >>>     _xmm_regs[13]  = xmm13;
> >>>     _xmm_regs[14]  = xmm14;
> >>>     _xmm_regs[15]  = xmm15;
> >>> -  _xmm_regs[16]  = xmm16;
> >>> -  _xmm_regs[17]  = xmm17;
> >>> -  _xmm_regs[18]  = xmm18;
> >>> -  _xmm_regs[19]  = xmm19;
> >>> -  _xmm_regs[20]  = xmm20;
> >>> -  _xmm_regs[21]  = xmm21;
> >>> -  _xmm_regs[22]  = xmm22;
> >>> -  _xmm_regs[23]  = xmm23;
> >>> -  _xmm_regs[24]  = xmm24;
> >>> -  _xmm_regs[25]  = xmm25;
> >>> -  _xmm_regs[26]  = xmm26;
> >>> -  _xmm_regs[27]  = xmm27;
> >>> -  _xmm_regs[28]  = xmm28;
> >>> -  _xmm_regs[29]  = xmm29;
> >>> -  _xmm_regs[30]  = xmm30;
> >>> -  _xmm_regs[31]  = xmm31;
> >>>   #endif // _LP64
> >>>
> >>>     for (int i = 0; i < 8; i++) {
> >>> ---------------------------------
> >>>
> >>> I think the gcc version on my desktop is older so didn’t catch this.
> >>>
> >>> The updated patch along with the above change and setting UseAVX as 3
> is uploaded to:
> >>> Patch: http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.02/
> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>
> >>> FYI, I did notice that the default for UseAVX had been rolled back and
> wanted to get confirmation from you before
> >>> changing it back to 3.
> >>>
> >>> Best Regards,
> >>> Sandhya
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>> Sent: Friday, September 14, 2018 12:13 PM
> >>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> hotspot-compiler-dev at openjdk.java.net
> >>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> >>>
> >>> I got build failure:
> >>>
> >>> workspace/open/src/hotspot/cpu/x86/c1_FrameMap_x86.cpp:236:3: error:
> array index 16 is past the end of the array
> >>> (which contains 16 elements) [-Werror,-Warray-bounds]
> >>> jib >   _xmm_regs[16]  = xmm16;
> >>>
> >>> I also noticed that we don't have RFE for this work. I filed:
> >>>
> >>> https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>
> >>> You did not enabled avx512 by default (8209735 change was synced from
> jdk 11 into 12 2 weeks ago). I added next
> >>> change to yours in src/hotspot/cpu/x86/globals_x86.hpp:
> >>>
> >>> - product(intx, UseAVX, 2, \
> >>> + product(intx, UseAVX, 3, \
> >>>
> >>> Thanks,
> >>> Vladimir
> >>>
> >>> On 9/14/18 11:39 AM, Vladimir Kozlov wrote:
> >>>> Looks good to me. I will start testing and let you know results.
> >>>>
> >>>> Thanks,
> >>>> Vladimir
> >>>>
> >>>> On 9/13/18 6:05 AM, Viswanathan, Sandhya wrote:
> >>>>> Hi Vladimir,
> >>>>>
> >>>>> Please find below the updated webrev with all your comments
> incorporated:
> >>>>>
> >>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.01/
> >>>>>
> >>>>> I have run the jtreg compiler tests on SKX and KNL which have two
> >>>>> different flavors of AVX512 and Haswell a non-avx512 system. Also
> tested SPECjvm2008 on the three platforms.
> >>>>>
> >>>>> Best Regards,
> >>>>> Sandhya
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>> Sent: Tuesday, September 11, 2018 8:54 PM
> >>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> >>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512 instruction
> >>>>>
> >>>>> Thank you, Sandhya
> >>>>>
> >>>>> I am satisfied with your detailed answer for memory loads issues.
> Okay lets not add them.
> >>>>>
> >>>>> Vladimir
> >>>>>
> >>>>> On 9/11/18 6:13 PM, Viswanathan, Sandhya wrote:
> >>>>>> Hi Vladimir,
> >>>>>>
> >>>>>> Thanks a lot for the detailed review. I really appreciate your
> feedback.
> >>>>>> Please see my response in your email below marked with (Sandhya
> >>>>>>>>> ). Looking forward to your advice.
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Sandhya
> >>>>>>
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>>> Sent: Tuesday, September 11, 2018 5:11 PM
> >>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> >>>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
> >>>>>> instruction
> >>>>>>
> >>>>>> Thank you.
> >>>>>>
> >>>>>> I want to discuss next issue:
> >>>>>>
> >>>>>>     > You did not added instructions to load these registers from
> >>>>>> memory (and stack). What happens in such cases when you need to
> load or store?
> >>>>>>     >>>> Let us take an example, e.g. for loading into rregF. First
> >>>>>> it gets loaded from memory into regF and then register to register
> move to rregF and vice versa.
> >>>>>>
> >>>>>> This is what I thought. You increase registers pressure this way
> which may cause spills on stack.
> >>>>>> Also we don't check that register could be the same as result you
> may get unneeded moves.
> >>>>>>
> >>>>>> I would advice add memory moves at least.
> >>>>>>
> >>>>>> Sandhya >>>  I had added those rules initially and removed them in
> >>>>>> the final patch. I noticed that the register allocator uses the
> >>>>>> memory rules (e.g. LoadF) to initialize the idealreg2reg mask
> >>>>>> (matcher.cpp). I would like the register allocator to get all the
> >>>>>> possible register on an architecture for idealreg2reg mask. I
> >>>>>> wondered that multiple instruct rules in .ad file for LoadF from
> >>>>>> memory might cause problems.  I would have to have higher cost for
> >>>>>> loading into restricted register set like vlReg. Then I decided that
> >>>>>> the register allocator can handle this in much better way than me
> >>>>>> adding rules to load from memory. This is with the background that
> the regF is always all the available registers
> >>>>>> and vlRegF is the restricted register set. Likewise for VecS and
> legVecS. Let me know you thoughts on this and if
> >>>>>> I should still add the rules to load from memory into vlReg and
> legVec. The specific code from matcher.cpp that I
> >>>>>> am referring to is:
> >>>>>>      MachNode *spillCP = match_tree(new
> >>>>>> LoadNNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> >>>>>> #endif
> >>>>>>      MachNode *spillI  = match_tree(new
> >>>>>> LoadINode(NULL,mem,fp,atp,TypeInt::INT,MemNode::unordered));
> >>>>>>      MachNode *spillL  = match_tree(new
> >>>>>> LoadLNode(NULL,mem,fp,atp,TypeLong::LONG,MemNode::unordered,
> >>>>>> LoadNode::DependsO nlyOnTest, false));
> >>>>>>      MachNode *spillF  = match_tree(new
> >>>>>> LoadFNode(NULL,mem,fp,atp,Type::FLOAT,MemNode::unordered));
> >>>>>>      MachNode *spillD  = match_tree(new
> >>>>>> LoadDNode(NULL,mem,fp,atp,Type::DOUBLE,MemNode::unordered));
> >>>>>>      MachNode *spillP  = match_tree(new
> >>>>>> LoadPNode(NULL,mem,fp,atp,TypeInstPtr::BOTTOM,MemNode::unordered));
> >>>>>>      ....
> >>>>>>      idealreg2regmask[Op_RegF] = &spillF->out_RegMask();
> >>>>>>
> >>>>>> An other question. You use movflt() and movdbl() which use either
> >>>>>> movap[s|d] and movs[s|d]
> >>>>>> instructions:
> >>>>>>
> http://hg.openjdk.java.net/jdk/jdk/file/4ffb0a33f265/src/hotspot/cpu
> >>>>>> /x86/macroAssembler_x86.hpp#l164 Are these instructions work when
> >>>>>> avx512vl is not available? I see for vectors you use
> >>>>>> vpxor+vinserti* combination.
> >>>>>>
> >>>>>> Sandhya >>> Yes the scalar floating point instructions are available
> >>>>>> with AVX512 encoding when avx512vl is not available. That is why you
> >>>>>> would see not just movflt, movdbl but all the other scalar
> >>>>>> operations like adds, addsd etc using the entire xmm range
> (xmm0-31). In other words they are AVX512F instructions.
> >>>>>>
> >>>>>> Last question. I notice next UseAVX check in vectors spills code in
> x86.ad:
> >>>>>> if ((UseAVX < 2) || VM_Version::supports_avx512vl())
> >>>>>>
> >>>>>> Should it be (UseAVX < 3)?
> >>>>>>
> >>>>>> Sandhya >>> Yes, that is a very good catch. I will fix this in the
> updated webrev.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vladimir
> >>>>>>
> >>>>>> On 9/11/18 2:58 PM, Viswanathan, Sandhya wrote:
> >>>>>>> Hi Vladimir,
> >>>>>>>
> >>>>>>> Thanks a lot for your review and feedback. Please see my response
> >>>>>>> in your email below. I will send an updated webrev incorporating
> your feedback.
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Sandhya
> >>>>>>>
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> >>>>>>> Sent: Monday, September 10, 2018 6:09 PM
> >>>>>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>;
> >>>>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR (M) 8207746:C2: Lucene crashes on AVX512
> >>>>>>> instruction
> >>>>>>>
> >>>>>>> Very nice. Thank you, Sandhya.
> >>>>>>>
> >>>>>>> I would like to see more meaningful naming in .ad files - instead
> >>>>>>> of rreg* and ovec* to have something like vlReg* and legVec*.
> >>>>>>>
> >>>>>>>>>> Yes, accepted.
> >>>>>>>
> >>>>>>> New load_from_* and load_to_* instructions in .ad files should be
> >>>>>>> renamed to next and collocate with other Move*_reg_reg*
> instructions:
> >>>>>>>
> >>>>>>> instruct MoveF2VL(vlRegF dst, regF src) instruct MoveVL2F(regF dst,
> >>>>>>> vlRegF src)
> >>>>>>>>>> Yes, accepted.
> >>>>>>>
> >>>>>>> You did not added instructions to load these registers from memory
> >>>>>>> (and stack). What happens in such cases when you need to load or
> store?
> >>>>>>>>>> Let us take an example, e.g. for loading into rregF. First it
> >>>>>>>>>> gets loaded from memory into regF and then register to register
> move to rregF and vice versa.
> >>>>>>>
> >>>>>>> Also please explain why these registers are used when UseAVX == 0?:
> >>>>>>>
> >>>>>>> +instruct absD_reg(rregD dst) %{
> >>>>>>>         predicate((UseSSE>=2) && (UseAVX == 0));
> >>>>>>>
> >>>>>>> we switch off evex so regular regD (only legacy register in this
> case) should work too:
> >>>>>>>       661   if (UseAVX < 3) {
> >>>>>>>       662     _features &= ~CPU_AVX512F;
> >>>>>>>
> >>>>>>>>>> Yes, accepted. It could be regD here.
> >>>>>>>
> >>>>>>> Next checks could be combined by using new function in
> vm_version_x86.hpp (you already have some):
> >>>>>>>
> >>>>>>> +reg_class_dynamic vectors_reg_bwdqvl(vectors_reg_evex,
> >>>>>>> +vectors_reg_legacy, %{
> >>>>>>> VM_Version::supports_evex() && VM_Version::supports_avx512bw() &&
> >>>>>>> VM_Version::supports_avx512dq() &&
> >>>>>>> VM_Version::supports_avx512vl() %} );
> >>>>>>>
> >>>>>>>>>> Yes, accepted.
> >>>>>>>
> >>>>>>> I would suggest to test these changes on different machines
> >>>>>>> (non-avx512 and avx512) and with different UseAVX values.
> >>>>>>>
> >>>>>>>>>> Will do.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Vladimir
> >>>>>>>
> >>>>>>> On 9/5/18 4:09 PM, Viswanathan, Sandhya wrote:
> >>>>>>>> Recently there have been couple of high priority issues with
> >>>>>>>> regards to high bank of XMM register
> >>>>>>>> (XMM16-XMM31) usage by C2:
> >>>>>>>>
> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207746
> >>>>>>>>
> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8209735
> >>>>>>>>
> >>>>>>>> Please find below a patch which attempts to clean up the XMM
> >>>>>>>> register handling by using register groups.
> >>>>>>>>
> >>>>>>>> http://cr.openjdk.java.net/~vdeshpande/xmm_reg/webrev.00/
> >>>>>>>> <http://cr.openjdk.java.net/%7Evdeshpande/xmm_reg/webrev.00/>
> >>>>>>>>
> >>>>>>>> The patch provides a restricted set of registers to the match
> >>>>>>>> rules in the ad file based on the underlying architecture.
> >>>>>>>>
> >>>>>>>> The aim is to remove special handling/workaround from macro
> assembler and assembler.
> >>>>>>>>
> >>>>>>>> By removing the special handling, the patch reduces the overall
> >>>>>>>> code size by about 1800 lines of code.
> >>>>>>>>
> >>>>>>>> Your review and feedback is very welcome.
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>>
> >>>>>>>> Sandhya
> >>>>>>>>
>

-- 

Thanks,
Jc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20180917/83fc0cbb/attachment-0001.html>