From christian.thalinger at oracle.com Fri Apr 1 00:22:08 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 31 Mar 2016 14:22:08 -1000 Subject: RFR: 8144964: JVMCI compilations need to be disabled until the module system is initialized In-Reply-To: <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> References: <56FCB10C.3020609@oracle.com> <1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com> <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> Message-ID: Vladimir pointed out a bug. Of course it should be: + bool must_load; +#if INCLUDE_JVMCI + if (EnableJVMCI) { + // If JVMCI is enabled we require its classes to be found. + must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci); + } else +#endif + { + must_load = (init_opt < SystemDictionary::Opt); + } > On Mar 31, 2016, at 1:08 PM, Christian Thalinger wrote: > > I found a problem when graal.jar is appended to the boot class path. Somehow (and I don't know why, yet) in that case jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci classes are preloaded if the JVMCI is enabled. > > diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp > --- a/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 09:16:49 2016 -0700 > +++ b/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 13:04:35 2016 -1000 > @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla > int sid = (info >> CEIL_LG_OPTION_LIMIT); > Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid); > InstanceKlass** klassp = &_well_known_klasses[id]; > - bool must_load = (init_opt < SystemDictionary::Opt); > + > + bool must_load; > +#if INCLUDE_JVMCI > + if (EnableJVMCI) { > + // If JVMCI is enabled we require its classes to be found.
> + must_load = (init_opt <= SystemDictionary::Jvmci); > + } else > +#endif > + { > + must_load = (init_opt < SystemDictionary::Opt); > + } > + > if ((*klassp) == NULL) { > Klass* k; > if (must_load) { > diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp > --- a/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 09:16:49 2016 -0700 > +++ b/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 13:04:35 2016 -1000 > @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic { > > Opt, // preload tried; NULL if not present > #if INCLUDE_JVMCI > - Jvmci, // preload tried; error if not present, use only with JVMCI > + Jvmci, // preload tried; error if not present if JVMCI enabled > #endif > OPTION_LIMIT, > CEIL_LG_OPTION_LIMIT = 2 // OPTION_LIMIT <= (1< > >> On Mar 31, 2016, at 11:10 AM, Christian Thalinger > wrote: >> >> Thanks, Vladimir. >> >>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov > wrote: >>> >>> Looks fine. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/30/16 5:01 PM, Christian Thalinger wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8144964 >>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/ >>>> >>>> JVMCI compilations need to be disabled until the module system is initialized. Basically, only allow tier 1-3 compilations until it's up. >>>> >> > From vladimir.kozlov at oracle.com Fri Apr 1 00:36:01 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 31 Mar 2016 17:36:01 -0700 Subject: RFR: 8144964: JVMCI compilations need to be disabled until the module system is initialized In-Reply-To: References: <56FCB10C.3020609@oracle.com> <1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com> <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> Message-ID: <56FDC271.7090201@oracle.com> Looks good. Thanks, Vladimir On 3/31/16 5:22 PM, Christian Thalinger wrote: > Vladimir pointed out a bug.
Of course it should be: > > + bool must_load; > +#if INCLUDE_JVMCI > + if (EnableJVMCI) { > + // If JVMCI is enabled we require its classes to be found. > + must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci); > + } else > +#endif > + { > + must_load = (init_opt < SystemDictionary::Opt); > + } > >> On Mar 31, 2016, at 1:08 PM, Christian Thalinger > > wrote: >> >> I found a problem when graal.jar is appended to the boot class path. Somehow (and I don't know why, yet) in that case >> jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci >> classes are preloaded if the JVMCI is enabled. >> >> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp >> --- a/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 09:16:49 2016 -0700 >> +++ b/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 13:04:35 2016 -1000 >> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla >> int sid = (info >> CEIL_LG_OPTION_LIMIT); >> Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid); >> InstanceKlass** klassp = &_well_known_klasses[id]; >> - bool must_load = (init_opt < SystemDictionary::Opt); >> + >> + bool must_load; >> +#if INCLUDE_JVMCI >> + if (EnableJVMCI) { >> + // If JVMCI is enabled we require its classes to be found.
>> + must_load = (init_opt <= SystemDictionary::Jvmci); >> + } else >> +#endif >> + { >> + must_load = (init_opt < SystemDictionary::Opt); >> + } >> + >> if ((*klassp) == NULL) { >> Klass* k; >> if (must_load) { >> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp >> --- a/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 09:16:49 2016 -0700 >> +++ b/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 13:04:35 2016 -1000 >> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic { >> >> Opt, // preload tried; NULL if not present >> #if INCLUDE_JVMCI >> - Jvmci, // preload tried; error if not present, use only with JVMCI >> + Jvmci, // preload tried; error if not present if JVMCI enabled >> #endif >> OPTION_LIMIT, >> CEIL_LG_OPTION_LIMIT = 2 // OPTION_LIMIT <= (1<> >> >>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger >> > wrote: >>> >>> Thanks, Vladimir. >>> >>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov > wrote: >>>> >>>> Looks fine. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/30/16 5:01 PM, Christian Thalinger wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8144964 >>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/ >>>>> >>>>> JVMCI compilations need to be disabled until the module system is initialized. Basically, only allow tier 1-3 >>>>> compilations until it's up. >>>>> >>> >> > From christian.thalinger at oracle.com Fri Apr 1 00:36:59 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 31 Mar 2016 14:36:59 -1000 Subject: RFR: 8144964: JVMCI compilations need to be disabled until the module system is initialized In-Reply-To: <56FDC271.7090201@oracle.com> References: <56FCB10C.3020609@oracle.com> <1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com> <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> <56FDC271.7090201@oracle.com> Message-ID: <8E272A3F-4D29-4E31-BAC9-B602745F87A4@oracle.com> Thank you, Vladimir. > On Mar 31, 2016, at 2:36 PM, Vladimir Kozlov wrote: > > Looks good.
> > Thanks, > Vladimir > > On 3/31/16 5:22 PM, Christian Thalinger wrote: >> Vladimir pointed out a bug. Of course it should be: >> >> + bool must_load; >> +#if INCLUDE_JVMCI >> + if (EnableJVMCI) { >> + // If JVMCI is enabled we require its classes to be found. >> + must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci); >> + } else >> +#endif >> + { >> + must_load = (init_opt < SystemDictionary::Opt); >> + } >> >>> On Mar 31, 2016, at 1:08 PM, Christian Thalinger >> >> wrote: >>> >>> I found a problem when graal.jar is appended to the boot class path. Somehow (and I don't know why, yet) in that case >>> jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci >>> classes are preloaded if the JVMCI is enabled. >>> >>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp >>> --- a/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 09:16:49 2016 -0700 >>> +++ b/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 13:04:35 2016 -1000 >>> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla >>> int sid = (info >> CEIL_LG_OPTION_LIMIT); >>> Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid); >>> InstanceKlass** klassp = &_well_known_klasses[id]; >>> - bool must_load = (init_opt < SystemDictionary::Opt); >>> + >>> + bool must_load; >>> +#if INCLUDE_JVMCI >>> + if (EnableJVMCI) { >>> + // If JVMCI is enabled we require its classes to be found.
>>> + must_load = (init_opt <= SystemDictionary::Jvmci); >>> + } else >>> +#endif >>> + { >>> + must_load = (init_opt < SystemDictionary::Opt); >>> + } >>> + >>> if ((*klassp) == NULL) { >>> Klass* k; >>> if (must_load) { >>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp >>> --- a/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 09:16:49 2016 -0700 >>> +++ b/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 13:04:35 2016 -1000 >>> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic { >>> >>> Opt, // preload tried; NULL if not present >>> #if INCLUDE_JVMCI >>> - Jvmci, // preload tried; error if not present, use only with JVMCI >>> + Jvmci, // preload tried; error if not present if JVMCI enabled >>> #endif >>> OPTION_LIMIT, >>> CEIL_LG_OPTION_LIMIT = 2 // OPTION_LIMIT <= (1<>> >>> >>>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger >>>> >> wrote: >>>> >>>> Thanks, Vladimir. >>>> >>>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov >> wrote: >>>>> >>>>> Looks fine. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/30/16 5:01 PM, Christian Thalinger wrote: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8144964 >>>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/ >>>>>> >>>>>> JVMCI compilations need to be disabled until the module system is initialized. Basically, only allow tier 1-3 >>>>>> compilations until it's up.
From michael.c.berg at intel.com Fri Apr 1 03:36:45 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 1 Apr 2016 03:36:45 +0000 Subject: CR for RFR 8151573 In-Reply-To: <56FCADE3.20403@oracle.com> References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> Message-ID: Vladimir, I think I have addressed every concern in the latest webrev: http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. The code is fully retested with no issues. Thanks, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, March 30, 2016 9:56 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8151573 On 3/30/16 4:57 PM, Berg, Michael C wrote: > See below for context. > > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 3:51 PM > To: Berg, Michael C ; > 'hotspot-compiler-dev at openjdk.java.net' > > Subject: Re: CR for RFR 8151573 > > Michael, > > First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes. > > multi_version_post_loops() can use is_canonical_main_loop_entry() from > 8148754 but you need to modify it to move > is_Main() assert to other call sites. > > ok, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps better to leave them separate given they are different?
I will leave this one to last so that we have time to discuss this. I am fine with the name change. The only difference I see is that your code checks the cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1): shouldn't it be in(2)? > > I did not get the rce'd post loop checks in loopnode.cpp. > > First I will have to explain what I am doing with do_range_check(). That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning. > Basically that means that even if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass. > The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so, or for the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added, we eliminate it. I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled. You should add a comment in do_range_check() about what you said (looking only for range checks). Also don't use the old_new struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop.
Anyway, I was asked about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right? There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop? > > Swap next checks since has_range_checks() may be expensive scanning loop body: > + // only process RCE'd main loops > + if (cl->has_range_checks() || !cl->is_main_loop()) return; > > Ok, makes sense. Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag. The other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request. But, please, rename the has_range_checks(cl) method to avoid confusion. Thanks, Vladimir > > Also if there are no checks in loop you will scan loop's body again since no_checks state is not recorded. > I would suggest to scan body unconditionally because code you added to do_range_check() may not be precise (you decrement count only for LoadRange while you increment it for all Ifs). This way you don't need to change old_new around do_range_check() call. > > I perceive the real problem is to not scan more than once after we check. I will move towards that solution. > > > Why you need local copies?: > > - visited.Clear(); > - clones.clear(); > + Arena *a = Thread::current()->resource_area(); > + VectorSet visited(a); > + Node_Stack clones(a, main_head->back_control()->outcnt()); > > I will look into this, and see if it can be cleaned up. > > > I don't think PostLoopInfo is needed. insert_post_loop() can return post_head node and &main_exit could be passed to it to set. > > Ok, I will look into a version without PostLoopInfo.
> > Thanks, > Vladimir > > On 3/30/16 1:44 PM, Berg, Michael C wrote: >> Here is an update after full testing, the webrev is: >> >> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >> >> Please review and comment, >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: hotspot-compiler-dev >> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >> Berg, Michael C >> Sent: Wednesday, March 16, 2016 10:30 AM >> To: Vladimir Kozlov ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: RE: CR for RFR 8151573 >> >> Putting a hold on the review, retesting everything on my end. >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 16, 2016 8:42 AM >> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: CR for RFR 8151573 >> >> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>> Vladimir: >>> >>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up in cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop. >> >> I understand that we can get some benefits. But in the general case they will not be visible. >> >>> >>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >> >> Yes, after you explained vector masking to me I now understand why it could be used for the post loop. >> >> After thinking about this I would suggest that you look at the arraycopy and generate_fill stubs in stub_Generator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be only platform specific, smaller and easier to understand and test.
Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than optimizing general loops. >> >> Regards, >> Vladimir >> >>> >>> Regards, >>> Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, March 15, 2016 4:37 PM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> As we all know we can always construct microbenchmarks which shows >>> 30% >>> - 50% difference. When in real application we will never see >>> difference. I still don't see a real reason why we should spend time >>> and optimize >>> *POST* loops. We already have vectorized post loop to improve performance. Note, additional loop opts code will rise its maintenance cost. >>> >>> Why "programmable SIMD" depends on it? What about pre-loop? >>> >>> Thanks, >>> Vladimir >>> >>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>> Correction below... >>>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev >>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>>> Berg, Michael C >>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: RE: CR for RFR 8151573 >>>> >>>> Vladimir for programmable SIMD which is the optimization which uses this implementation, I get the following on micros and code in general that look like this: >>>> >>>> for(int i = 0; i < process_len; i++) >>>> { >>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>> } >>>> >>>> The above code makes 9 vector ops. >>>> >>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyways. >>>> The value process_len is some fraction of the array length in my measurements. 
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in the range 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>> >>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was superunrolled 4 times with vector length 16, or an unroll of 64, the alignment (pre) loop processes 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81 so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>> >>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>> >>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>> >>>> Regards, >>>> Michael >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> Hi Michael, >>>> >>>> Changes are significant so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute multi-versioning post loops for range >>>>> check elimination. Previously, post loop optimizations for range >>>>> checks were done by cfg optimizations after register >>>>> allocation. I have added code which produces the desired effect much >>>>> earlier by introducing a safe transformation which will minimally >>>>> allow a range check free version of the final post loop to execute >>>>> up until the point it actually has to take a range check exception >>>>> by re-ranging the limit of the rce'd loop, then exit the rce'd >>>>> post loop and take the range check exception in the legacy loop's execution if required. >>>>> If during optimization we discover that we know enough to remove >>>>> the range check version of the post loop, mostly by exposing the >>>>> load range values into the limit logic of the rce'd post loop, we >>>>> will eliminate the range check post loop altogether much like cfg >>>>> optimizations did, but much earlier. This gives optimizations >>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>> vectorize the rce'd post loops to a single iteration based on mask >>>>> vectors which map to the residual iterations. Programmable SIMD >>>>> will be a follow-on change set utilizing this code to stage its >>>>> work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>> Currently I have enabled this optimization for x86 only. We base >>>>> this loop on successfully rce'd main loops and if, for whatever reason, multiversioning fails, we eliminate the loop we added.
>>>>> >>>>> This code was tested as follows: >>>>> >>>>> >>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>> >>>>> >>>>> webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>> >>>>> Thanks, >>>>> >>>>> Michael >>>>> From igor.veresov at oracle.com Fri Apr 1 05:44:05 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 31 Mar 2016 22:44:05 -0700 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <56FD2F47.1000709@azulsystems.com> References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com> <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> <56AA2AE4.2090803@azulsystems.com> <2538083C-7906-44AA-A074-7DBF5F2D8654@oracle.com> <50C14C66-4068-4DD7-BD94-96E37F7C9B0A@oracle.com> <56AF85F3.3060802@azulsystems.com> <56BBCBF4.2070504@azulsystems.com> <56BD1F7F.3020808@azulsystems.com> <56E0B770.1@azulsystems.com> <56FD2F47.1000709@azulsystems.com> Message-ID: <080FF9DB-5B5C-47B7-AC1C-174755C9B826@oracle.com> Looks good. igor > On Mar 31, 2016, at 7:08 AM, Ivan Krylov wrote: > > I have updated the webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/ > > I overlooked the missing LIR_Assembler::on_spin_wait() for non-x86 platforms. I have no access to non-intel boxes > and hence saw the problems only at integration time. > c1_LIRAssembler.o: In function `LIR_Assembler::emit_op0(LIR_Op0*)': > hotspot/src/share/vm/c1/c1_LIRAssembler.cpp:683: undefined reference to `LIR_Assembler::on_spin_wait()' > > So, 3 empty method implementations were added to the corresponding files - the top 3 on the webrev above. > > Paul, thanks for identifying those issues. > > Regards, > > Ivan > > > > > On 10/03/2016 04:04, Igor Veresov wrote: >> Ok, good. >> >> igor >> >>> On Mar 9, 2016, at 3:53 PM, Ivan Krylov wrote: >>> >>> Paul, Indeed, thanks. I have modified the test. 
>>> I also made changes to reflect the fact that onSpinWait is now decided to be placed into j.l.Thread. >>> >>> Igor, >>> This is a new webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/ >>> This is the diff between previous and this patches (03 vs 04): >>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/diff.txt >>> >>> Thanks, >>> >>> Ivan >>> >>> On 12/02/2016 06:01, Paul Sandoz wrote: >>>>> On 12 Feb 2016, at 00:55, Ivan Krylov wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> Thanks both for your help and your reviews. >>>>> Here is a new version, tested on mac for c1 and c2: >>>>> >>>>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.03 >>>>> >>>> Now that support C1 is supported should the test be updated with C1 only execution? >>>> >>>> Paul. > From rahul.v.raghavan at oracle.com Fri Apr 1 08:22:59 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 1 Apr 2016 01:22:59 -0700 (PDT) Subject: RFR (XXS): 8150690: C++11 user-defined literal syntax in jvmciCompilerToVM.cpp In-Reply-To: <84F04E81-69FB-402F-956B-1C9CD21AD4C2@oracle.com> References: <99c44ac7-0279-421f-9469-9f5445d1312a@default> <84F04E81-69FB-402F-956B-1C9CD21AD4C2@oracle.com> Message-ID: <00b78f82-b68b-4c42-867f-438efccaa3ba@default> > -----Original Message----- > From: Christian Thalinger > Sent: Friday, April 01, 2016 5:01 AM > > Looks correct. Thank you Chris. > > > On Mar 30, 2016, at 6:21 PM, Rahul Raghavan wrote: > > > > Hi, > > > > : https://bugs.openjdk.java.net/browse/JDK-8150690 > > : http://cr.openjdk.java.net/~rraghavan/8150690/webrev.00/ > > > > - Added space required between literal and identifier for C++11, in CompilerToVM::methods array initializer. > > (only white space changes) > > - Confirmed no other similar issues elsewhere in jvmciCompilerToVM.cpp. > > - This proposed fix is similar to fix done for JDK-8081202, JDK-8135209, JDK-8132969. > > > > - Could not try and reconfirm with Visual Studio 2015. 
> > But manually confirmed the changes and > > understood another related infrastructure/build task is reported separately - JDK-8145549 (to build OpenJDK using Visual Studio > 2015 Community edition) > > > > - No issues with jprt run (-testset hotspot). > > > > Thanks, > > Rahul > From jamsheed.c.m at oracle.com Fri Apr 1 09:02:04 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 1 Apr 2016 14:32:04 +0530 Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...) failed: a) MT-unsafe modification of inline cache In-Reply-To: <56F9521E.5020808@oracle.com> References: <56F456B7.6010104@oracle.com> <3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com> <56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com> <56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com> <56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com> <56F9521E.5020808@oracle.com> Message-ID: <56FE390C.20906@oracle.com> Hi Vladimir Ivanov, I used overloaded clearInlineCaches wb api. revised webrevs: hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/ root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/ Best Regards, Jamsheed On 3/28/2016 9:17 PM, Vladimir Ivanov wrote: >>> in addition it clears this. >>> void static_stub_Relocation::clear_inline_cache() { >>> // Call stub is only used when calling the interpreted code. >>> // It does not really need to be cleared, except that we want to >>> clean out the methodoop. >>> CompiledStaticCall::set_stub_to_clean(this); >> >> i want assert to catch this issue. if static stubs are cleared, assert >> wouldn't fail. > I see. Then I suggest to rename the method to > WhiteBox.cleanupInlineCaches() and iterate over the whole code cache > (don't specify Method*). 
> > void CodeCache::cleanup_inline_caches() { > assert_locked_or_safepoint(CodeCache_lock); > NMethodIterator iter; > while(iter.next_alive()) { > iter.method()->cleanup_inline_caches(true); > } > } > > Best regards, > Vladimir Ivanov > > >> -Jamsheed >>> } >>> >>> Best Regards, >>> Jamsheed >>>> >>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb)) >>>> VM_ClearICs clear_ics; >>>> VMThread::execute(&clear_ics); >>>> WB_END >>>> >>>> class VM_ClearICs: public VM_Operation { >>>> ... >>>> void doit() { CodeCache::clear_inline_caches(); } >>>> ... >>>> }; >>>> >>>> void CodeCache::clear_inline_caches() { >>>> assert_locked_or_safepoint(CodeCache_lock); >>>> NMethodIterator iter; >>>> while(iter.next_alive()) { >>>> iter.method()->clear_inline_caches(); >>>> } >>>> } >>>> >>>> void nmethod::clear_inline_caches() { >>>> assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's >>>> only allowed at safepoint"); >>>> if (is_zombie()) { >>>> return; >>>> } >>>> >>>> RelocIterator iter(this); >>>> while (iter.next()) { >>>> iter.reloc()->clear_inline_cache(); >>>> } >>>> } >>>> >>>> void static_call_Relocation::clear_inline_cache() { >>>> // Safe call site info >>>> CompiledStaticCall* handler = compiledStaticCall_at(this); >>>> handler->set_to_clean(); >>>> } >>>> >>>> void opt_virtual_call_Relocation::clear_inline_cache() { >>>> // No stubs for ICs >>>> // Clean IC >>>> ResourceMark rm; >>>> CompiledIC* icache = CompiledIC_at(this); >>>> icache->set_to_clean(); >>>> } >>>> >>>> void virtual_call_Relocation::clear_inline_cache() { >>>> // No stubs for ICs >>>> // Clean IC >>>> ResourceMark rm; >>>> CompiledIC* icache = CompiledIC_at(this); >>>> icache->set_to_clean(); >>>> } >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>>> >>>>> On 3/26/2016 1:50 PM, Dean Long wrote: >>>>>> Instead of changing cleanup_inline_caches() to take a new flag, can >>>>>> you use the existing >>>>>> clear_inline_caches()? 
>>>>>> >>>>>> dl >>>>>> >>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote: >>>>>>> Thank you Chris. >>>>>>> I have updated the code. >>>>>>> >>>>>>> + if (method == NULL) { >>>>>>> + return; >>>>>>> + } >>>>>>> + nmethod* nm = method->code(); >>>>>>> + if (nm == NULL || nm->is_unloaded()) { >>>>>>> + return; >>>>>>> + } >>>>>>> + nm->cleanup_inline_caches(true); >>>>>>> Best Regards, >>>>>>> Jamsheed >>>>>>> >>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote: >>>>>>>> >>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> Request for review, >>>>>>>>> >>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247 >>>>>>>>> >>>>>>>>> webrevs: >>>>>>>>> fix: >>>>>>>>> jdk part: >>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> newly added test case >>>>>>>>> hotspot part: >>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/ >>>>>>>>> >>>>>>>>> under hs-comp/test >>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java >>>>>>>>> Testing: JPRT with new test case, with fix, without fix >>>>>>>>> >>>>>>>>> Problem Summary: MH.invoke linksite take assistance of java code >>>>>>>>> to get an adapter method. Here a new method holder class and a >>>>>>>>> adapter method are created for a MT and lform instance is cached. >>>>>>>>> Normally this cached lform get returned for a linksite request of >>>>>>>>> same MT. When these cached lform get collected(due to memory >>>>>>>>> pressure), a new class and method gets created for same MT(even >>>>>>>>> though old method holder class and adapter method are live). >>>>>>>>> Fix Summary: Kept a strong reference to lform instance in adapter >>>>>>>>> method holder class of MT. >>>>>>>> >>>>>>>> Wow! You found the cause for his long-standing issue? Nice. 
>>>>>>>> + if (method == NULL) { return; } >>>>>>>> + nmethod* nm = method->code(); >>>>>>>> + if (nm == NULL) { return; } >>>>>>>> + if (nm->is_unloaded()) { return; } >>>>>>>> Please put the return and } on separate lines. >>>>>>>> >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> From igor.ignatyev at oracle.com Fri Apr 1 09:22:42 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 1 Apr 2016 12:22:42 +0300 Subject: RFR(S): 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays In-Reply-To: <56FC242C.6030108@oracle.com> References: <56FC242C.6030108@oracle.com> Message-ID: <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com> Hi Dmitrij, the fix looks good to me Thanks, Igor > On Mar 30, 2016, at 10:08 PM, Dmitrij Pochepko wrote: > > Hi, > > please review small fix for 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays > > A problem was in Arrays.fill method usage with mismatched argument types for primitive types arrays, so, generated tests compilation failed. > > This fix removes respective Arrays.fill usage generation for primitive types. > > bug: https://bugs.openjdk.java.net/browse/JDK-8151828 > webrev: http://cr.openjdk.java.net/~dpochepk/8151828/webrev.01/ > > I've tested fix locally. > > Thanks, > Dmitrij > > From vladimir.x.ivanov at oracle.com Fri Apr 1 09:48:53 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 1 Apr 2016 12:48:53 +0300 Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...)
failed: a) MT-unsafe modification of inline cache In-Reply-To: <56FE390C.20906@oracle.com> References: <56F456B7.6010104@oracle.com> <3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com> <56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com> <56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com> <56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com> <56F9521E.5020808@oracle.com> <56FE390C.20906@oracle.com> Message-ID: <56FE4405.3080807@oracle.com> Looks good! Small detail: the following comment in the test is misleading: 71 test(); // new LF creation should fail. LF shouldn't be unloaded, so normally no new LF is instantiated. Something like the following: // Trigger call site re-resolution. Invoker LambdaForm should stay the same. test(); No need to send new webrev. Best regards, Vladimir Ivanov On 4/1/16 12:02 PM, Jamsheed C m wrote: > Hi Vladimir Ivanov, > > I used overloaded clearInlineCaches wb api. > > revised webrevs: > hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/ > root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/ > > Best Regards, > Jamsheed > > On 3/28/2016 9:17 PM, Vladimir Ivanov wrote: >>>> in addition it clears this. >>>> void static_stub_Relocation::clear_inline_cache() { >>>> // Call stub is only used when calling the interpreted code. >>>> // It does not really need to be cleared, except that we want to >>>> clean out the methodoop. >>>> CompiledStaticCall::set_stub_to_clean(this); >>> >>> i want assert to catch this issue. if static stubs are cleared, assert >>> wouldn't fail. >> I see. Then I suggest to rename the method to >> WhiteBox.cleanupInlineCaches() and iterate over the whole code cache >> (don't specify Method*).
>> >> void CodeCache::cleanup_inline_caches() { >> assert_locked_or_safepoint(CodeCache_lock); >> NMethodIterator iter; >> while(iter.next_alive()) { >> iter.method()->cleanup_inline_caches(true); >> } >> } >> >> Best regards, >> Vladimir Ivanov >> >> >>> -Jamsheed >>>> } >>>> >>>> Best Regards, >>>> Jamsheed >>>>> >>>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb)) >>>>> VM_ClearICs clear_ics; >>>>> VMThread::execute(&clear_ics); >>>>> WB_END >>>>> >>>>> class VM_ClearICs: public VM_Operation { >>>>> ... >>>>> void doit() { CodeCache::clear_inline_caches(); } >>>>> ... >>>>> }; >>>>> >>>>> void CodeCache::clear_inline_caches() { >>>>> assert_locked_or_safepoint(CodeCache_lock); >>>>> NMethodIterator iter; >>>>> while(iter.next_alive()) { >>>>> iter.method()->clear_inline_caches(); >>>>> } >>>>> } >>>>> >>>>> void nmethod::clear_inline_caches() { >>>>> assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's >>>>> only allowed at safepoint"); >>>>> if (is_zombie()) { >>>>> return; >>>>> } >>>>> >>>>> RelocIterator iter(this); >>>>> while (iter.next()) { >>>>> iter.reloc()->clear_inline_cache(); >>>>> } >>>>> } >>>>> >>>>> void static_call_Relocation::clear_inline_cache() { >>>>> // Safe call site info >>>>> CompiledStaticCall* handler = compiledStaticCall_at(this); >>>>> handler->set_to_clean(); >>>>> } >>>>> >>>>> void opt_virtual_call_Relocation::clear_inline_cache() { >>>>> // No stubs for ICs >>>>> // Clean IC >>>>> ResourceMark rm; >>>>> CompiledIC* icache = CompiledIC_at(this); >>>>> icache->set_to_clean(); >>>>> } >>>>> >>>>> void virtual_call_Relocation::clear_inline_cache() { >>>>> // No stubs for ICs >>>>> // Clean IC >>>>> ResourceMark rm; >>>>> CompiledIC* icache = CompiledIC_at(this); >>>>> icache->set_to_clean(); >>>>> } >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> >>>>>> Best Regards, >>>>>> Jamsheed >>>>>> >>>>>> On 3/26/2016 1:50 PM, Dean Long wrote: >>>>>>> Instead of changing cleanup_inline_caches() to 
take a new flag, can >>>>>>> you use the existing >>>>>>> clear_inline_caches()? >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote: >>>>>>>> Thank you Chris. >>>>>>>> I have updated the code. >>>>>>>> >>>>>>>> + if (method == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + nmethod* nm = method->code(); >>>>>>>> + if (nm == NULL || nm->is_unloaded()) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + nm->cleanup_inline_caches(true); >>>>>>>> Best Regards, >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote: >>>>>>>>> >>>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> Request for review, >>>>>>>>>> >>>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247 >>>>>>>>>> >>>>>>>>>> webrevs: >>>>>>>>>> fix: >>>>>>>>>> jdk part: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> newly added test case >>>>>>>>>> hotspot part: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/ >>>>>>>>>> >>>>>>>>>> under hs-comp/test >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java >>>>>>>>>> Testing: JPRT with new test case, with fix, without fix >>>>>>>>>> >>>>>>>>>> Problem Summary: MH.invoke linksite take assistance of java code >>>>>>>>>> to get an adapter method. Here a new method holder class and a >>>>>>>>>> adapter method are created for a MT and lform instance is cached. >>>>>>>>>> Normally this cached lform get returned for a linksite request of >>>>>>>>>> same MT. When these cached lform get collected(due to memory >>>>>>>>>> pressure), a new class and method gets created for same MT(even >>>>>>>>>> though old method holder class and adapter method are live). >>>>>>>>>> Fix Summary: Kept a strong reference to lform instance in adapter >>>>>>>>>> method holder class of MT. 
>>>>>>>>> >>>>>>>>> Wow! You found the cause for his long-standing issue? Nice. >>>>>>>>> + if (method == NULL) { return; } >>>>>>>>> + nmethod* nm = method->code(); >>>>>>>>> + if (nm == NULL) { return; } >>>>>>>>> + if (nm->is_unloaded()) { return; } >>>>>>>>> Please put the return and } on separate lines. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jamsheed >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> > From jamsheed.c.m at oracle.com Fri Apr 1 10:05:49 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 1 Apr 2016 15:35:49 +0530 Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...) failed: a) MT-unsafe modification of inline cache In-Reply-To: <56FE4405.3080807@oracle.com> References: <56F456B7.6010104@oracle.com> <3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com> <56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com> <56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com> <56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com> <56F9521E.5020808@oracle.com> <56FE390C.20906@oracle.com> <56FE4405.3080807@oracle.com> Message-ID: <56FE47FD.2010905@oracle.com> Sure. Thank you Vladimir Ivanov! Best Regards, Jamsheed On 4/1/2016 3:18 PM, Vladimir Ivanov wrote: > Looks good! > > Small detail: the following comment in the test is misleading: > > 71 test(); // new LF creation should fail. > > LF shouldn't be unloaded, so no new LF is normally not instantiated. > > Something like the following: > // Trigger call site re-resolution. Invoker LambdaForm should stay > the same. > test(); > > No need to send new webrev. > > Best regards, > Vladimir Ivanov > > On 4/1/16 12:02 PM, Jamsheed C m wrote: >> Hi Vladimir Ivanov, >> >> I used overloaded clearInlineCaches wb api. 
>> >> revised webrevs: >> hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/ >> root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/ >> >> Best Regards, >> Jamsheed >> >> On 3/28/2016 9:17 PM, Vladimir Ivanov wrote: >>>>> in addition it clears this. >>>>> void static_stub_Relocation::clear_inline_cache() { >>>>> // Call stub is only used when calling the interpreted code. >>>>> // It does not really need to be cleared, except that we want to >>>>> clean out the methodoop. >>>>> CompiledStaticCall::set_stub_to_clean(this); >>>> >>>> i want assert to catch this issue. if static stubs are cleared, assert >>>> wouldn't fail. >>> I see. Then I suggest to rename the method to >>> WhiteBox.cleanupInlineCaches() and iterate over the whole code cache >>> (don't specify Method*). >>> >>> void CodeCache::cleanup_inline_caches() { >>> assert_locked_or_safepoint(CodeCache_lock); >>> NMethodIterator iter; >>> while(iter.next_alive()) { >>> iter.method()->cleanup_inline_caches(true); >>> } >>> } >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> >>>> -Jamsheed >>>>> } >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>>>> >>>>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb)) >>>>>> VM_ClearICs clear_ics; >>>>>> VMThread::execute(&clear_ics); >>>>>> WB_END >>>>>> >>>>>> class VM_ClearICs: public VM_Operation { >>>>>> ... >>>>>> void doit() { CodeCache::clear_inline_caches(); } >>>>>> ... 
>>>>>> }; >>>>>> >>>>>> void CodeCache::clear_inline_caches() { >>>>>> assert_locked_or_safepoint(CodeCache_lock); >>>>>> NMethodIterator iter; >>>>>> while(iter.next_alive()) { >>>>>> iter.method()->clear_inline_caches(); >>>>>> } >>>>>> } >>>>>> >>>>>> void nmethod::clear_inline_caches() { >>>>>> assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's >>>>>> only allowed at safepoint"); >>>>>> if (is_zombie()) { >>>>>> return; >>>>>> } >>>>>> >>>>>> RelocIterator iter(this); >>>>>> while (iter.next()) { >>>>>> iter.reloc()->clear_inline_cache(); >>>>>> } >>>>>> } >>>>>> >>>>>> void static_call_Relocation::clear_inline_cache() { >>>>>> // Safe call site info >>>>>> CompiledStaticCall* handler = compiledStaticCall_at(this); >>>>>> handler->set_to_clean(); >>>>>> } >>>>>> >>>>>> void opt_virtual_call_Relocation::clear_inline_cache() { >>>>>> // No stubs for ICs >>>>>> // Clean IC >>>>>> ResourceMark rm; >>>>>> CompiledIC* icache = CompiledIC_at(this); >>>>>> icache->set_to_clean(); >>>>>> } >>>>>> >>>>>> void virtual_call_Relocation::clear_inline_cache() { >>>>>> // No stubs for ICs >>>>>> // Clean IC >>>>>> ResourceMark rm; >>>>>> CompiledIC* icache = CompiledIC_at(this); >>>>>> icache->set_to_clean(); >>>>>> } >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>> >>>>>>> Best Regards, >>>>>>> Jamsheed >>>>>>> >>>>>>> On 3/26/2016 1:50 PM, Dean Long wrote: >>>>>>>> Instead of changing cleanup_inline_caches() to take a new flag, >>>>>>>> can >>>>>>>> you use the existing >>>>>>>> clear_inline_caches()? >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote: >>>>>>>>> Thank you Chris. >>>>>>>>> I have updated the code. 
>>>>>>>>> >>>>>>>>> + if (method == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + nmethod* nm = method->code(); >>>>>>>>> + if (nm == NULL || nm->is_unloaded()) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + nm->cleanup_inline_caches(true); >>>>>>>>> Best Regards, >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote: >>>>>>>>>> >>>>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> Request for review, >>>>>>>>>>> >>>>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247 >>>>>>>>>>> >>>>>>>>>>> webrevs: >>>>>>>>>>> fix: >>>>>>>>>>> jdk part: >>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> newly added test case >>>>>>>>>>> hotspot part: >>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/ >>>>>>>>>>> >>>>>>>>>>> under hs-comp/test >>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java >>>>>>>>>>> Testing: JPRT with new test case, with fix, without fix >>>>>>>>>>> >>>>>>>>>>> Problem Summary: MH.invoke linksite take assistance of java >>>>>>>>>>> code >>>>>>>>>>> to get an adapter method. Here a new method holder class and a >>>>>>>>>>> adapter method are created for a MT and lform instance is >>>>>>>>>>> cached. >>>>>>>>>>> Normally this cached lform get returned for a linksite >>>>>>>>>>> request of >>>>>>>>>>> same MT. When these cached lform get collected(due to memory >>>>>>>>>>> pressure), a new class and method gets created for same >>>>>>>>>>> MT(even >>>>>>>>>>> though old method holder class and adapter method are live). >>>>>>>>>>> Fix Summary: Kept a strong reference to lform instance in >>>>>>>>>>> adapter >>>>>>>>>>> method holder class of MT. >>>>>>>>>> >>>>>>>>>> Wow! You found the cause for his long-standing issue? Nice. 
>>>>>>>>>> + if (method == NULL) { return; } >>>>>>>>>> + nmethod* nm = method->code(); >>>>>>>>>> + if (nm == NULL) { return; } >>>>>>>>>> + if (nm->is_unloaded()) { return; } >>>>>>>>>> Please put the return and } on separate lines. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Jamsheed >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >> From aleksey.shipilev at oracle.com Fri Apr 1 11:35:28 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 1 Apr 2016 14:35:28 +0300 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign Message-ID: <56FE5D00.7000209@oracle.com> Hi, compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify String Concat strategies, because some of them are loading new methods and use them during String concat linkage and execution. Notably, this will happen inside of the asserts. We need to prime the asserts before using them in-between counter polls. Bug: https://bugs.openjdk.java.net/browse/JDK-8153265 Webrev: http://cr.openjdk.java.net/~shade/8153265/webrev.00/ Testing: offending test in oob/-Xcomp modes Thanks, -Aleksey From dmitrij.pochepko at oracle.com Fri Apr 1 12:27:10 2016 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Fri, 1 Apr 2016 15:27:10 +0300 Subject: RFR(S): 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays In-Reply-To: <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com> References: <56FC242C.6030108@oracle.com> <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com> Message-ID: <56FE691E.2050208@oracle.com> Thank you! > Hi Dmitrij, > > the fix looks good to me > > Thanks, >
Igor >> On Mar 30, 2016, at 10:08 PM, Dmitrij Pochepko wrote: >> >> Hi, >> >> please review small fix for 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays >> >> A problem was in Arrays.fill method usage with mismatched argument types for primitive types arrays, so, generated tests compilation failed. >> >> This fix removes respective Arrays.fill usage generation for primitive types. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8151828 >> webrev: http://cr.openjdk.java.net/~dpochepk/8151828/webrev.01/ >> >> I've tested fix locally. >> >> Thanks, >> Dmitrij >> >> From zoltan.majo at oracle.com Fri Apr 1 12:32:01 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 1 Apr 2016 14:32:01 +0200 Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop optimizations to 'develop' In-Reply-To: <56FD4906.4040909@oracle.com> References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com> <56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com> Message-ID: <56FE6A41.6030703@oracle.com> Hi Vladimir, thank you for the feedback! On 03/31/2016 05:57 PM, Vladimir Kozlov wrote: > It is nice to have not product flags which is easy to remove :) Yes, indeed. :-) > > Clean up looks good. Thank you. > > Can you leave test but remove "-XX:+UnlockDiagnosticVMOptions > -XX:-LoopLimitCheck" only? It has interesting code shape. Add comment > that it was ran with "-XX:+UnlockDiagnosticVMOptions > -XX:-LoopLimitCheck" to trigger problem. Yes, of course. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/ Thank you! Best regards, Zoltan > > Thanks, > Vladimir > > On 3/31/16 1:15 AM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for your feedback! >> >> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote: >>> These flags were added when I fixed long standing C2 problem with >>> counted loops: 5091921.
>>> They were added to have ability to revert back to original code if >>> new code cause a problem. >>> Looks like the old code which executed with these flags switched off >>> become rotten. >>> >>> Zoltan, did you find what cause the crash? Looks like product VM was >>> used in the bug report. What result gives >>> fastdebug VM? >> >> I've tried starting different VM versions with the flag(s) off. The >> most frequent error I get is >> >> # Internal Error >> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), >> pid=32727, tid=32746 >> # assert(false) failed: Bad graph detected in build_loop_late >> >> So it seems that the code executed with the flags off has indeed >> become rotten. >> >>> Converting flags to develop will not prevent problems happening with >>> fastdebug VM where these flags could be switched >>> off even when they are develop. >>> >>> If the problem with original code (flags are off) is something >>> fundamental we may simple remove old code and remove >>> these flags and have only new code. 5 years already passed since >>> 5091921 was fixed. >> >> Yes, I agree. I think it's reasonable to remove the old code. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/ >> >> The changes pass JPRT. >> >> I've changed the title of the bug to "Cleanup: Remove some unused >> flags/code in loop optimizations" to better reflect >> what the change is doing. I have kept the original title in the RFR. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>> On 3/23/16 6:26 AM, Zoltán Majó wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8072422. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8072422 >>>> >>>> Problem: Some flags controlling loop optimizations are currently >>>> 'diagnostic'.
Even though these flags are useful >>>> mostly for compiler-related development, their value can be changed >>>> not only in >>>> fastdebug, but also in release builds, >>>> >>>> Solution: Change the flags to 'develop'. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/ >>>> >>>> Testing: >>>> - locally built/started VM; >>>> - locally executed >>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >> From martin.doerr at sap.com Fri Apr 1 12:37:30 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 1 Apr 2016 12:37:30 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Message-ID: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin
From nils.eliasson at oracle.com Fri Apr 1 13:55:01 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 1 Apr 2016 15:55:01 +0200 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method Message-ID: <56FE7DB5.404@oracle.com> Hi all, Please review this fix. Summary: There is a mismatch in the CompilerWhiteBox testcases between the callable and the executable constructors. SimpleTestCase$Helper implements all constructors and methods that are tested. However since Helper is an inner class there will be an extra (javac created) constructor that has the parent class as an appended argument. The callable will invoke this constructor, but the executable will reference the normal constructor. Solution: Stop having Helper as an inner class. Rename it to SimpleTestCaseHelper for some uniqueness in compiler commands and directives. Testing: Run all hotspot/compiler/whitebox tests on all platforms, and all hotspot/compiler tests on one platform. Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ Best regards, Nils Eliasson From aleksey.shipilev at oracle.com Fri Apr 1 14:37:54 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 1 Apr 2016 17:37:54 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <56F5676D.7020401@oracle.com> References: <56F5676D.7020401@oracle.com> Message-ID: <56FE87C2.50002@oracle.com> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: > I would like to solicit comments for C1 support for new > Unsafe.compareAndExchange intrinsics (we have support for them in C2). > The rest of new Unsafe methods that are not intrinsified by C1 are > handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be > emulated with existing APIs.
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8152753 > > Webrev: > http://cr.openjdk.java.net/~shade/8152753/webrev.00/ Update: http://cr.openjdk.java.net/~shade/8152753/webrev.01/ Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some other cleanups. Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT hs-comp testset (some unrelated timeouts on SPARC). Thanks, -Aleksey From christian.thalinger at oracle.com Fri Apr 1 16:33:35 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 1 Apr 2016 06:33:35 -1000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: > On Apr 1, 2016, at 2:37 AM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. > > The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. > Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
> > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Does it make sense to keep: void set_exception_cache(ExceptionCache *ec) { _exception_cache = ec; } or would it be safer to always do the store-release even when clearing the cache? > > Please review. I will also need a sponsor. > > Best regards, > Martin From aph at redhat.com Fri Apr 1 16:42:38 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 1 Apr 2016 17:42:38 +0100 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: <56FEA4FE.2010807@redhat.com> On 04/01/2016 01:37 PM, Doerr, Martin wrote: > Therefore, the nmethod's field _exception_cache needs to be volatile > and adding new entries must be done by releasing stores. (Loading > seems to be fine without acquire because there's an address > dependency from the load of the cache to the usage of its contents > which is sufficient to ensure ordering on all openjdk platforms.) I think that's very risky. We can't be really sure what an optimizer might do in this area, as discussed at (very) considerable length in concurrency forums. memory_order_consume does this correctly in C++11 but we're not yet using C++11. I'd use acquire and leave a note that in future this can be replaced by memory_order_consume. Andrew. From igor.veresov at oracle.com Fri Apr 1 18:28:35 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 1 Apr 2016 11:28:35 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime Message-ID: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> When invoking private interface methods with invokeinterface we throw ICCE.
The check for that happens in the runtime part of the resolution; however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ Thanks, igor From tom.rodriguez at oracle.com Fri Apr 1 19:47:25 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 1 Apr 2016 12:47:25 -0700 Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should return dependencies_failed Message-ID: http://cr.openjdk.java.net/~never/8153315/webrev This fixes a minor issue which showed up while debugging Java code. evol_method dependences can change at any time, so it's just a normal dependence failure, not invalid dependencies. Graal considers it an error to build invalid dependencies so it complained. Tested under the Eclipse debugger. tom From igor.veresov at oracle.com Fri Apr 1 20:18:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 1 Apr 2016 13:18:48 -0700 Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should return dependencies_failed In-Reply-To: References: Message-ID: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com> Looks good. igor > On Apr 1, 2016, at 12:47 PM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8153315/webrev > > This fixes a minor issue which showed up while debugging Java code. evol_method dependences can change at any time, so it's just a normal dependence failure, not invalid dependencies. Graal considers it an error to build invalid dependencies so it complained. Tested under the Eclipse debugger.
> > tom From michael.c.berg at intel.com Fri Apr 1 21:51:14 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 1 Apr 2016 21:51:14 +0000 Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler Message-ID: Hi All, I would like to contribute some clean up on the x86 assembler applied to vex encoding to address the usage of the nds assembler parameter. For all instructions which use nds source xmm registers, the validity check has been removed. It was originally placed here: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269 And propagated. Now nds register usage is fully compliant with each ISA description. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001 webrev: http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/ Thanks, Michael From vladimir.kozlov at oracle.com Sat Apr 2 03:18:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 1 Apr 2016 20:18:28 -0700 Subject: CR for RFR 8151573 In-Reply-To: References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> Message-ID: <56FF3A04.5090601@oracle.com> I start preintegration testing. Thanks, Vladimir On 3/31/16 8:36 PM, Berg, Michael C wrote: > Vladimir, I think I have addressed every concern in the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ > > I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. > The code is fully retested with no issues.
> > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 9:56 PM > To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: CR for RFR 8151573 > > On 3/30/16 4:57 PM, Berg, Michael C wrote: >> See below for context. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 3:51 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> Michael, >> >> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes. >> >> multi_version_post_loops() can use is_canonical_main_loop_entry() from >> 8148754 but you need to modify it to move >> is_Main() assert to other call sites. >> >> ok, but should not the name then be is_canonical_loop_entry()? Since it will be for non main loops as well. Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. > > I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? > >> >> I did not get rce'd post loop checks in loopnode.cpp. >> >> First I will have to explain what I am doing with do_range_check(). That code resolves the question of: Do all the range checks have load ranges, only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so, or for the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added, we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also don't use the old_new
> struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right?
> There is no comment on what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
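Vladimir's two suggestions above can be sketched outside HotSpot. The first is to have the range-check scan report its result directly instead of threading it through old_new. The second is to record the outcome on the loop (for example as an RCEPostLoop mark) rather than rescanning with has_range_checks(cl). This is a toy model under stated assumptions, not C2 code. ToyLoop, NodeKind, CheckSummary, summarize_checks() and has_only_range_checks() are hypothetical names, and the loop body is reduced to a list of node kinds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for C2 IR concepts; NOT HotSpot's real classes.
enum class NodeKind { RangeCheck, OtherIf, Other };

struct ToyLoop {
  enum Flag : uint32_t {
    ChecksScanned   = 1u << 0,  // body has already been scanned once
    OnlyRangeChecks = 1u << 1,  // scan found no control flow besides range checks
    RCEPostLoop     = 1u << 2,  // loop was created as the range-check-free post loop
  };
  std::vector<NodeKind> body;
  uint32_t flags = 0;
  int scans = 0;  // instrumentation only: how many times the body was walked

  bool is_rce_post_loop() const { return (flags & RCEPostLoop) != 0; }
  void mark_rce_post_loop() { flags |= RCEPostLoop; }
};

// Result of one walk over the body, returned by value instead of being
// accumulated in a side structure.
struct CheckSummary {
  int range_checks = 0;
  int other_ifs = 0;
};

// Body scan with a distinct name, so it cannot be confused with a flag query.
CheckSummary summarize_checks(const std::vector<NodeKind>& body) {
  CheckSummary s;
  for (NodeKind k : body) {
    if (k == NodeKind::RangeCheck) s.range_checks++;
    else if (k == NodeKind::OtherIf) s.other_ifs++;
  }
  return s;
}

// Cached query: the body is scanned at most once per loop, including the
// case where it contains no checks at all.
bool has_only_range_checks(ToyLoop& loop) {
  if ((loop.flags & ToyLoop::ChecksScanned) == 0) {
    loop.scans++;
    CheckSummary s = summarize_checks(loop.body);
    if (s.other_ifs == 0) loop.flags |= ToyLoop::OnlyRangeChecks;
    loop.flags |= ToyLoop::ChecksScanned;
  }
  return (loop.flags & ToyLoop::OnlyRangeChecks) != 0;
}
```

The cached flag avoids the case Vladimir points out, where a loop with no checks gets rescanned because the "no checks" state was never recorded.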
>
>>
>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>> + // only process RCE'd main loops
>> + if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> Ok, makes sense.
>
> Sorry, I made a mistake due to you using the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But, please, rename the has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also if there are no checks in the loop you will scan the loop's body again since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> I perceive the real problem is to not scan more than once after we check. I will move towards that solution.
>>
>>
>> Why do you need local copies?
>>
>> - visited.Clear();
>> - clones.clear();
>> + Arena *a = Thread::current()->resource_area();
>> + VectorSet visited(a);
>> + Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node and &main_exit could be passed to it to set.
>>
>> Ok, I will look into a version without PostLoopInfo.
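Vladimir's alternative to PostLoopInfo is a plain C++ out-parameter pattern: return the new post-loop head and report the main loop's exit through a pointer argument. Below is a minimal sketch of that calling convention. Graph, Node and this insert_post_loop() are hypothetical toy versions and do not reproduce C2's loop cloning.

```cpp
#include <cassert>
#include <deque>

// Hypothetical toy IR node: an id plus a single control input.
struct Node {
  int id;
  Node* in;
};

struct Graph {
  // std::deque keeps references to existing elements valid on push_back,
  // so the Node* handles returned by make() stay usable.
  std::deque<Node> nodes;
  Node* make(Node* in) {
    nodes.push_back(Node{static_cast<int>(nodes.size()), in});
    return &nodes.back();
  }
};

// Returns the head of the newly inserted post loop (the primary result) and
// writes the main loop's exit through main_exit_out (the secondary result),
// so no PostLoopInfo-style struct is needed.
Node* insert_post_loop(Graph& g, Node* main_head, Node** main_exit_out) {
  assert(main_exit_out != nullptr);
  Node* main_exit = g.make(main_head);  // exit taken when the main loop finishes
  Node* post_head = g.make(main_exit);  // post loop is entered from that exit
  *main_exit_out = main_exit;
  return post_head;
}
```

At the call site this reads as `Node* exit; Node* post = insert_post_loop(g, head, &exit);`, which keeps both results visible without an extra type.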
>>
>> Thanks,
>> Vladimir
>>
>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>> Here is an update after full testing, the webrev is:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>
>>> Please review and comment,
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>> Berg, Michael C
>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>> To: Vladimir Kozlov ;
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>>
>>> Subject: RE: CR for RFR 8151573
>>>
>>> Putting a hold on the review, retesting everything on my end.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>> Vladimir:
>>>>
>>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up in cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>
>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>
>>>>
>>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases?
>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>
>>> Yes, after you explained vector masking to me I now understand why it could be used for the post loop.
>>>
>>> After thinking about this I would suggest you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be only platform specific, smaller, and easier to understand and test.
>>> Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than from optimizing general loops.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> As we all know we can always construct microbenchmarks which show a
>>>> 30% - 50% difference. But in real applications we will never see a
>>>> difference. I still don't see a real reason why we should spend time
>>>> and optimize *POST* loops. We already have a vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost.
>>>>
>>>> Why does "programmable SIMD" depend on it? What about the pre-loop?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>> Correction below...
>>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-compiler-dev
>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>>>> Berg, Michael C
>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: RE: CR for RFR 8151573
>>>>>
>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this:
>>>>>
>>>>> for(int i = 0; i < process_len; i++)
>>>>> {
>>>>>   d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>> }
>>>>>
>>>>> The above code makes 9 vector ops.
>>>>>
>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway.
>>>>> The value process_len is some fraction of the array length in my measurements.
>>>>> The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in the range 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>
>>>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was superunrolled 4 times with vector length 16 (an unroll of 64), we process 4 iterations to align, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
>>>>>
>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>
>>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Changes are significant so they have to be justified, especially since we are in a later stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
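Michael's iteration counts in the thread rest on one idea. A scalar fixup loop runs once per leftover element. A masked vector post loop covers up to a full vector of leftovers per iteration, with a mask disabling the lanes past the limit. The sketch below emulates this in scalar C++ under stated assumptions, and is not SuperWord's code generation. It assumes 16 float lanes (the VecZ float case in the thread) and a k-register-style bitmask. The helper names lane_mask(), scalar_post_loop_trips() and masked_post_loop() are hypothetical. As in an rce'd loop, it performs no bounds checks, so the caller must keep `limit` within all array lengths.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: 16 lanes, i.e. 16 floats per 512-bit vector (the VecZ float case).
constexpr int kLanes = 16;

// Bitmask with the low `active` lanes enabled, in the style of an AVX-512
// opmask (k) register. `active` >= kLanes enables every lane.
uint32_t lane_mask(int active) {
  if (active <= 0) return 0u;
  if (active >= kLanes) return (1u << kLanes) - 1u;
  return (1u << active) - 1u;
}

// Trip count of a plain scalar fixup loop over [start, limit).
int scalar_post_loop_trips(int start, int limit) {
  return limit > start ? limit - start : 0;
}

// Emulated masked vector post loop over [start, limit) computing
// d[i] = a[i]*b[i] + a[i]*c[i] + b[i]*c[i]. Each outer iteration stands for
// one vector iteration; masked-off lanes are simply not written.
// Returns the number of vector iterations executed.
int masked_post_loop(const std::vector<float>& a, const std::vector<float>& b,
                     const std::vector<float>& c, std::vector<float>& d,
                     int start, int limit) {
  int vector_iters = 0;
  for (int i = start; i < limit; i += kLanes) {
    const uint32_t mask = lane_mask(limit - i);
    for (int lane = 0; lane < kLanes; ++lane) {
      if (mask & (1u << lane)) {
        const int j = i + lane;
        d[j] = a[j] * b[j] + a[j] * c[j] + b[j] * c[j];
      }
    }
    ++vector_iters;
  }
  return vector_iters;
}
```

For any leftover count from 1 to 15, the emulated post loop takes exactly one vector iteration, while the scalar loop takes 1 to 15 trips. Adding the 6 earlier loop iterations gives the 7 versus 7-to-21 comparison Michael describes.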
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute multi-versioning post loops for range
>>>>>> check elimination. Previously, cfg optimizations after register
>>>>>> allocation were where post loop optimizations were done for range
>>>>>> checks. I have added code which produces the desired effect much
>>>>>> earlier by introducing a safe transformation which will minimally
>>>>>> allow a range check free version of the final post loop to execute
>>>>>> up until the point it actually has to take a range check exception
>>>>>> by re-ranging the limit of the rce'd loop, then exit the rce'd
>>>>>> post loop and take the range check exception in the legacy loop's execution if required.
>>>>>> If during optimization we discover that we know enough to remove
>>>>>> the range check version of the post loop, mostly by exposing the
>>>>>> load range values into the limit logic of the rce'd post loop, we
>>>>>> will eliminate the range check post loop altogether, much like cfg
>>>>>> optimizations did, but much earlier. This gives optimizations
>>>>>> like programmable SIMD (via SuperWord) the opportunity to
>>>>>> vectorize the rce'd post loops to a single iteration based on mask
>>>>>> vectors which map to the residual iterations. Programmable SIMD
>>>>>> will be a follow-on change set utilizing this code to stage its
>>>>>> work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>>> Currently I have enabled this optimization for x86 only. We base
>>>>>> this loop on successfully rce'd main loops, and if for whatever reason multiversioning fails, we eliminate the loop we added.
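The multiversioned post loop Michael describes can be illustrated at the source level. It is a range-check-free copy of the post loop whose limit is clamped so that no index in it can fail a check. That copy is followed by the original checked loop, which takes the exception at the exact failing iteration if one is required. This is a hand-written analogue under stated assumptions, not C2's transformation. Java's array bounds checks are modeled with std::out_of_range, and post_loop_multiversioned() and range_check() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Models a Java array range check: throws where Java would throw
// ArrayIndexOutOfBoundsException.
void range_check(const std::vector<float>& v, int i) {
  if (i < 0 || i >= static_cast<int>(v.size())) {
    throw std::out_of_range("range check failed");
  }
}

// Post loop for d[i] = a[i]*b[i] + a[i]*c[i] + b[i]*c[i] over [start, limit).
// Returns how many iterations ran in the range-check-free version.
int post_loop_multiversioned(const std::vector<float>& a,
                             const std::vector<float>& b,
                             const std::vector<float>& c,
                             std::vector<float>& d, int start, int limit) {
  assert(start >= 0);  // a post loop resumes where the main loop stopped
  // Clamp the fast version's limit with a min over every array length and
  // the original limit, so none of its indices can fail a check.
  const int safe_limit = std::min({limit, static_cast<int>(a.size()),
                                   static_cast<int>(b.size()),
                                   static_cast<int>(c.size()),
                                   static_cast<int>(d.size())});
  int i = start;
  int fast_iters = 0;
  for (; i < safe_limit; ++i, ++fast_iters) {
    d[i] = a[i] * b[i] + a[i] * c[i] + b[i] * c[i];  // no checks needed
  }
  // Legacy, range-checked version for whatever the clamp excluded. Because
  // every access here uses index i, it is entered only when i is out of range
  // for some array, so it throws on its first iteration, after all earlier
  // stores have completed.
  for (; i < limit; ++i) {
    range_check(a, i);
    range_check(b, i);
    range_check(c, i);
    range_check(d, i);
    d[i] = a[i] * b[i] + a[i] * c[i] + b[i] * c[i];
  }
  return fast_iters;
}
```

When the clamp never bites (all arrays at least `limit` long), the checked copy is dead. That corresponds to the case where, as Michael puts it, the optimizer can eliminate the range check post loop altogether.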
>>>>>>
>>>>>> This code was tested as follows:
>>>>>>
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>
>>>>>>
>>>>>> webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Michael
>>>>>>

From michael.c.berg at intel.com Sat Apr 2 03:25:16 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sat, 2 Apr 2016 03:25:16 +0000
Subject: CR for RFR 8151573
In-Reply-To: <56FF3A04.5090601@oracle.com>
References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com>
Message-ID:

I have to make a two-line change; I am testing it on my end. I will pass the webrev directly to you when my tests conclude.

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, April 01, 2016 8:18 PM
To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: CR for RFR 8151573

I am starting preintegration testing.

Thanks,
Vladimir

On 3/31/16 8:36 PM, Berg, Michael C wrote:
> Vladimir, I think I have addressed every concern in the latest webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>
> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it.
> The code is fully retested with no issues.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 9:56 PM
> To: Berg, Michael C ;
> 'hotspot-compiler-dev at openjdk.java.net'
>
> Subject: Re: CR for RFR 8151573
>
> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>> See below for context.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 3:51 PM
>> To: Berg, Michael C ;
>> 'hotspot-compiler-dev at openjdk.java.net'
>>
>> Subject: Re: CR for RFR 8151573
>>
>> Michael,
>>
>> First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files and it is difficult to apply your changes.
>>
>> multi_version_post_loops() can use is_canonical_main_loop_entry() from
>> 8148754 but you need to modify it to move the
>> is_Main() assert to other call sites.
>>
>> Ok, but should the name then not be is_canonical_loop_entry(), since it will be for non-main loops as well? Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps it is better to leave them separate given they are different? I will leave this one to last so that we have time to discuss it.
>
> I am fine with the name change. The only difference I see is that your code checks the cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - shouldn't it be in(2)?
>
>>
>> I did not get the rce'd post loop checks in loopnode.cpp.
>>
>> First I will have to explain what I am doing with do_range_check(). That code resolves the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...)
>> creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so, or for the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added, we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also don't use the old_new
> struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right?
> There is no comment on what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>
>>
>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>> + // only process RCE'd main loops
>> + if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> Ok, makes sense.
>
> Sorry, I made a mistake due to you using the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But, please, rename the has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also if there are no checks in the loop you will scan the loop's body again since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> I perceive the real problem is to not scan more than once after we check. I will move towards that solution.
>>
>>
>> Why do you need local copies?
>>
>> - visited.Clear();
>> - clones.clear();
>> + Arena *a = Thread::current()->resource_area();
>> + VectorSet visited(a);
>> + Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node and &main_exit could be passed to it to set.
>>
>> Ok, I will look into a version without PostLoopInfo.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/30/16 1:44 PM, Berg, Michael C wrote:
>>> Here is an update after full testing, the webrev is:
>>>
>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/
>>>
>>> Please review and comment,
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>> Berg, Michael C
>>> Sent: Wednesday, March 16, 2016 10:30 AM
>>> To: Vladimir Kozlov ;
>>> 'hotspot-compiler-dev at openjdk.java.net'
>>>
>>> Subject: RE: CR for RFR 8151573
>>>
>>> Putting a hold on the review, retesting everything on my end.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, March 16, 2016 8:42 AM
>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: CR for RFR 8151573
>>>
>>> On 3/15/16 5:29 PM, Berg, Michael C wrote:
>>>> Vladimir:
>>>>
>>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up in cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop.
>>>
>>> I understand that we can get some benefits. But in the general case they will not be visible.
>>>
>>>>
>>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases?
>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping.
>>>
>>> Yes, after you explained vector masking to me I now understand why it could be used for the post loop.
>>>
>>> After thinking about this I would suggest you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be only platform specific, smaller, and easier to understand and test. Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than from optimizing general loops.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Tuesday, March 15, 2016 4:37 PM
>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: CR for RFR 8151573
>>>>
>>>> As we all know we can always construct microbenchmarks which show a
>>>> 30% - 50% difference. But in real applications we will never see a
>>>> difference.
>>>> I still don't see a real reason why we should spend time and optimize
>>>> *POST* loops. We already have a vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost.
>>>>
>>>> Why does "programmable SIMD" depend on it? What about the pre-loop?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote:
>>>>> Correction below...
>>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-compiler-dev
>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf
>>>>> Of Berg, Michael C
>>>>> Sent: Tuesday, March 15, 2016 4:08 PM
>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: RE: CR for RFR 8151573
>>>>>
>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this:
>>>>>
>>>>> for(int i = 0; i < process_len; i++)
>>>>> {
>>>>>   d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]);
>>>>> }
>>>>>
>>>>> The above code makes 9 vector ops.
>>>>>
>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift.
>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway.
>>>>> The value process_len is some fraction of the array length in my measurements. The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in the range 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop.
>>>>>
>>>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was superunrolled 4 times with vector length 16 (an unroll of 64), we process 4 iterations to align, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop.
>>>>> We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations.
>>>>>
>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one?
>>>>>
>>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Tuesday, March 15, 2016 2:42 PM
>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net'
>>>>> Subject: Re: CR for RFR 8151573
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Changes are significant so they have to be justified, especially since we are in a later stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute multi-versioning post loops for range
>>>>>> check elimination. Previously, cfg optimizations after register
>>>>>> allocation were where post loop optimizations were done for range
>>>>>> checks.
>>>>>> I have added code which produces the desired effect much
>>>>>> earlier by introducing a safe transformation which will minimally
>>>>>> allow a range check free version of the final post loop to
>>>>>> execute up until the point it actually has to take a range check
>>>>>> exception by re-ranging the limit of the rce'd loop, then exit
>>>>>> the rce'd post loop and take the range check exception in the legacy loop's execution if required.
>>>>>> If during optimization we discover that we know enough to remove
>>>>>> the range check version of the post loop, mostly by exposing the
>>>>>> load range values into the limit logic of the rce'd post loop, we
>>>>>> will eliminate the range check post loop altogether, much like cfg
>>>>>> optimizations did, but much earlier. This gives optimizations
>>>>>> like programmable SIMD (via SuperWord) the opportunity to
>>>>>> vectorize the rce'd post loops to a single iteration based on
>>>>>> mask vectors which map to the residual iterations. Programmable
>>>>>> SIMD will be a follow-on change set utilizing this code to stage
>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations.
>>>>>> Currently I have enabled this optimization for x86 only. We base
>>>>>> this loop on successfully rce'd main loops, and if for whatever reason multiversioning fails, we eliminate the loop we added.
>>>>>>
>>>>>> This code was tested as follows:
>>>>>>
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573
>>>>>>
>>>>>>
>>>>>> webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Michael
>>>>>>

From michael.c.berg at intel.com Sat Apr 2 05:16:01 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Sat, 2 Apr 2016 05:16:01 +0000
Subject: CR for RFR 8151573
In-Reply-To:
References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com>
Message-ID:

That small revision is reflected in:

https://bugs.openjdk.java.net/browse/JDK-8151573

and can be accessed at:

http://cr.openjdk.java.net/~mcberg/8151573/webrev.04a/

Regards,
Michael

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Friday, April 01, 2016 8:25 PM
To: Vladimir Kozlov ; 'hotspot-compiler-dev at openjdk.java.net'
Subject: RE: CR for RFR 8151573

I have to make a two-line change; I am testing it on my end. I will pass the webrev directly to you when my tests conclude.

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, April 01, 2016 8:18 PM
To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: CR for RFR 8151573

I am starting preintegration testing.

Thanks,
Vladimir

On 3/31/16 8:36 PM, Berg, Michael C wrote:
> Vladimir, I think I have addressed every concern in the latest webrev:
>
> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/
>
> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it.
> The code is fully retested with no issues.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, March 30, 2016 9:56 PM
> To: Berg, Michael C ;
> 'hotspot-compiler-dev at openjdk.java.net'
>
> Subject: Re: CR for RFR 8151573
>
> On 3/30/16 4:57 PM, Berg, Michael C wrote:
>> See below for context.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, March 30, 2016 3:51 PM
>> To: Berg, Michael C ;
>> 'hotspot-compiler-dev at openjdk.java.net'
>>
>> Subject: Re: CR for RFR 8151573
>>
>> Michael,
>>
>> First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files and it is difficult to apply your changes.
>>
>> multi_version_post_loops() can use is_canonical_main_loop_entry() from
>> 8148754 but you need to modify it to move the
>> is_Main() assert to other call sites.
>>
>> Ok, but should the name then not be is_canonical_loop_entry(), since it will be for non-main loops as well? Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps it is better to leave them separate given they are different? I will leave this one to last so that we have time to discuss it.
>
> I am fine with the name change. The only difference I see is that your code checks the cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - shouldn't it be in(2)?
>
>>
>> I did not get the rce'd post loop checks in loopnode.cpp.
>>
>> First I will have to explain what I am doing with do_range_check(). That code resolves the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass.
>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so, or for the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added, we eliminate it.
>
> I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled.
>
> You should add a comment in do_range_check() about what you said (looking only for range checks). Also don't use the old_new
> struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop.
>
> Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right?
> There is no comment on what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()).
> Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
>
>>
>> Swap next checks since has_range_checks() may be expensive scanning loop body:
>> + // only process RCE'd main loops
>> + if (cl->has_range_checks() || !cl->is_main_loop()) return;
>>
>> Ok, makes sense.
>
> Sorry, I made a mistake due to you using the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But, please, rename the has_range_checks(cl) method to avoid confusion.
>
> Thanks,
> Vladimir
>
>>
>> Also if there are no checks in the loop you will scan the loop's body again since the no_checks state is not recorded.
>> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call.
>>
>> I perceive the real problem is to not scan more than once after we check. I will move towards that solution.
>>
>>
>> Why do you need local copies?
>>
>> - visited.Clear();
>> - clones.clear();
>> + Arena *a = Thread::current()->resource_area();
>> + VectorSet visited(a);
>> + Node_Stack clones(a, main_head->back_control()->outcnt());
>>
>> I will look into this, and see if it can be cleaned up.
>>
>>
>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node and &main_exit could be passed to it to set.
>>
>> Ok, I will look into a version without PostLoopInfo.
>> >> Thanks, >> Vladimir >> >> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>> Here is an update after full testing; the webrev is: >>> >>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>> >>> Please review and comment, >>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Berg, Michael C >>> Sent: Wednesday, March 16, 2016 10:30 AM >>> To: Vladimir Kozlov ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: RE: CR for RFR 8151573 >>> >>> Putting a hold on the review, retesting everything on my end. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 16, 2016 8:42 AM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>> Vladimir: >>>> >>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up by cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop. >>> >>> I understand that we can get some benefits. But in the general case they will not be visible. >>> >>>> >>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>> >>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop. >>> >>> After thinking about this, I would suggest you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform-specific only, smaller, and easier to understand and test.
Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops. >>> >>> Regards, >>> Vladimir >>> >>>> >>>> Regards, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> As we all know, we can always construct microbenchmarks which show a >>>> 30%-50% difference, while in a real application we will never see a >>>> difference. I still don't see a real reason why we should spend >>>> time optimizing >>>> *POST* loops. We already have a vectorized post loop to improve performance. Note that additional loop opts code will raise its maintenance cost. >>>> >>>> Why does "programmable SIMD" depend on it? What about the pre-loop? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>> Correction below... >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-compiler-dev >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>> Of Berg, Michael C >>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: RE: CR for RFR 8151573 >>>>> >>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>>> >>>>> for (int i = 0; i < process_len; i++) >>>>> { >>>>> d[i] = (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>> } >>>>> >>>>> The above code makes 9 vector ops. >>>>> >>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>>> The value of process_len is some fraction of the array length in my measurements.
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>>> >>>>> An example: array_length = 512 and process_len is in the range 81..96. We create a VecZ loop which was super-unrolled 4 times with vector length 16 (an unroll of 64), the alignment pre-loop processes 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work to the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81, so we always do at least 1 fixup iteration and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which take {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for every process_len in 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, a range of 7 to 21 iterations. >>>>> >>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>> >>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier in compilation, making the graph less complex through register allocation. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> Hi Michael, >>>>> >>>>> The changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
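The trip counts in Michael's example can be checked with a few lines of arithmetic. The sketch below takes the mail's own figures as given (6 trips for the {4,1,1} pre, main and vectorized post loops, then a residual of 1 to 15 elements) and assumes 16 float lanes per VecZ vector; the constant and function names are hypothetical.

```cpp
// Assumptions (taken from the example above, not from C2 itself): the pre-loop,
// the unrolled main loop and the vectorized post loop take 6 trips in total
// ({4, 1, 1}), leaving 1..15 residual elements; a VecZ float vector has 16 lanes.
constexpr int kSetupTrips = 6;
constexpr int kVecZFloatLanes = 16;

// Scalar fixup loop: one trip per residual element.
constexpr int trips_with_scalar_fixup(int residual) {
  return kSetupTrips + residual;
}

// Masked vector fixup loop: one trip per (up to) 16 residual elements.
constexpr int trips_with_masked_fixup(int residual) {
  return kSetupTrips + (residual + kVecZFloatLanes - 1) / kVecZFloatLanes;
}

// The 7..21 range quoted in the mail versus a constant 7.
static_assert(trips_with_scalar_fixup(1) == 7, "shortest residual");
static_assert(trips_with_scalar_fixup(15) == 21, "longest residual");
static_assert(trips_with_masked_fixup(1) == 7 && trips_with_masked_fixup(15) == 7,
              "masked fixup always adds exactly one trip");
```

The point of the comparison is the variance rather than the average: with a masked fixup the total trip count no longer depends on where process_len falls within 81..96.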
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute multi-versioning post loops for range >>>>>> check elimination. Previously, post loop optimizations for range >>>>>> checks were done by cfg optimizations after register >>>>>> allocation. I have added code which produces the desired effect much >>>>>> earlier by introducing a safe transformation which will minimally >>>>>> allow a range-check-free version of the final post loop to >>>>>> execute up until the point it actually has to take a range check >>>>>> exception, by re-ranging the limit of the rce'd loop, then exit >>>>>> the rce'd post loop and take the range check exception in the legacy loop's execution if required. >>>>>> If during optimization we discover that we know enough to remove >>>>>> the range-checked version of the post loop, mostly by exposing the >>>>>> load range values to the limit logic of the rce'd post loop, we >>>>>> will eliminate the range-checked post loop altogether, much like cfg >>>>>> optimizations did, but much earlier. This gives optimizations >>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>> mask vectors which map to the residual iterations. Programmable >>>>>> SIMD will be a follow-on change set utilizing this code to stage >>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>> this loop on successfully rce'd main loops, and if, for whatever reason, multiversioning fails, we eliminate the loop we added.
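For readers less familiar with C2's loop transformations, here is a source-level sketch of what a multiversioned post loop does. It is an illustrative model under stated assumptions, not HotSpot code: C2 performs the transformation on its ideal graph, the function name is hypothetical, start is assumed to be non-negative, and std::vector::at() stands in for a Java range check. The clean copy runs up to the min of the loop limit and all array lengths (the "min test of all ranges and the limit" from the thread), and any remaining iterations fall through to the checked legacy loop. That loop throws at the same index where the original loop would fail.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical source-level model of a multiversioned post loop; C2 does this
// on its ideal graph, not on C++ source. Assumes start >= 0 and that every
// array length fits in an int (as Java array lengths do).
void post_loop_multiversioned(const std::vector<float>& a,
                              const std::vector<float>& b,
                              const std::vector<float>& c,
                              std::vector<float>& d,
                              int start, int limit) {
  // Re-ranged limit: min of the loop limit and every array length, so the
  // clean copy below can never touch an out-of-range index.
  const int safe_limit = std::min({limit,
                                   static_cast<int>(a.size()),
                                   static_cast<int>(b.size()),
                                   static_cast<int>(c.size()),
                                   static_cast<int>(d.size())});
  int i = start;
  // Clean (range-check-free) copy, using the kernel from the thread. If a[i],
  // b[i] and c[i] are each loaded once, its 9 vector ops are plausibly
  // 3 loads, 3 multiplies, 2 adds and 1 store.
  for (; i < safe_limit; ++i) {
    d[i] = a[i] * b[i] + a[i] * c[i] + b[i] * c[i];
  }
  // Legacy checked copy: runs only if the clamp stopped the clean copy early,
  // and throws std::out_of_range at the first bad index, much as a Java range
  // check would throw ArrayIndexOutOfBoundsException.
  for (; i < limit; ++i) {
    d.at(i) = a.at(i) * b.at(i) + a.at(i) * c.at(i) + b.at(i) * c.at(i);
  }
}
```

If limit exceeds some array's length, the clean copy runs up to that length and the checked copy then throws. If the optimizer can prove safe_limit == limit, the checked copy is dead and can be removed, which corresponds to eliminating the range-checked post loop.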
>>>>>> >>>>>> This code was tested as follows: >>>>>> >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>> >>>>>> >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Michael >>>>>> From jamsheed.c.m at oracle.com Mon Apr 4 06:14:14 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Mon, 4 Apr 2016 11:44:14 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: <57020636.7010806@oracle.com> Hi Martin, the "nmethod's exception cache not multi-thread safe" bug is fixed in b107. Bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 Fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 Discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's exception > cache. Readers of the cache may read stale data on weak memory platforms. > > The writers of the cache are synchronized by locks, but there may be > concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache without locking. > > Therefore, the nmethod's field _exception_cache needs to be volatile > and adding new entries must be done by releasing stores. (Loading > seems to be fine without acquire because there's an address dependency > from the load of the cache to the usage of its contents, which is > sufficient to ensure ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to read the > volatile field _state only once.
It is certainly undesirable to force > the compiler to load it from memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > Please review. I will also need a sponsor. > > Best regards, > > Martin From rahul.v.raghavan at oracle.com Mon Apr 4 08:09:08 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Mon, 4 Apr 2016 01:09:08 -0700 (PDT) Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: <56FD74F2.2080102@oracle.com> References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> Message-ID: <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Hi, Please review the revised fix for JDK-8149488. : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ Based on further checking, and thanks to clarifications from Michael, it was verified that the 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp' (and that the earlier proposal to extend the bitsInByte table size to 512 is not required). Points from Michael for the record - " > I believe Dean is right. I have debugged this and analyzed the usage model, > we never made use of the upper components > and register allocation has been right for VecZ for a good deal of time. > > All we need for a change is, > Regmask.cpp: > > uint RegMask::Size() const { > extern uint8_t bitsInByte[256]; > > A one line change. > > -Michael. > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > I tested SPECjvm2008, which has tons of EVEX code generated in it, on SKX > where we make use of VecZ and the upper bank of registers." So in this revised webrev.02, I defined BITS_IN_BYTE_ARRAY_SIZE as 256 and used it for the array size. Confirmed no issues with a 'JPRT -testset hotspot' run.
Thanks, Rahul > -----Original Message----- > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > > Michael, isn't the correct size for this table 256? I missed how VecZ > relates to the table size. > > dl > > On 3/31/2016 9:58 AM, Berg, Michael C wrote: > > Up until now we have gotten along with the size constraint only. > > Let us have both the size and the table though for completeness. > > I think we can leave the name though. > > > > -Michael > > > > -----Original Message----- > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > > Sent: Thursday, March 31, 2016 9:18 AM > > To: Dean Long ; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > > > Hi Michael, > > > > With respect to the thread below, I have some questions. > > I understand that the bitsInByte[] lookup table gives the number of bits set in the looked-up number and is used in VectorSet::Size(). > > Also, the comment I received was that the bitsInByte table must be extended to 512 entries for consistent mapping of the VecZ register on > targets that support it. > > But currently, where bitsInByte is used in VectorSet::Size() and RegMask::Size(), only the original 256 entries are needed. > > I could not find any use of an extended set of values in the bitsInByte table. Is it for something to be used in the future?
> > (without extending current bitsInByte array contents) (Anyhow at present values above 0xFF is never indexed for bitsInByte in > RegMask::Size()) > > > > ----- src/share/vm/libadt/vectset.hpp > > +#define BITS_IN_BYTE_ARRAY_SIZE 256 > > + > > > > ----- src/share/vm/opto/regmask.cpp > > - extern uint8_t bitsInByte[512]; > > + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > > > > ----- src/share/vm/libadt/vectset.cpp > > -uint8_t bitsInByte[256] = { > > +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > > > > I can send revised webrev for above if all okay. Please tell me if I am missing something. > > > > > > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > > > > Thanks, > > Rahul > > > >> -----Original Message----- > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > >> > >>> -----Original Message----- > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > >>> > >>> When do we access elements 256 .. 511? Wouldn't that mean we have > >>> 9-bit bytes? > >> Got your point Dean, Thanks. > >> I too got some questions here now; will check and reply soon. > >> > >> -Rahul > >> > >>> dl > >>> > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > >>>> Hi, > >>>> > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > >>>> > >>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>> : > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > >>>> > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > >>>> Confirmed no issues with 'JPRT -testset hotspot' run. 
> >>>> > >>>> Thanks, > >>>> Rahul > >>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler-dev at openjdk.java.net > >>>>> Should we not extend: > >>>>> > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > >>>>> uint8_t bitsInByte[256] = { // ... > >>>>> > >>>>> to 512 > >>>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > >>>>> > >>>>> So how do we intend to map a VecZ register without 512 bits? > >>>>> > >>>>> -Michael > >>>>> > >>>>> -----Original Message----- > >>>>> From: hotspot-compiler-dev > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > >>>>> Of Vladimir Ivanov > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > >>>>> hotspot-compiler-dev at openjdk.java.net > >>>>> > >>>>> Rahul, > >>>>> > >>>>> Can we define a constant instead and use it in both places? > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > >>>>>> Hi, > >>>>>> > >>>>>> Please review the patch for JDK-8149488. > >>>>>> > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > >>>>>> > >>>>>> Corrected the bitsInByte array size in the declaration.
> >>>>>> > >>>>>> Thanks, > >>>>>> Rahul > >>>>>> > From zoltan.majo at oracle.com Mon Apr 4 10:49:40 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 4 Apr 2016 12:49:40 +0200 Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop optimizations to 'develop' In-Reply-To: <56FE6A41.6030703@oracle.com> References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com> <56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com> <56FE6A41.6030703@oracle.com> Message-ID: <570246C4.7050504@oracle.com> Thank you, Vladimir and Chris, for the reviews! For the record: I'll push the latest webrev (webrev.03) today. Best regards, Zoltan On 04/01/2016 02:32 PM, Zoltán Majó wrote: > Hi Vladimir, > > > thank you for the feedback! > > On 03/31/2016 05:57 PM, Vladimir Kozlov wrote: >> It is nice to have non-product flags, which are easy to remove :) > > Yes, indeed. :-) > >> >> Cleanup looks good. > > Thank you. > >> >> Can you leave the test but remove only "-XX:+UnlockDiagnosticVMOptions >> -XX:-LoopLimitCheck"? It has an interesting code shape. Add a comment >> that it was run with "-XX:+UnlockDiagnosticVMOptions >> -XX:-LoopLimitCheck" to trigger the problem. > > Yes, of course. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/ > > Thank you! > > Best regards, > > > Zoltan > >> >> Thanks, >> Vladimir >> >> On 3/31/16 1:15 AM, Zoltán Majó wrote: >>> Hi Vladimir, >>> >>> >>> thank you for your feedback! >>> >>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote: >>>> These flags were added when I fixed a long-standing C2 problem with >>>> counted loops: 5091921. >>>> They were added to have the ability to revert to the original code if >>>> the new code caused a problem. >>>> Looks like the old code which executed with these flags switched >>>> off has become rotten. >>>> >>>> Zoltan, did you find what caused the crash? Looks like the product VM >>>> was used in the bug report. What result does the >>>> fastdebug VM give?
>>> >>> I've tried starting different VM versions with the flag(s) off. The >>> most frequent error I get is >>> >>> # Internal Error >>> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), >>> pid=32727, tid=32746 >>> # assert(false) failed: Bad graph detected in build_loop_late >>> >>> So it seems that the code executed with the flags off has indeed >>> become rotten. >>> >>>> Converting flags to develop will not prevent problems from happening >>>> with the fastdebug VM, where these flags could be switched >>>> off even when they are develop. >>>> >>>> If the problem with the original code (flags off) is something >>>> fundamental, we may simply remove the old code and remove >>>> these flags and have only the new code. 5 years have already passed since >>>> 5091921 was fixed. >>> >>> Yes, I agree. I think it's reasonable to remove the old code. >>> >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/ >>> >>> The changes pass JPRT. >>> >>> I've changed the title of the bug to "Cleanup: Remove some unused >>> flags/code in loop optimizations" to better reflect >>> what the change is doing. I have kept the original title in the RFR. >>> >>> Thank you! >>> >>> Best regards, >>> >>> >>> Zoltan >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/23/16 6:26 AM, Zoltán Majó wrote: >>>>> Hi, >>>>> >>>>> >>>>> please review the patch for 8072422. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8072422 >>>>> >>>>> Problem: Some flags controlling loop optimizations are currently >>>>> 'diagnostic'. Even though these flags are useful >>>>> mostly for compiler-related development, their value can be >>>>> changed not only in >>>>> fastdebug, but also in release builds. >>>>> >>>>> Solution: Change the flags to 'develop'.
>>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/ >>>>> >>>>> Testing: >>>>> - locally built/started VM; >>>>> - locally executed >>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java. >>>>> >>>>> Thank you and best regards, >>>>> >>>>> >>>>> Zoltan >>>>> >>> > From zoltan.majo at oracle.com Mon Apr 4 10:50:39 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 4 Apr 2016 12:50:39 +0200 Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop optimizations to 'develop' In-Reply-To: <570246C4.7050504@oracle.com> References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com> <56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com> <56FE6A41.6030703@oracle.com> <570246C4.7050504@oracle.com> Message-ID: <570246FF.6030206@oracle.com> P.S.: Typo in my previous mail: I meant webrev.02 (and not webrev.03). Sorry. On 04/04/2016 12:49 PM, Zoltán Majó wrote: > Thank you, Vladimir and Chris, for the reviews! For the record: I'll > push the latest webrev (webrev.03) today. > > Best regards, > > > Zoltan > > On 04/01/2016 02:32 PM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 03/31/2016 05:57 PM, Vladimir Kozlov wrote: >>> It is nice to have non-product flags, which are easy to remove :) >> >> Yes, indeed. :-) >> >>> >>> Cleanup looks good. >> >> Thank you. >> >>> >>> Can you leave the test but remove only "-XX:+UnlockDiagnosticVMOptions >>> -XX:-LoopLimitCheck"? It has an interesting code shape. Add a comment >>> that it was run with "-XX:+UnlockDiagnosticVMOptions >>> -XX:-LoopLimitCheck" to trigger the problem. >> >> Yes, of course. Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/ >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>> On 3/31/16 1:15 AM, Zoltán Majó wrote: >>>> Hi Vladimir, >>>> >>>> >>>> thank you for your feedback!
>>>> >>>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote: >>>>> These flags were added when I fixed a long-standing C2 problem with >>>>> counted loops: 5091921. >>>>> They were added to have the ability to revert to the original code if >>>>> the new code caused a problem. >>>>> Looks like the old code which executed with these flags switched >>>>> off has become rotten. >>>>> >>>>> Zoltan, did you find what caused the crash? Looks like the product VM >>>>> was used in the bug report. What result does the >>>>> fastdebug VM give? >>>> >>>> I've tried starting different VM versions with the flag(s) off. The >>>> most frequent error I get is >>>> >>>> # Internal Error >>>> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), >>>> pid=32727, tid=32746 >>>> # assert(false) failed: Bad graph detected in build_loop_late >>>> >>>> So it seems that the code executed with the flags off has indeed >>>> become rotten. >>>> >>>>> Converting flags to develop will not prevent problems from happening >>>>> with the fastdebug VM, where these flags could be switched >>>>> off even when they are develop. >>>>> >>>>> If the problem with the original code (flags off) is something >>>>> fundamental, we may simply remove the old code and remove >>>>> these flags and have only the new code. 5 years have already passed since >>>>> 5091921 was fixed. >>>> >>>> Yes, I agree. I think it's reasonable to remove the old code. >>>> >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/ >>>> >>>> The changes pass JPRT. >>>> >>>> I've changed the title of the bug to "Cleanup: Remove some unused >>>> flags/code in loop optimizations" to better reflect >>>> what the change is doing. I have kept the original title in the RFR. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/23/16 6:26 AM, Zoltán Majó wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> please review the patch for 8072422.
>>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8072422 >>>>>> >>>>>> Problem: Some flags controlling loop optimizations are currently >>>>>> 'diagnostic'. Even though these flags are useful >>>>>> mostly for compiler-related development, their value can be >>>>>> changed not only in >>>>>> fastdebug, but also in release builds. >>>>>> >>>>>> Solution: Change the flags to 'develop'. >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - locally built/started VM; >>>>>> - locally executed >>>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java. >>>>>> >>>>>> Thank you and best regards, >>>>>> >>>>>> >>>>>> Zoltan >>>>>> >>>> >> > From martin.doerr at sap.com Mon Apr 4 12:27:49 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Apr 2016 12:27:49 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: Hi Christian, thanks for taking a look. I had checked the other places which use set_exception_cache. They either set it to NULL or to an unmodified pre-existing object (which gets released after creation by a cumulative memory barrier after my change). Both should be ok. I have seen many places in hotspot where we have a set_... function and a release_set_... one. So I thought this was kind of common practice. But I don't have a strong opinion on it. Best regards, Martin From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Friday, April 1, 2016 18:34 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On Apr 1, 2016, at 2:37 AM, Doerr, Martin > wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.
The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Does it make sense to keep: void set_exception_cache(ExceptionCache *ec) { _exception_cache = ec; } or would it be safer to always do the store-release even when clearing the cache? Please review. I will also need a sponsor. Best regards, Martin From martin.doerr at sap.com Mon Apr 4 12:57:31 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Apr 2016 12:57:31 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <56FEA4FE.2010807@redhat.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <56FEA4FE.2010807@redhat.com> Message-ID: <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap> Hi Andrew, there are many places in hotspot where we rely on ordering by address dependency. That sounds feasible to me since we're not supporting Alpha processors. The load of the ExceptionCache pointer is only used to access elements of the cache, not to establish ordering of other accesses. I don't think compilers are allowed to break anything here because I have made the field volatile.
This prevents optimizers from reordering, optimizing out or duplicating some of the loads. All supported processors respect the ordering (due to address dependency), too, so I believe we're ok. I'm not sure if this is what you're concerned about. Did I miss anything? Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Friday, April 1, 2016 18:43 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On 04/01/2016 01:37 PM, Doerr, Martin wrote: > Therefore, the nmethod's field _exception_cache needs to be volatile > and adding new entries must be done by releasing stores. (Loading > seems to be fine without acquire because there's an address > dependency from the load of the cache to the usage of its contents > which is sufficient to ensure ordering on all openjdk platforms.) I think that's very risky. We can't be really sure what an optimizer might do in this area, as discussed at (very) considerable length in concurrency forums. memory_order_consume does this correctly in C++11 but we're not yet using C++11. I'd use acquire and leave a note that in future this can be replaced by memory_order_consume. Andrew. From aleksey.shipilev at oracle.com Mon Apr 4 13:02:09 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 4 Apr 2016 16:02:09 +0300 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign In-Reply-To: <56FE5D00.7000209@oracle.com> References: <56FE5D00.7000209@oracle.com> Message-ID: <570265D1.6040905@oracle.com> On 04/01/2016 02:35 PM, Aleksey Shipilev wrote: > Hi, > > compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify > String Concat strategies, because some of them are loading new methods > and use them during String concat linkage and execution. Notably, this > will happen inside of the asserts. We need to prime the asserts before
We need to prime the asserts before > using them in-between counter polls. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8153265 > > Webrev: > http://cr.openjdk.java.net/~shade/8153265/webrev.00/ > > Testing: offending test in oob/-Xcomp modes Anyone? Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From ivan at azulsystems.com Mon Apr 4 13:12:20 2016 From: ivan at azulsystems.com (Ivan Krylov) Date: Mon, 4 Apr 2016 16:12:20 +0300 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign In-Reply-To: <570265D1.6040905@oracle.com> References: <56FE5D00.7000209@oracle.com> <570265D1.6040905@oracle.com> Message-ID: <57026834.6090907@azulsystems.com> Looks right, but I am not a reviewer. Ivan On 04/04/2016 16:02, Aleksey Shipilev wrote: > On 04/01/2016 02:35 PM, Aleksey Shipilev wrote: >> Hi, >> >> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify >> String Concat strategies, because some of them are loading new methods >> and use them during String concat linkage and execution. Notably, this >> will happen inside of the asserts. We need to prime the asserts before >> using them in-between counter polls. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8153265 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8153265/webrev.00/ >> >> Testing: offending test in oob/-Xcomp modes > Anyone? 
> > Thanks, > -Aleksey > > From aph at redhat.com Mon Apr 4 14:01:32 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Apr 2016 15:01:32 +0100 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <56FEA4FE.2010807@redhat.com> <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap> Message-ID: <570273BC.4090605@redhat.com> On 04/04/2016 01:57 PM, Doerr, Martin wrote: > there are many places in hotspot where we rely on ordering by > address dependency. That sounds feasible to me since we're not > supporting Alpha processors. The load of the ExceptionCache pointer > is only used to access elements of the cache, not to establish > ordering of other accesses. > > I don't think compilers are allowed to break anything here because I > have made the field volatile. This prevents optimizers from > reordering, optimizing out or duplicating some of the loads. All > supported processors respect the ordering (due to address > dependency), too, so I believe we're ok. That sounds alright, at least from an informal reasoning perspective. > I'm not sure if this is what you're concerned about. Did I miss > anything? I don't think so. I presume you've read Hans Boehm's paper where he points out that it's very hard to rely on dependencies for memory ordering in any high-level language [1]. For that reason I tend to err on the side of caution when reasoning about memory. You're sailing a bit too close to the rocks for my comfort. :-) Andrew. 
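The publication pattern discussed in this thread can be sketched with C++11 atomics: writers are serialized by a lock, and readers walk the list without one. This is a simplified, hypothetical model. HotSpot was not using C++11 at the time, as Andrew notes, and the real nmethod code uses a volatile field with its own ordering primitives. ExceptionCacheEntry, ExceptionCacheList, add() and lookup() are invented names. The writer publishes each fully initialized entry with a release store. The reader uses an acquire load, the conservative choice Andrew suggests. std::memory_order_consume would express Martin's address-dependency argument, but mainstream compilers currently strengthen consume to acquire anyway.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical cache entry: pc and handler are plain ints standing in for addresses.
struct ExceptionCacheEntry {
  int pc;
  int handler;
  ExceptionCacheEntry* next;
};

class ExceptionCacheList {
 public:
  ExceptionCacheList() = default;
  ExceptionCacheList(const ExceptionCacheList&) = delete;
  ExceptionCacheList& operator=(const ExceptionCacheList&) = delete;

  ~ExceptionCacheList() {
    ExceptionCacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      ExceptionCacheEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  // Writers are serialized by the lock. All fields of the new entry are written
  // before the release store, so a reader that observes the new head also
  // observes initialized pc, handler and next fields.
  void add(int pc, int handler) {
    std::lock_guard<std::mutex> guard(lock_);
    ExceptionCacheEntry* e =
        new ExceptionCacheEntry{pc, handler, head_.load(std::memory_order_relaxed)};
    head_.store(e, std::memory_order_release);
  }

  // Readers take no lock; the acquire load pairs with the writer's release store.
  int lookup(int pc) const {
    for (const ExceptionCacheEntry* e = head_.load(std::memory_order_acquire);
         e != nullptr; e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return -1;  // not cached
  }

 private:
  std::atomic<ExceptionCacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

Entries are never removed in this sketch. Cleaning the cache safely while lock-free readers may still hold pointers into it is a separate problem, and this code does not address it.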
[1] http://www.hboehm.info/c++mm/dependencies.html From dean.long at oracle.com Mon Apr 4 18:34:45 2016 From: dean.long at oracle.com (Dean Long) Date: Mon, 4 Apr 2016 11:34:45 -0700 Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Message-ID: <5702B3C5.8070507@oracle.com> Looks OK. dl On 4/4/2016 1:09 AM, Rahul Raghavan wrote: > Hi, > > Please review the revised fix for JDK- 8149488. > > : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ > > Based on further checking and thanks to clarifications from Michael, > it was verified that 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp', > (and that earlier mentioned case of extending bitsInByte table size to 512, is not required). > > Points from Michael for the record - " > > I believe Dean is right, I have debugged this and analyzed the usage model, > > we never made use of the upper components > > and register allocation has been right for VecZ for a good deal of time. > > > > All we need for a change is, > > Regmask.cpp: > > > > uint RegMask::Size() const { > > extern uint8_t bitsInByte[256]; > > > > A one line change. > > > > -Michael. > > > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > > I tested SPECjvm2008 which has tons of EVEX code generated in it on SKX > > where we make use of VecZ and the upper bank of registers." > > So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. > > Confirmed no issues with 'JPRT -testset hotspot' run. > > Thanks, > Rahul > >> -----Original Message----- >> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM >> >> Michael, isn't the correct size for this table 256? I missed how VecZ >> relates to the table size.
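Dean's question about the table size can be made concrete: a byte holds 8 bits, so a per-byte popcount table needs exactly 2^8 = 256 entries, and a byte-sized index never exceeds 0xFF. The following C++ sketch is hypothetical and is not the vectset.cpp source. It builds such a table from one shared size constant (named after the `BITS_IN_BYTE_ARRAY_SIZE` from the webrev), then uses it to count the set bits of a word array, one byte at a time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One constant shared by the definition and every declaration of the table.
// A byte holds 8 bits, so there are exactly 1 << 8 == 256 possible indices.
const std::size_t BITS_IN_BYTE_ARRAY_SIZE = 1u << 8;

// Builds a table where entry i holds the number of set bits in i,
// instead of writing out the 256 initializers by hand.
std::array<uint8_t, BITS_IN_BYTE_ARRAY_SIZE> make_bits_in_byte() {
  std::array<uint8_t, BITS_IN_BYTE_ARRAY_SIZE> table{};
  for (std::size_t i = 1; i < BITS_IN_BYTE_ARRAY_SIZE; i++) {
    // popcount(i) == popcount(i >> 1) + (lowest bit of i)
    table[i] = static_cast<uint8_t>(table[i >> 1] + (i & 1));
  }
  return table;
}

static const std::array<uint8_t, BITS_IN_BYTE_ARRAY_SIZE> bitsInByte =
    make_bits_in_byte();

// Counts set bits across 32-bit words one byte at a time. Every index is
// masked to 0..0xFF, so entries 256..511 of a larger table would never be read.
unsigned count_bits(const uint32_t* words, std::size_t n) {
  unsigned sum = 0;
  for (std::size_t i = 0; i < n; i++) {
    for (int shift = 0; shift < 32; shift += 8) {
      std::size_t index = (words[i] >> shift) & 0xFF;
      assert(index < BITS_IN_BYTE_ARRAY_SIZE);
      sum += bitsInByte[index];
    }
  }
  return sum;
}
```

The width of the register being described (VecZ or otherwise) affects how many words are summed, not the table size.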
>> >> dl >> >> On 3/31/2016 9:58 AM, Berg, Michael C wrote: >>> Up until now we have gotten along with the size constraint only. >>> Let us have both the size and the table though for completeness. >>> I think we can leave the name though. >>> >>> -Michael >>> >>> -----Original Message----- >>> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] >>> Sent: Thursday, March 31, 2016 9:18 AM >>> To: Dean Long ; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C >>> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp >>> >>> Hi Michael, >>> >>> With respect to below thread, request help with some questions. >>> Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size. >>> Also comment got was for requirement to extend bitsInByte table to 512 size, for consistent mapping for VecZ register also, on >> targets that support it. >>> But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. >>> Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? >>> >>> So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? >>> (without extending current bitsInByte array contents) (Anyhow at present values above 0xFF is never indexed for bitsInByte in >> RegMask::Size()) >>> ----- src/share/vm/libadt/vectset.hpp >>> +#define BITS_IN_BYTE_ARRAY_SIZE 256 >>> + >>> >>> ----- src/share/vm/opto/regmask.cpp >>> - extern uint8_t bitsInByte[512]; >>> + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; >>> >>> ----- src/share/vm/libadt/vectset.cpp >>> -uint8_t bitsInByte[256] = { >>> +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { >>> >>> I can send revised webrev for above if all okay. Please tell me if I am missing something. 
>>> >>> >>> OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? >>> (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') >>> >>> Thanks, >>> Rahul >>> >>>> -----Original Message----- >>>> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM >>>> >>>>> -----Original Message----- >>>>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM >>>>> >>>>> When do we access elements 256 .. 511? Wouldn't that mean we have >>>>> 9-bit bytes? >>>> Got your point Dean, Thanks. >>>> I too got some questions here now; will check and reply soon. >>>> >>>> -Rahul >>>> >>>>> dl >>>>> >>>>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: >>>>>> Hi, >>>>>> >>>>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. >>>>>> >>>>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 >>>>>> : >>>>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ >>>>>> >>>>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. >>>>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. >>>>>> Confirmed no issues with 'JPRT -testset hotspot' run. >>>>>> >>>>>> Thanks, >>>>>> Rahul >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: >>>>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- >>>>> dev at openjdk.java.net >>>>>>> Should we not extend: >>>>>>> >>>>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: >>>>>>> uint8_t bitsInByte[256] = { // ... >>>>>>> >>>>>>> to 512 >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' >>>>>>> >>>>>>> So how do we intend to map a VecZ register without 512 bits? 
>>>>>>> >>>>>>> -Michael >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: hotspot-compiler-dev >>>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>>>> Of Vladimir Ivanov >>>>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; >>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>> >>>>>>> Rahul, >>>>>>> >>>>>>> Can we define a constant instead and use it in both places? >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review the patch for JDK- 8149488. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 >>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ >>>>>>>> >>>>>>>> Corrected the bitsInByte array size in declaration. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Rahul >>>>>>>> From michael.c.berg at intel.com Mon Apr 4 20:05:05 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Mon, 4 Apr 2016 20:05:05 +0000 Subject: FW: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Message-ID: FYI -----Original Message----- From: Berg, Michael C Sent: Monday, April 04, 2016 12:42 PM To: 'Rahul Raghavan' Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp Looks ok Rahul. Thanks, Michael -----Original Message----- From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] Sent: Monday, April 04, 2016 1:09 AM To: hotspot-compiler-dev at openjdk.java.net Cc: Dean Long ; Berg, Michael C ; Tobias Hartmann ; Vladimir Ivanov Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp Hi, Please review the revised fix for JDK- 8149488. 
: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ Based on further checking and thanks to clarifications from Michael, it was verified that 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp', (and that earlier mentioned case of extending bitsInByte table size to 512, is not required). Points from Michael for the record - " > I believe Dean is right, I have debugged this and analyzed the usage model, > we never made use of the upper components > and register allocation has been right for VecZ for a good deal of time. > > All we need for a change is, > Regmask.cpp: > > uint RegMask::Size() const { > extern uint8_t bitsInByte[256]; > > A one line change. > > -Michael. > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > I tested SPECjvm2008 which has tons of EVEX code generated in it on SKX > where we make use of VecZ and the upper bank of registers." So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. Confirmed no issues with 'JPRT -testset hotspot' run. Thanks, Rahul > -----Original Message----- > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > > Michael, isn't the correct size for this table 256? I missed how VecZ > relates to the table size. > > dl > > On 3/31/2016 9:58 AM, Berg, Michael C wrote: > > Up until now we have gotten along with the size constraint only. > > Let us have both the size and the table though for completeness. > > I think we can leave the name though. > > > > -Michael > > > > -----Original Message----- > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > > Sent: Thursday, March 31, 2016 9:18 AM > > To: Dean Long ; > > hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > > > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in > > regmask.cpp > > > > Hi Michael, > > > > With respect to below thread, request help with some questions.
> > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size. > > Also comment got was for requirement to extend bitsInByte table to > > 512 size, for consistent mapping for VecZ register also, on > targets that support it. > > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > > Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? > > > > So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? > > (without extending current bitsInByte array contents) (Anyhow at > > present values above 0xFF is never indexed for bitsInByte in > RegMask::Size()) > > > > ----- src/share/vm/libadt/vectset.hpp > > +#define BITS_IN_BYTE_ARRAY_SIZE 256 > > + > > > > ----- src/share/vm/opto/regmask.cpp > > - extern uint8_t bitsInByte[512]; > > + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > > > > ----- src/share/vm/libadt/vectset.cpp > > -uint8_t bitsInByte[256] = { > > +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > > > > I can send revised webrev for above if all okay. Please tell me if I am missing something. > > > > > > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > > > > Thanks, > > Rahul > > > >> -----Original Message----- > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > >> > >>> -----Original Message----- > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > >>> > >>> When do we access elements 256 .. 511? Wouldn't that mean we have > >>> 9-bit bytes? > >> Got your point Dean, Thanks. > >> I too got some questions here now; will check and reply soon. 
> >> > >> -Rahul > >> > >>> dl > >>> > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > >>>> Hi, > >>>> > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > >>>> > >>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>> : > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > >>>> > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > >>>> Confirmed no issues with 'JPRT -testset hotspot' run. > >>>> > >>>> Thanks, > >>>> Rahul > >>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > >>> dev at openjdk.java.net > >>>>> Should we not extend: > >>>>> > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > >>>>> uint8_t bitsInByte[256] = { // ... > >>>>> > >>>>> to 512 > >>>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > >>>>> > >>>>> So how do we intend to map a VecZ register without 512 bits? > >>>>> > >>>>> -Michael > >>>>> > >>>>> -----Original Message----- > >>>>> From: hotspot-compiler-dev > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > >>>>> Of Vladimir Ivanov > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > >>>>> hotspot-compiler-dev at openjdk.java.net > >>>>> > >>>>> Rahul, > >>>>> > >>>>> Can we define a constant instead and use it in both places? > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > >>>>>> Hi, > >>>>>> > >>>>>> Please review the patch for JDK- 8149488. 
> >>>>>> > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>> Webrev: > >>>>>> http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > >>>>>> > >>>>>> Corrected the bitsInByte array size in declaration. > >>>>>> > >>>>>> Thanks, > >>>>>> Rahul > >>>>>> > From doug.simon at oracle.com Mon Apr 4 21:30:02 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 4 Apr 2016 23:30:02 +0200 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod Message-ID: The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. https://bugs.openjdk.java.net/browse/JDK-8153439 http://cr.openjdk.java.net/~dnsimon/8153439 From igor.veresov at oracle.com Mon Apr 4 21:46:18 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Apr 2016 14:46:18 -0700 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: Message-ID: <3542A78F-57C3-48D4-ADAB-923760F33EE7@oracle.com> Looks good. igor > On Apr 4, 2016, at 2:30 PM, Doug Simon wrote: > > The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. > > https://bugs.openjdk.java.net/browse/JDK-8153439 > http://cr.openjdk.java.net/~dnsimon/8153439 From christian.thalinger at oracle.com Mon Apr 4 21:50:40 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 11:50:40 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: Message-ID: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> Thanks for the quick turnaround. Looks good. 
> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: > > The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. > > https://bugs.openjdk.java.net/browse/JDK-8153439 > http://cr.openjdk.java.net/~dnsimon/8153439 From christian.thalinger at oracle.com Mon Apr 4 22:34:20 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 12:34:20 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> Message-ID: <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> No, not good. We are failing a couple JVMCI tests. Looking into it... > On Apr 4, 2016, at 11:50 AM, Christian Thalinger wrote: > > Thanks for the quick turnaround. Looks good. > >> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: >> >> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. >> >> https://bugs.openjdk.java.net/browse/JDK-8153439 >> http://cr.openjdk.java.net/~dnsimon/8153439 > From vladimir.kozlov at oracle.com Mon Apr 4 22:57:15 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 15:57:15 -0700 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <56FE7DB5.404@oracle.com> References: <56FE7DB5.404@oracle.com> Message-ID: <5702F14B.1020403@oracle.com> 2 tests have -XX:+PrintCompilation flag added. Why do you need it? Thanks, Vladimir On 4/1/16 6:55 AM, Nils Eliasson wrote: > Hi all, > > Please review this fix.
> > Summary: > There is a mismatch in the CompilerWhiteBox testcases between the > callable and the executable constructors. SimpleTestCase$Helper > implements all constructors and methods that are tested. However since > Helper is an inner class there will be an extra (javac created) > constructor that has the parent class as an appended argument. The > callable will invoke this constructor, but the executable will reference > the normal constructor. > > Solution: > Stop having the Helper as an inner class. Rename it to > SimpleTestCaseHelper for some uniqueness in compiler commands and > directives. > > Testing: > Run all hotspot/compiler/whitebox tests on all platforms, and all > hotspot/compiler tests on one platform. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 > Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ > > Best regards, > Nils Eliasson From vladimir.kozlov at oracle.com Mon Apr 4 23:12:58 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 16:12:58 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: <5702F4FA.3090305@oracle.com> Looks good to me. Thanks, Vladimir On 4/1/16 11:28 AM, Igor Veresov wrote: > When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 > Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ > > Thanks, > igor > From igor.veresov at oracle.com Mon Apr 4 23:17:08 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Apr 2016 16:17:08 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5702F4FA.3090305@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5702F4FA.3090305@oracle.com> Message-ID: <77FFC56E-2B29-4D0F-8EB6-C181DFFD895D@oracle.com> Thanks, Vladimir! Can I please get another review from the runtime team? igor > On Apr 4, 2016, at 4:12 PM, Vladimir Kozlov wrote: > > Looks good to me. > > Thanks, > Vladimir > > On 4/1/16 11:28 AM, Igor Veresov wrote: >> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >> >> Thanks, >> igor >> From vladimir.kozlov at oracle.com Mon Apr 4 23:25:02 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 16:25:02 -0700 Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler In-Reply-To: References: Message-ID: <5702F7CE.3040600@oracle.com> Bug number in links is incorrect. Should be: https://bugs.openjdk.java.net/browse/JDK-8151003 http://cr.openjdk.java.net/~mcberg/8151003/webrev.02/ Changes looks good. Very nice clean up. I will start testing. I see you changed code for AVX > 2 in macroAssembler_x86.hpp. 
Is it because the new instructions are faster or to avoid mixing evex and non-evex instructions? Thanks, Vladimir On 4/1/16 2:51 PM, Berg, Michael C wrote: > Hi All, > > I would like to contribute some clean up on the x86 assembler applied to > vex encoding to address the usage of the nds assembler parameter. > > For all instructions which use nds source xmm registers, the validity > check has been removed. It was originally placed there here: > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269 > > And propagated. Now nds register usage is fully compliant with each isa > description. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001 > webrev: > > http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/ > > Thanks, > > Michael > From michael.c.berg at intel.com Mon Apr 4 23:30:04 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Mon, 4 Apr 2016 23:30:04 +0000 Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler In-Reply-To: <5702F7CE.3040600@oracle.com> References: <5702F7CE.3040600@oracle.com> Message-ID: Before we were aliasing, which lent some ambiguity regarding AVX2 and EVEX usage, as the aliased forms had more programming via the imm field. This way they are fully separate. Thanks, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, April 04, 2016 4:25 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M) 8151003 remove nds validity checks from vex x86 assembler Bug number in links is incorrect. Should be: https://bugs.openjdk.java.net/browse/JDK-8151003 http://cr.openjdk.java.net/~mcberg/8151003/webrev.02/ Changes looks good. Very nice clean up. I will start testing. I see you changed code for AVX > 2 in macroAssembler_x86.hpp.
Thanks, Vladimir On 4/1/16 2:51 PM, Berg, Michael C wrote: > Hi All, > > I would like to contribute some clean up on the x86 assembler applied > to vex encoding to address the usage of the nds assembler parameter. > > For all instructions which use nds source xmm registers, the validity > check has been removed. It was originally placed there here: > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.12 > 69 > > And propagated. Now nds register usage is fully compliant with each > isa descrption. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001 > webrev: > > http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/ > > Thanks, > > Michael > From michael.c.berg at intel.com Mon Apr 4 23:32:25 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Mon, 4 Apr 2016 23:32:25 +0000 Subject: CR for RFR 8151573 In-Reply-To: <56FF3A04.5090601@oracle.com> References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com> Message-ID: Vladimir, did you restart the integration testing after the small change I sent? Regards, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 01, 2016 8:18 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8151573 I start preintegration testing. Thanks, Vladimir On 3/31/16 8:36 PM, Berg, Michael C wrote: > Vladimir, I think I have addressed every concern in the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ > > I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. > The code is fully retested with no issues. 
> > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 9:56 PM > To: Berg, Michael C ; > 'hotspot-compiler-dev at openjdk.java.net' > > Subject: Re: CR for RFR 8151573 > > On 3/30/16 4:57 PM, Berg, Michael C wrote: >> See below for context. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 3:51 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> Michael, >> >> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes. >> >> multi_version_post_loops() can use is_canonical_main_loop_entry() >> from >> 8148754 but you need to modify it to move >> is_Main() assert to other call sites. >> >> ok, but should not the name then be is_canonical_loop_entry()? Since it will be for non main loops as well. Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. > > I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? > >> >> I did not get rce'd post loop checks in loopnode.cpp. >> >> First I will have to explain what I am doing with do_range_check(). That code resolves the question of: Do all the range checks have load ranges, only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loops limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short circuits us from adding any non canonical multiversions, leaving the code in its original form when we do not pass. >> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added we eliminate it. > > I understand that you want to make clean copy of main loop when it could be vectorized but before it is unrolled. > > You should add comment into do_range_check() about what you said (looking only for range checks). Also don't use old_new > struct but use simple counter and return it as result (or return bool from do_range_check()) to indicate that only range checks were in loop. > > Anyway, I was asking about checks in loopnode.cpp. First, there is some expectation about sequence of post loops. But from what I understand you are looking for a post loops which are not created by insert_scalar_rced_post_loop(). Right? > There is no comment what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). > Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
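The "min test of all ranges and the limit" described above can be modeled at source level. This is a hypothetical C++ sketch of the control flow only, not C2's implementation: the clean copy runs up to the minimum of the loop limit and every array length the loop indexes, so its bounds checks can be omitted, and any remaining iterations fall through to the original checked loop, which is where an out-of-range index would still be detected. `checked_at` and `post_loop` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a Java array access with a range check.
inline int& checked_at(std::vector<int>& v, std::size_t i) {
  return v.at(i);  // throws std::out_of_range, like a failed range check
}

// Post loop for d[i] = a[i] + b[i] over i in [start, limit).
// Returns the number of iterations that ran in the check-free copy.
std::size_t post_loop(std::vector<int>& d, std::vector<int>& a,
                      std::vector<int>& b, std::size_t start,
                      std::size_t limit) {
  // Multiversioning limit: min of the loop limit and all array lengths.
  std::size_t clean_limit = std::min({limit, d.size(), a.size(), b.size()});
  assert(clean_limit <= limit);
  std::size_t i = start;
  // Clean copy: every index is provably in range, so no checks are needed.
  for (; i < clean_limit; i++) {
    d[i] = a[i] + b[i];
  }
  std::size_t clean_iters = i - start;
  // Original checked loop handles anything left over; if the limit exceeds
  // an array length, the first access past the end throws here.
  for (; i < limit; i++) {
    checked_at(d, i) = checked_at(a, i) + checked_at(b, i);
  }
  return clean_iters;
}
```

When all indices are in range, the checked loop executes zero iterations, which corresponds to the case where "the clean loop becomes the one and only copy".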
> >> >> Swap next checks since has_range_checks() may be expensive scanning loop body: >> + // only process RCE'd main loops >> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >> >> Ok, makes sense. > > Sorry, I made mistake due to you using the same name for 2 different methods. One is query cl->has_range_checks() which checks flag. And another has_range_checks(cl) which search loop for If nodes. In such case you can ignore my check swap request. > But, please, rename has_range_checks(cl) method to avoid confusion. > > Thanks, > Vladimir > >> >> Also if there are no checks in loop you will scan loop's body again since no_checks state is not recorded. >> I would suggest to scan body unconditionally because code you added to do_range_check() may not be precise (you decrement count only for LoadRange when increment for all Ifs). This way you don't need to change old_new around do_range_check() call. >> >> I perceive the real problem is don't scan more than once after we check. I will move towards that solution. >> >> >> Why do you need local copies?: >> >> - visited.Clear(); >> - clones.clear(); >> + Arena *a = Thread::current()->resource_area(); >> + VectorSet visited(a); >> + Node_Stack clones(a, main_head->back_control()->outcnt()); >> >> I will look into this, and see if it can be cleaned up. >> >> >> I don't think PostLoopInfo is needed. insert_post_loop() can return post_head node and &main_exit could be passed to it to set. >> >> Ok, I will look into a version without PostLoopInfo.
>> >> Thanks, >> Vladimir >> >> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>> Here is an update after full testing, the webrev is: >>> >>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>> >>> Please review and comment, >>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Berg, Michael C >>> Sent: Wednesday, March 16, 2016 10:30 AM >>> To: Vladimir Kozlov ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: RE: CR for RFR 8151573 >>> >>> Putting a hold on the review, retesting everything on my end. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 16, 2016 8:42 AM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>> Vladimir: >>>> >>>> The why programmable SIMD depends upon this that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>> >>> I understand that we can get some benefits. But in general case they will not be visible. >>> >>>> >>>> With regards to the pre loop, pre loops have special checks too do they not, requiring flow in many cases? >>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>> >>> Yes, after you explained me vector masking I now understand why it could be used for post loop. >>> >>> After thinking about this I would suggest for you to look on arraycopy and generate_fill stubs instead in stub_Generator_x86*.cpp (may be only 64-bit). They also have post loops but changes would be only platform specific, smaller and easy to understand and test. 
Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than optimizing general loops. >>> >>> Regards, >>> Vladimir >>> >>>> >>>> Regards, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> As we all know we can always construct microbenchmarks which show >>>> 30% >>>> - 50% difference, while in real applications we will never see the >>>> difference. I still don't see a real reason why we should spend >>>> time and optimize >>>> *POST* loops. We already have vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>>> >>>> Why does "programmable SIMD" depend on it? What about pre-loop? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>> Correction below... >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-compiler-dev >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>> Of Berg, Michael C >>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: RE: CR for RFR 8151573 >>>>> >>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that look like this: >>>>> >>>>> for(int i = 0; i < process_len; i++) >>>>> { >>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>> } >>>>> >>>>> The above code makes 9 vector ops. >>>>> >>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>>> The value process_len is some fraction of the array length in my measurements.
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in the range 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>>> >>>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was superunrolled 4 times with vector length 16, or an unroll of 64, the alignment (pre) loop processes 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81 so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>>> >>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>> >>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> Hi Michael, >>>>> >>>>> Changes are significant so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
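The trip-count arithmetic in Michael's 81..96 example can be checked with a small sketch. It takes the email's figures as given: the pre-loop, the super-unrolled main loop and the vectorized post loop together take 4 + 1 + 1 = 6 trips, 1 to 15 residual iterations remain, and one masked VecZ iteration covers 16 floats. The class and method names are hypothetical, and this models trip counts only, not C2's generated code.

```java
public class PostLoopTripCount {
    // Trips taken by the alignment pre-loop, the super-unrolled main loop and
    // the vectorized post loop in the email's example: {4, 1, 1}.
    static final int INITIAL_TRIPS = 4 + 1 + 1;
    // Floats per 512-bit (VecZ) vector.
    static final int VLEN = 16;

    // Fixup loop left scalar: one trip per residual element.
    static int scalarFixupTotal(int residual) {
        return INITIAL_TRIPS + residual;
    }

    // Fixup loop vectorized with a mask: one trip covers up to VLEN elements.
    static int maskedFixupTotal(int residual) {
        return INITIAL_TRIPS + (residual + VLEN - 1) / VLEN;
    }

    public static void main(String[] args) {
        for (int residual = 1; residual < VLEN; residual++) {
            System.out.println("residual " + residual
                    + ": scalar=" + scalarFixupTotal(residual)
                    + " masked=" + maskedFixupTotal(residual));
        }
    }
}
```

Running it reproduces the email's claim: the scalar fixup gives totals from 7 to 21 trips, while the masked fixup stays at 7 for every residual count from 1 to 15.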
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute multi-versioning post loops for range >>>>>> check elimination. Beforehand cfg optimizations after register >>>>>> allocation were where post loop optimizations were done for range >>>>>> checks. I have added code which produces the desired effect much >>>>>> earlier by introducing a safe transformation which will minimally >>>>>> allow a range check free version of the final post loop to >>>>>> execute up until the point it actually has to take a range check >>>>>> exception by re-ranging the limit of the rce'd loop, then exit >>>>>> the rce'd post loop and take the range check exception in the legacy loops execution if required. >>>>>> If during optimization we discover that we know enough to remove >>>>>> the range check version of the post loop, mostly by exposing the >>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>> will eliminate the range check post loop altogether much like cfg >>>>>> optimizations did, but much earlier. This gives optimizations >>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>> mask vectors which map to the residual iterations. Programmable >>>>>> SIMD will be a follow on change set utilizing this code to stage >>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added. 
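The transformation proposed here can be pictured at the Java source level. This is a hypothetical sketch of the idea, not C2's implementation (which rewrites the ideal graph): a range-check-free copy of the post loop runs with its limit clamped to the array lengths, and any iterations left over fall through to the original, range-checked loop, which throws ArrayIndexOutOfBoundsException at the same index the unmodified loop would. It assumes a stride-1 loop, a non-negative start index and accesses of the form `x[i]`.

```java
public class MultiversionedPostLoop {
    // The original post loop: every a[i] and d[i] access is range-checked.
    static void checkedPostLoop(int[] a, int[] d, int from, int limit) {
        for (int i = from; i < limit; i++) {
            d[i] = a[i] * 2;
        }
    }

    // Hypothetical source-level analogue of the multiversioned post loop.
    // Returns the index at which the range-check-free copy stopped.
    static int multiversioned(int[] a, int[] d, int from, int limit) {
        // Clamp ("re-range") the limit so the clean copy never touches an
        // out-of-range index (assumes from >= 0, stride 1, accesses x[i]).
        int safeLimit = Math.min(limit, Math.min(a.length, d.length));
        int i = from;
        for (; i < safeLimit; i++) {
            d[i] = a[i] * 2; // index is in range by construction
        }
        // Leftover iterations run in the original checked loop, which throws
        // at the same index the unmodified loop would have.
        checkedPostLoop(a, d, i, limit);
        return i;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] d = new int[4];
        int stop = multiversioned(a, d, 1, 4);
        System.out.println("clean copy stopped at " + stop
                + ", d = " + java.util.Arrays.toString(d));
    }
}
```

When the limit never exceeds the array lengths, the checked loop runs zero iterations; that is the case in which, per the proposal, the range-checked version can be removed altogether.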
>>>>>> >>>>>> This code was tested as follows: >>>>>> >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>> >>>>>> >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Michael >>>>>> From christian.thalinger at oracle.com Tue Apr 5 00:00:05 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 14:00:05 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: > On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: > > No, not good. We are failing a couple JVMCI tests. Looking into it... Ok, this got a little out of control but for the better: http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone. Again, stupid jtreg. Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. > >> On Apr 4, 2016, at 11:50 AM, Christian Thalinger wrote: >> >> Thanks for the quick turnaround. Looks good. >> >>> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: >>> >>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8153439 >>> http://cr.openjdk.java.net/~dnsimon/8153439 >> > From igor.veresov at oracle.com Tue Apr 5 00:10:41 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Apr 2016 17:10:41 -0700 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: Seems alright. igor > On Apr 4, 2016, at 5:00 PM, Christian Thalinger wrote: > > >> On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: >> >> No, not good. We are failing a couple JVMCI tests. Looking into it... > > Ok, this got a little out of control but for the better: > > http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ > > The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. > > While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone. Again, stupid jtreg. > > Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. > >> >>> On Apr 4, 2016, at 11:50 AM, Christian Thalinger wrote: >>> >>> Thanks for the quick turnaround. Looks good. >>> >>>> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: >>>> >>>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
>>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153439 >>>> http://cr.openjdk.java.net/~dnsimon/8153439 >>> >> > From christian.thalinger at oracle.com Tue Apr 5 00:24:12 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 14:24:12 -1000 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign In-Reply-To: <570265D1.6040905@oracle.com> References: <56FE5D00.7000209@oracle.com> <570265D1.6040905@oracle.com> Message-ID: <46818653-AC18-4C9F-BBEB-B1F98CC39FBA@oracle.com> Should be alright. > On Apr 4, 2016, at 3:02 AM, Aleksey Shipilev wrote: > > On 04/01/2016 02:35 PM, Aleksey Shipilev wrote: >> Hi, >> >> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify >> String Concat strategies, because some of them are loading new methods >> and use them during String concat linkage and execution. Notably, this >> will happen inside of the asserts. We need to prime the asserts before >> using them in-between counter polls. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8153265 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8153265/webrev.00/ >> >> Testing: offending test in oob/-Xcomp modes > > Anyone? > > Thanks, > -Aleksey From vladimir.kozlov at oracle.com Tue Apr 5 01:29:54 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 18:29:54 -0700 Subject: CR for RFR 8151573 In-Reply-To: References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com> Message-ID: <57031512.2060607@oracle.com> Changes look good. I resubmit testing with new changes (4a).
Thanks, Vladimir On 4/1/16 10:16 PM, Berg, Michael C wrote: > That small revision is reflected in: > > https://bugs.openjdk.java.net/browse/JDK-8151573 > > and can be accessed at: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.04a/ > > Regards, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C > Sent: Friday, April 01, 2016 8:25 PM > To: Vladimir Kozlov ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: RE: CR for RFR 8151573 > > I have to make a two line change, I am testing it on my end. I will pass the webrev directly to you when my tests conclude. > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, April 01, 2016 8:18 PM > To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: CR for RFR 8151573 > > I start preintegration testing. > > Thanks, > Vladimir > > On 3/31/16 8:36 PM, Berg, Michael C wrote: >> Vladimir, I think I have addressed every concern in the latest webrev: >> >> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ >> >> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. >> The code is fully retested with no issues. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 9:56 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> On 3/30/16 4:57 PM, Berg, Michael C wrote: >>> See below for context. 
>>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 30, 2016 3:51 PM >>> To: Berg, Michael C ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: Re: CR for RFR 8151573 >>> >>> Michael, >>> >>> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes. >>> >>> multi_version_post_loops() can use is_canonical_main_loop_entry() >>> from >>> 8148754 but you need to modify it to move >>> is_Main() assert to other call sites. >>> >>> ok, but should not the name then be is_canonical_loop_entry()? Since it will be for non main loops as well. Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The grape shape test changes between main and post loops. Perhaps better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. >> >> I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? >> >>> >>> I did not get rce'd post loop checks in loopnode.cpp. >>> >>> First I will have to explain what I am doing with do_range_check(). That code resolves the question of: Do all the range checks have load ranges, only these types of loops are canonical for multiversioning. >>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loops limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I thinking leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) 
creates a copy of main which successfully passed range check elimination and is canonical. Basically this short circuits us from adding any non canonical multiversions, leaving the code in its original form when we do not pass. >>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added we eliminate it. >> >> I understand that you want to make clean copy of main loop when it could be vectorized but before it is unrolled. >> >> You should add comment into do_range_check() about what you said (looking only for range checks). Also don't use old_new >> struct but use simple counter and return it as result (or return bool from do_range_check()) to indicate that only range checks were in loop. >> >> Anyway, I was asked about checks in loopnode.cpp. First, there is some expectation about sequence of post loops. But from what I understand you are looking for a post loops which are not created by insert_scalar_rced_post_loop(). Right? >> There is no comment what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). >> Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop? >> >>> >>> Swap next checks since has_range_checks() may be expensive scanning loop body: >>> + // only process RCE'd main loops >>> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >>> >>> Ok, makes sense. >> >> Sorry, I made mistake due to you using the same name for 2 different methods. One is query cl->has_range_checks() which checks flag. And an other has_range_checks(cl) which search loop for If nodes. 
In such case you can ignore my check swap request. >> But, please, rename has_range_checks(cl) method to avoid confusion. >> >> Thanks, >> Vladimir >> >>> >>> Also if there are no checks in loop you will scan loop's body again since no_checks state is not recorded. >>> I would suggest to scan body unconditionally because code you added to do_range_check() may not precise (you decrement count only for LoadRange when increment for all Ifs). This way you don't need to change old_new around do_range_check() call. >>> >>> I perceive the real problem is don't scan more than once after we check. I will move towards that solution. >>> >>> >>> Why you need local copies?: >>> >>> - visited.Clear(); >>> - clones.clear(); >>> + Arena *a = Thread::current()->resource_area(); >>> + VectorSet visited(a); >>> + Node_Stack clones(a, main_head->back_control()->outcnt()); >>> >>> I will look into this, and see if it can be cleaned up. >>> >>> >>> I don't think PostLoopInfo is needed. insert_post_loop() can return post_head node and and &main_exit could be passed to it to set. >>> >>> Ok, I will look into a version without PostLoopInfo. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>>> Here is an update after full testing, the webrev is: >>>> >>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>>> >>>> Please review and comment, >>>> >>>> Thanks, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev >>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>>> Berg, Michael C >>>> Sent: Wednesday, March 16, 2016 10:30 AM >>>> To: Vladimir Kozlov ; >>>> 'hotspot-compiler-dev at openjdk.java.net' >>>> >>>> Subject: RE: CR for RFR 8151573 >>>> >>>> Putting a hold on the review, retesting everything on my end. 
>>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Wednesday, March 16, 2016 8:42 AM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>>> Vladimir: >>>>> >>>>> The why programmable SIMD depends upon this that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>>> >>>> I understand that we can get some benefits. But in general case they will not be visible. >>>> >>>>> >>>>> With regards to the pre loop, pre loops have special checks too do they not, requiring flow in many cases? >>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>>> >>>> Yes, after you explained me vector masking I now understand why it could be used for post loop. >>>> >>>> After thinking about this I would suggest for you to look on arraycopy and generate_fill stubs instead in stub_Generator_x86*.cpp (may be only 64-bit). They also have post loops but changes would be only platform specific, smaller and easy to understand and test. Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than optimizing general loops. >>>> >>>> Regards, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> As we all know we can always construct microbenchmarks which shows >>>>> 30% >>>>> - 50% difference. When in real application we will never see >>>>> difference. 
I still don't see a real reason why we should spend >>>>> time and optimize >>>>> *POST* loops. We already have vectorized post loop to improve performance. Note, additional loop opts code will rise its maintenance cost. >>>>> >>>>> Why "programmable SIMD" depends on it? What about pre-loop? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>>> Correction below... >>>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-compiler-dev >>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>>> Of Berg, Michael C >>>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>>> Subject: RE: CR for RFR 8151573 >>>>>> >>>>>> Vladimir for programmable SIMD which is the optimization which uses this implementation, I get the following on micros and code in general that look like this: >>>>>> >>>>>> for(int i = 0; i < process_len; i++) >>>>>> { >>>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>>> } >>>>>> >>>>>> The above code makes 9 vector ops. >>>>>> >>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyways. >>>>>> The value process_len is some fraction of the array length in my measurements. The idea of the metrics Is to pose a post loop with a modest amount of iterations in it. For instance N is the max trip of the post loop, and N is 1..VecZ-1 size, then for float we could do as many as 15 iterations in the fixup loop. >>>>>> >>>>>> An example would be array_length = 512, process_len is a range of 81..96, we create a VecZ loop which was superunrolled 4 times with vector length 16, or unroll of 64, we align process 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a mutilversioned post loop. 
We start that final loop at iteration 81 so we always do at least 1 iteration fixup, and as many as 15. If we left the fixup loop as a scalar loop that would mean 1 to 15 iterations plus our initial loops which have {4,1,1} iterations as a group or 6 to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96, without this optimization and programmable SIMD, we would have the initial 6 plush 1 to 15 more, or a range of 7 to 21 iterations. >>>>>> >>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>>> >>>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>>> >>>>>> Regards, >>>>>> Michael >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>>> Subject: Re: CR for RFR 8151573 >>>>>> >>>>>> Hi Michael, >>>>>> >>>>>> Changes are significant so they have to be justified. Especially since we are in later stage of jdk9 development. Do you have performance numbers (not only for microbenchmarhks) which show the benefit of these changes? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>>> Hi Folks, >>>>>>> >>>>>>> I would like to contribute multi-versioning post loops for range >>>>>>> check elimination. Beforehand cfg optimizations after register >>>>>>> allocation were where post loop optimizations were done for range >>>>>>> checks. 
I have added code which produces the desired effect much >>>>>>> earlier by introducing a safe transformation which will minimally >>>>>>> allow a range check free version of the final post loop to >>>>>>> execute up until the point it actually has to take a range check >>>>>>> exception by re-ranging the limit of the rce'd loop, then exit >>>>>>> the rce'd post loop and take the range check exception in the legacy loops execution if required. >>>>>>> If during optimization we discover that we know enough to remove >>>>>>> the range check version of the post loop, mostly by exposing the >>>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>>> will eliminate the range check post loop altogether much like cfg >>>>>>> optimizations did, but much earlier. This gives optimizations >>>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>>> mask vectors which map to the residual iterations. Programmable >>>>>>> SIMD will be a follow on change set utilizing this code to stage >>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added. 
>>>>>>> >>>>>>> This code was tested as follows: >>>>>>> >>>>>>> >>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>>> >>>>>>> >>>>>>> webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Michael >>>>>>> From vivek.r.deshpande at intel.com Tue Apr 5 06:25:46 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 5 Apr 2016 06:25:46 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> Hi Christian, We have updated the patch as per the suggested changes. The webrev is at this location for your review: http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ We will soon send another patch for the CompilerDirectives changes. Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Tuesday, March 29, 2016 11:29 AM To: Rukmannagari, Shravya Cc: Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya wrote: Hi Christian, We would add separate files for each intrinsic. By splitting the CompilerDirectives, do you mean we have to add a separate file? Sorry, I didn't exactly get it. Oh, sorry, I wasn't clear enough. Please file a new enhancement for the CompilerDirectives changes and integrate them separately. Thanks, Shravya Rukmannagari.
From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, March 28, 2016 5:18 PM To: Deshpande, Vivek R Cc: hotspot compiler; Vladimir Kozlov; Rukmannagari, Shravya Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 I left this comment in the bug: I think for the sanity of the macroAssembler_libm_x86_*.cpp files we should put every intrinsic in its own file, like we did for macroAssembler_x86_sha.cpp. They are already too big: $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp Also, can we split out the CompilerDirectives changes? On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R wrote: Hi all, We would like to contribute a patch which optimizes tan and log10 on the x86 architecture using the Intel LIBM library. Could you please review and sponsor this patch? Bug-id: https://bugs.openjdk.java.net/browse/JDK-8152907 webrev: http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ Thanks and regards, Vivek From tobias.hartmann at oracle.com Tue Apr 5 07:11:40 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Apr 2016 09:11:40 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks Message-ID: <5703652C.6000000@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8151724 http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. Tested with JPRT and RBT.
Thanks, Tobias From zoltan.majo at oracle.com Tue Apr 5 07:15:43 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Tue, 5 Apr 2016 09:15:43 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703652C.6000000@oracle.com> References: <5703652C.6000000@oracle.com> Message-ID: <5703661F.9040403@oracle.com> Hi Tobias, that looks good to me. Thank you for fixing this issue. Best regards, Zoltan On 04/05/2016 09:11 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8151724 > http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ > > The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. > > Tested with JPRT and RBT. > > Thanks, > Tobias From tobias.hartmann at oracle.com Tue Apr 5 07:22:25 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Apr 2016 09:22:25 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703661F.9040403@oracle.com> References: <5703652C.6000000@oracle.com> <5703661F.9040403@oracle.com> Message-ID: <570367B1.9080600@oracle.com> Hi Zoltan, thanks for the review! Best regards, Tobias On 05.04.2016 09:15, Zoltán Majó wrote: > Hi Tobias, > > > that looks good to me. Thank you for fixing this issue. > > Best regards, > > > Zoltan > > On 04/05/2016 09:11 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8151724 >> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ >> >> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code.
Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. >> >> Tested with JPRT and RBT. >> >> Thanks, >> Tobias > From martin.doerr at sap.com Tue Apr 5 10:10:09 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 5 Apr 2016 10:10:09 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57020636.7010806@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> Message-ID: <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> Hi Jamsheed, thanks for pointing me to it. Interesting that you found such a problem so shortly before I did :) My webrev addresses some aspects which are not covered by your fix: - In the other case, add_handler_for_exception_and_pc adds a new ExceptionCache instance. Those instances need to be published with releasing stores as well. - The readers of the _exception_cache field are not safe yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m
Sent: Monday, 4 April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, the "nmethod's exception cache not multi-thread safe" bug is fixed in b107. Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143897 Fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 Discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: the compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile, and new entries must be added with releasing stores. (Loading seems to be fine without acquire, because there's an address dependency from the load of the cache to the use of its contents, which is sufficient to ensure ordering on all OpenJDK platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. Forcing the compiler to load it from memory twice is certainly undesirable. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin
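The publication pattern Martin describes (writers serialized by a lock, readers without a lock, new entries published with a releasing store) can be sketched in portable C++11. This is a minimal illustration under stated assumptions, not HotSpot code: `CacheEntry`, `ExceptionCacheSketch`, and the integer key/value fields are hypothetical stand-ins for the real ExceptionCache, and entries are never removed while readers may be active. The reader below uses `memory_order_acquire`; the mail instead relies on the address dependency from the loaded pointer to its contents, which C++ would spell `memory_order_consume`, but mainstream compilers implement consume as acquire anyway.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified cache entry: an int key standing in for the
// exception type and an int value standing in for the handler address.
struct CacheEntry {
  int exception_type;
  int handler_pc;
  CacheEntry* next;  // never modified after the entry is published
};

class ExceptionCacheSketch {
 public:
  ExceptionCacheSketch() = default;
  ExceptionCacheSketch(const ExceptionCacheSketch&) = delete;
  ExceptionCacheSketch& operator=(const ExceptionCacheSketch&) = delete;

  ~ExceptionCacheSketch() {
    // Single owner at destruction time, so relaxed ordering is enough here.
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  // Writer: runs under the lock, fully initializes the entry, then
  // publishes it with a release store so readers never see a half-built entry.
  void add(int exception_type, int handler_pc) {
    std::lock_guard<std::mutex> guard(write_lock_);
    CacheEntry* e = new CacheEntry{exception_type, handler_pc,
                                   head_.load(std::memory_order_relaxed)};
    head_.store(e, std::memory_order_release);
  }

  // Reader: takes no lock. The acquire load pairs with the release store above.
  // Returns -1 when no handler is cached for the given exception type.
  int lookup(int exception_type) const {
    for (const CacheEntry* e = head_.load(std::memory_order_acquire);
         e != nullptr; e = e->next) {
      if (e->exception_type == exception_type) {
        return e->handler_pc;
      }
    }
    return -1;
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex write_lock_;
};
```

Because writers are serialized by the mutex, the relaxed load of the old head inside `add()` is safe; the release store is what makes the new entry, and transitively the older entries it links to, visible to any reader whose acquire load observes the new head.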
From nils.eliasson at oracle.com Tue Apr 5 13:54:44 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 5 Apr 2016 15:54:44 +0200 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <5702F14B.1020403@oracle.com> References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com> Message-ID: <5703C3A4.2040907@oracle.com> Hi Vladimir, On 2016-04-05 00:57, Vladimir Kozlov wrote: > Two tests have the -XX:+PrintCompilation flag added. Why do you need it? > It helps a lot to have a compilation log to start with when these hard-to-reproduce failures happen. Those two tests test the compilation parts of the WB API. I ran into another issue in these tests: the compile() method in CompilerWhiteBoxTest is not reliable unless the invocation counter decay is turned off. I added a check of the UseCounterDecay flag in that method so that no one will miss it by accident. Best regards, Nils Eliasson > Thanks, > Vladimir > > On 4/1/16 6:55 AM, Nils Eliasson wrote: >> Hi all, >> >> Please review this fix. >> >> Summary: >> There is a mismatch in the CompilerWhiteBox testcases between the >> callable and the executable constructors. SimpleTestCase$Helper >> implements all constructors and methods that are tested. However, since >> Helper is an inner class, there will be an extra (javac-generated) >> constructor that has the parent class as an appended argument. The >> callable will invoke this constructor, but the executable will reference >> the normal constructor. >> >> Solution: >> Make Helper a top-level class instead of an inner class. Rename it to >> SimpleTestCaseHelper for some uniqueness in compiler commands and >> directives. >> >> Testing: >> Run all hotspot/compiler/whitebox tests on all platforms, and all >> hotspot/compiler tests on one platform.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 >> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ >> >> Best regards, >> Nils Eliasson From nils.eliasson at oracle.com Tue Apr 5 13:56:07 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 5 Apr 2016 15:56:07 +0200 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <5703C3A4.2040907@oracle.com> References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com> <5703C3A4.2040907@oracle.com> Message-ID: <5703C3F7.1020900@oracle.com> I forgot the webrev link: http://cr.openjdk.java.net/~neliasso/8151880/webrev.03/ Regards, Nils On 2016-04-05 15:54, Nils Eliasson wrote: > Hi Vladimir, > > On 2016-04-05 00:57, Vladimir Kozlov wrote: >> Two tests have the -XX:+PrintCompilation flag added. Why do you need it? >> > > It helps a lot to have a compilation log to start with when these hard-to-reproduce > failures happen. Those two tests test the compilation > parts of the WB API. > > I ran into another issue in these tests: the compile() method in > CompilerWhiteBoxTest is not reliable unless the invocation counter > decay is turned off. I added a check of the UseCounterDecay flag in > that method so that no one will miss it by accident. > > Best regards, > Nils Eliasson > > >> Thanks, >> Vladimir >> >> On 4/1/16 6:55 AM, Nils Eliasson wrote: >>> Hi all, >>> >>> Please review this fix. >>> >>> Summary: >>> There is a mismatch in the CompilerWhiteBox testcases between the >>> callable and the executable constructors. SimpleTestCase$Helper >>> implements all constructors and methods that are tested. However, since >>> Helper is an inner class, there will be an extra (javac-generated) >>> constructor that has the parent class as an appended argument. The >>> callable will invoke this constructor, but the executable will >>> reference >>> the normal constructor. >>> >>> Solution: >>> Make Helper a top-level class instead of an inner class.
Rename it to >>> SimpleTestCaseHelper for some uniqueness in compiler commands and >>> directives. >>> >>> Testing: >>> Run all hotspot/compiler/whitebox tests on all platforms, and all >>> hotspot/compiler tests on one platform. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ >>> >>> Best regards, >>> Nils Eliasson > From doug.simon at oracle.com Tue Apr 5 14:16:19 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 5 Apr 2016 16:16:19 +0200 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: > On 05 Apr 2016, at 02:00, Christian Thalinger wrote: > > >> On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: >> >> No, not good. We are failing a couple of JVMCI tests. Looking into it... > > OK, this got a little out of control but for the better: > > http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ > > The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. > > While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone. Again, stupid jtreg. > > Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. I agree. Can you modify your derivative webrev for that? Of course, we wouldn't need the cast to HotSpotSpeculationLog in HotSpotCodeCacheProvider once you've made that change.
-Doug From vladimir.x.ivanov at oracle.com Tue Apr 5 15:12:19 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 5 Apr 2016 18:12:19 +0300 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining Message-ID: <5703D5D3.7020801@oracle.com> http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8152590 Constant folding of stable field loads only happens during parsing. During incremental (post-parse) inlining some loads can become foldable, but they aren't optimized. Though the fix is pretty trivial (webrev.00.02), I decided to refactor relevant code and get rid of redundant parts. To ease the review I split the change into 4 parts: (1) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.00 * extracted all constant-related checks into ciField::constant_value()/ciField::constant_value_of(); * common constant folding logic into GraphKit::make_constant_from_field()/Type::make_constant_*(): Parse::do_get_xxx() / LibraryCallKit::inline_unsafe_access() GraphKit::make_constant_from_field(ciField*, Node*) Type::make_constant_from_field(ciField*, ...) Type::make_from_constant(ciConstant, ...) * fold_stable_ary_elem is moved to Type::make_constant_from_array_element() * check_mismatched_access is moved to type.cpp (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 Refactored constant folding logic for static final fields and unified folding logic with instance fields: is_constant() depends only on the flags and caller should check return value from ciField::constant_value() for validity (ciConstant.is_valid()) (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 Do constant folding for fields (both static and instance) in LoadNode::Value. (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 Mark CallSite::target field as constant. 
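Part (2) separates "may this field be constant?" (decided from the flags alone) from "is the value that was read usable?" (checked by the caller). A minimal C++ sketch of that split follows, under simplified assumptions: a field is a folding candidate if it is static final or @Stable, and a @Stable field is folded only once it holds a non-default (non-zero/non-null) value. `FieldFlags`, `ConstantSketch`, `is_constant()`, and `constant_value()` are hypothetical free-function stand-ins, not the real ciField/ciConstant API, and real HotSpot has further special cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field flags; not HotSpot's ciField.
struct FieldFlags {
  bool is_static;
  bool is_final;
  bool is_stable;  // models the @Stable annotation
};

// Hypothetical result type, loosely modeled on ciConstant::is_valid().
struct ConstantSketch {
  bool valid;
  std::int64_t value;
  bool is_valid() const { return valid; }
};

// Whether a field *may* be treated as constant depends only on its flags.
bool is_constant(const FieldFlags& f) {
  return (f.is_static && f.is_final) || f.is_stable;
}

// The caller must still check validity: a candidate field can yield no
// usable constant, e.g. a @Stable field that still holds its default value.
ConstantSketch constant_value(const FieldFlags& f, std::int64_t current_value) {
  if (!is_constant(f)) {
    return ConstantSketch{false, 0};
  }
  if (f.is_stable && current_value == 0) {
    return ConstantSketch{false, 0};  // default value may still be overwritten
  }
  return ConstantSketch{true, current_value};
}
```

Returning an explicit validity flag forces every call site to handle the "cannot fold" case, which makes the same query reusable wherever a field load's object becomes known, for example from LoadNode::Value after incremental inlining, as in part (3).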
Also: * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java Testing: JPRT, RBT (pit-hs-comp w/ parse-time folding on/off), octane. Thanks! Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Apr 5 15:20:08 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 08:20:08 -0700 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703652C.6000000@oracle.com> References: <5703652C.6000000@oracle.com> Message-ID: <5703D7A8.3090602@oracle.com> Looks good. Tobias, thank you for investigating the assert crashes. Thanks, Vladimir On 4/5/16 12:11 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8151724 > http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ > > The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. > > Tested with JPRT and RBT. > > Thanks, > Tobias > From tom.rodriguez at oracle.com Tue Apr 5 15:21:53 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 5 Apr 2016 08:21:53 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: looks good to me. tom > On Apr 1, 2016, at 11:28 AM, Igor Veresov wrote: > > When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 > Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ > > Thanks, > igor From tom.rodriguez at oracle.com Tue Apr 5 15:23:49 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 5 Apr 2016 08:23:49 -0700 Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should return dependencies_failed In-Reply-To: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com> References: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com> Message-ID: Thanks! tom > On Apr 1, 2016, at 1:18 PM, Igor Veresov wrote: > > Looks good. > > igor > >> On Apr 1, 2016, at 12:47 PM, Tom Rodriguez wrote: >> >> http://cr.openjdk.java.net/~never/8153315/webrev >> >> This fixes a minor issue which showed up while debugging Java code. evol_method dependencies can change at any time, so it's just a normal dependency failure, not invalid dependencies. Graal considers it an error to build invalid dependencies so it complained. Tested under the Eclipse debugger. >> >> tom > From tobias.hartmann at oracle.com Tue Apr 5 15:28:57 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Apr 2016 17:28:57 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703D7A8.3090602@oracle.com> References: <5703652C.6000000@oracle.com> <5703D7A8.3090602@oracle.com> Message-ID: <5703D9B9.6010206@oracle.com> Thanks, Vladimir! Best regards, Tobias On 05.04.2016 17:20, Vladimir Kozlov wrote: > Looks good. Tobias, thank you for investigating the assert crashes. > > Thanks, > Vladimir > > On 4/5/16 12:11 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8151724 >> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ >> >> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code.
Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. >> >> Tested with JPRT and RBT. >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Tue Apr 5 15:32:57 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 08:32:57 -0700 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <5703C3F7.1020900@oracle.com> References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com> <5703C3A4.2040907@oracle.com> <5703C3F7.1020900@oracle.com> Message-ID: <5703DAA9.7010707@oracle.com> Yes, adding the UseCounterDecay check makes sense in this case. And I agree with PrintCompilation since it helps diagnose problems. Reviewed. Thanks, Vladimir On 4/5/16 6:56 AM, Nils Eliasson wrote: > I forgot the webrev link: > http://cr.openjdk.java.net/~neliasso/8151880/webrev.03/ > > Regards, > Nils > > On 2016-04-05 15:54, Nils Eliasson wrote: >> Hi Vladimir, >> >> On 2016-04-05 00:57, Vladimir Kozlov wrote: >>> Two tests have the -XX:+PrintCompilation flag added. Why do you need it? >>> >> >> It helps a lot to have a compilation log to start with when these hard-to-reproduce >> failures happen. Those two tests test the compilation >> parts of the WB API. >> >> I ran into another issue in these tests: the compile() method in >> CompilerWhiteBoxTest is not reliable unless the invocation counter >> decay is turned off. I added a check of the UseCounterDecay flag in >> that method so that no one will miss it by accident. >> >> Best regards, >> Nils Eliasson >> >> >>> Thanks, >>> Vladimir >>> >>> On 4/1/16 6:55 AM, Nils Eliasson wrote: >>>> Hi all, >>>> >>>> Please review this fix. >>>> >>>> Summary: >>>> There is a mismatch in the CompilerWhiteBox testcases between the >>>> callable and the executable constructors. SimpleTestCase$Helper >>>> implements all constructors and methods that are tested.
However, since >>>> Helper is an inner class, there will be an extra (javac-generated) >>>> constructor that has the parent class as an appended argument. The >>>> callable will invoke this constructor, but the executable will >>>> reference >>>> the normal constructor. >>>> >>>> Solution: >>>> Make Helper a top-level class instead of an inner class. Rename it to >>>> SimpleTestCaseHelper for some uniqueness in compiler commands and >>>> directives. >>>> >>>> Testing: >>>> Run all hotspot/compiler/whitebox tests on all platforms, and all >>>> hotspot/compiler tests on one platform. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ >>>> >>>> Best regards, >>>> Nils Eliasson >> > From igor.veresov at oracle.com Tue Apr 5 15:57:41 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 08:57:41 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: Thanks, Tom! igor > On Apr 5, 2016, at 8:21 AM, Tom Rodriguez wrote: > > looks good to me. > > tom > >> On Apr 1, 2016, at 11:28 AM, Igor Veresov wrote: >> >> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >> >> Thanks, >> igor > From lois.foltan at oracle.com Tue Apr 5 16:30:27 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 12:30:27 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: <5703E823.8050400@oracle.com> Hi Igor, I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? Thanks, Lois On 4/1/2016 2:28 PM, Igor Veresov wrote: > When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 > Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ > > Thanks, > igor From igor.veresov at oracle.com Tue Apr 5 16:50:49 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 09:50:49 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703E823.8050400@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> Message-ID: <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> Hi Lois, Thanks for looking at it. Yes, it passes all hotspot jtreg tests. 
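The check under review depends only on the resolved method and the invoking bytecode, never on the receiver's class, which is why it can run at link time. A minimal C++ sketch of that idea follows, modeling the JVMS linking rule for invokeinterface (a static or private resolved method causes IncompatibleClassChangeError). `Bytecode`, `ResolvedMethod`, `check_interface_method()`, and `links_ok()` are hypothetical and do not reflect LinkResolver's real signatures; `std::runtime_error` stands in for the ICCE. Passing the bytecode as a parameter, as the webrev does for resolve_interface_method, lets one routine answer the question for every invoke* context.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical subset of the invoke bytecodes.
enum class Bytecode { invokestatic, invokespecial, invokevirtual, invokeinterface };

// Hypothetical summary of the method found by interface method resolution.
struct ResolvedMethod {
  std::string name;
  bool is_static;
  bool is_private;
};

// Link-time check: inspects only the resolved method and the invoking
// bytecode, so no receiver type is needed. Throws to model the ICCE.
void check_interface_method(const ResolvedMethod& m, Bytecode code) {
  if (code == Bytecode::invokeinterface && (m.is_static || m.is_private)) {
    throw std::runtime_error("IncompatibleClassChangeError: " + m.name);
  }
}

// Yes/no wrapper, e.g. for a compiler-side lookup that wants to detect the
// inconsistency while parsing instead of propagating an exception.
bool links_ok(const ResolvedMethod& m, Bytecode code) {
  try {
    check_interface_method(m, code);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

For example, a private interface method reached via invokespecial links fine, while the same method reached via invokeinterface is rejected before any receiver is known.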
igor > On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: > > Hi Igor, > > I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? > > Thanks, > Lois > > On 4/1/2016 2:28 PM, Igor Veresov wrote: >> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >> >> Thanks, >> igor > From lois.foltan at oracle.com Tue Apr 5 17:34:15 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 13:34:15 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> Message-ID: <5703F717.702@oracle.com> On 4/5/2016 12:50 PM, Igor Veresov wrote: > Hi Lois, > > Thanks for looking at it. Yes, it passes all hotspot jtreg tests. > > igor Hi Igor, Thanks for waiting on this. A couple of comments: - The JVMS Section 6.5 (Instructions) entry for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. - I'm concerned about the changes on lines #998, #1030, and #1316.
I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. Just curious: did you also run the testbase default methods tests? Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>> >>> Thanks, >>> igor From igor.veresov at oracle.com Tue Apr 5 17:44:56 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 10:44:56 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703F717.702@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: > On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: > > > On 4/5/2016 12:50 PM, Igor Veresov wrote: >> Hi Lois, >> >> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >> >> igor > Hi Igor, > > Thanks for waiting on this. A couple of comments: > > - The JVMS Section 6.5 (Instructions) entry for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. > > - I'm concerned about the changes on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. > I looked at the call graphs of these guys and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction? > Just curious: did you also run the testbase default methods tests? Yes, within the context of a closed project.
That's actually what made these changes necessary. igor > Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>> >>>> Thanks, >>>> igor > From tom.rodriguez at oracle.com Tue Apr 5 17:55:04 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 5 Apr 2016 10:55:04 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703F717.702@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: <29A506A0-BEC9-4514-B8E9-92E6C29B1E40@oracle.com> > Thanks for waiting on this. A couple of comments: > > - The JVMS Section 6.5 (Instructions) entry for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. > > - I'm concerned about the changes on lines #998, #1030, and #1316.
I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. Unless I'm misunderstanding the code, I'd say that it was a bug that nostatics was being passed as false. The standard naming in LinkResolver follows a pattern where the name is associated with the bytecode being used for the invoke. So if you are in {linktime,runtime}_resolve_{foo}_method, then an invokefoo bytecode is what's being used. resolve_interface_method currently does not follow that naming scheme, which I think should be fixed. Maybe resolve_method_in_interface? I do agree we should be sure that the paths leading to this call agree about the actual bytecode being used, but I can't see a path where a different bytecode could have been passed in. tom > > Just curious: did you also run the testbase default methods tests? > Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type.
It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>> >>>> Thanks, >>>> igor > From igor.veresov at oracle.com Tue Apr 5 17:56:55 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 10:56:55 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: > On Apr 5, 2016, at 10:44 AM, Igor Veresov wrote: > >> >> On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: >> >> >> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>> Hi Lois, >>> >>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>> >>> igor >> Hi Igor, >> >> Thanks for waiting on this. A couple of comments: >> >> - The JVMS Section 6.5 (Instructions) entry for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >> >> - I'm concerned about the changes on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >> > > I looked at the call graphs of these guys and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something.
Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction? Actually, the easier way to think about it would be: the answer returned by resolve_interface_method() is the result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. It does not, of course, mean that the invocation really comes from that instruction; the context determines that. As you can see, the logic around "nostatics" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate that the question is asked within the invokeinterface context. But passing a bytecode makes the code easier to read. igor > >> Just curious: did you also run the testbase default methods tests? > > Yes, within the context of a closed project. That's actually what made these changes necessary. > > igor > >> Lois >> >>> >>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>> >>>> Hi Igor, >>>> >>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>> >>>> Thanks, >>>> Lois >>>> >>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>> >>>>> Thanks, >>>>> igor From karen.kinnear at oracle.com Tue Apr 5 18:12:41 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 5 Apr 2016 14:12:41 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: <01AE161D-FAE2-4F37-9E79-3951DB516EE0@oracle.com> Igor, I'd like to get back to you on this before you check in the change, please. I need to sanity-check the JVMS and the code. I have another set of tests written by Vladimir Ivanov that I will send you as well. thanks, Karen > On Apr 5, 2016, at 11:57 AM, Igor Veresov wrote: > > Thanks, Tom! > > igor > >> On Apr 5, 2016, at 8:21 AM, Tom Rodriguez wrote: >> >> looks good to me. >> >> tom >> >>> On Apr 1, 2016, at 11:28 AM, Igor Veresov wrote: >>> >>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>> >>> Thanks, >>> igor >> > From karen.kinnear at oracle.com Tue Apr 5 19:04:12 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 5 Apr 2016 15:04:12 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703F717.702@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: I am in agreement with Lois that the JVMS looks good with moving the exception. With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next meeting I will check one more time. It might be worth adding a comment. My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, so that you get the correct behavior depending on the requesting byte code. I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so I could use help studying this a bit more to understand if this really is resolution or is really selection. thanks, Karen > On Apr 5, 2016, at 1:34 PM, Lois Foltan wrote: > > > On 4/5/2016 12:50 PM, Igor Veresov wrote: >> Hi Lois, >> >> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >> >> igor > Hi Igor, > > Thanks for waiting on this.
A couple of comments: > > - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. > > - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. > > Just curious did you also run the testbase default methods tests? > Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). 
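Karen's question, whether resolveMethod is doing resolution or selection, turns on two distinct steps. Resolution looks the method up starting from the class or interface named at the call site and needs no receiver. Selection, performed at run time for invokevirtual and invokeinterface, picks the implementation from the receiver's actual class. The toy model below is a hypothetical sketch: ToyKlass, resolve_method() and select_method() are illustrative names, not HotSpot or JVMCI code, and it ignores interfaces, access checks and default methods.

```cpp
#include <map>
#include <string>

// Toy class model (hypothetical, not HotSpot's Klass): a superclass link and
// the methods declared directly in this class, keyed by name only.
struct ToyKlass {
  std::string name;
  const ToyKlass* super;
  std::map<std::string, std::string> methods;  // method name -> "Owner.name"
};

// Resolution: search from the class named at the call site up the superclass
// chain. No receiver is needed, so this can be done at link time.
const std::string* resolve_method(const ToyKlass* named, const std::string& m) {
  for (const ToyKlass* k = named; k != nullptr; k = k->super) {
    auto it = k->methods.find(m);
    if (it != k->methods.end()) return &it->second;
  }
  return nullptr;  // a real VM would throw NoSuchMethodError here
}

// Selection: only after resolution succeeds, pick the implementation from the
// receiver's dynamic class. A real VM also applies access, overriding and
// default-method rules that this toy ignores.
const std::string* select_method(const ToyKlass* named, const ToyKlass* receiver,
                                 const std::string& m) {
  if (resolve_method(named, m) == nullptr) return nullptr;
  return resolve_method(receiver, m);
}
```

Igor's later reply describes resolveMethod() as emulating a virtual call that should do both link-time and runtime resolution; in this model that corresponds to calling select_method() rather than resolve_method() alone.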
>>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>> >>>> Thanks, >>>> igor > From vladimir.kozlov at oracle.com Tue Apr 5 19:23:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 12:23:44 -0700 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining In-Reply-To: <5703D5D3.7020801@oracle.com> References: <5703D5D3.7020801@oracle.com> Message-ID: <570410C0.7080105@oracle.com> On 4/5/16 8:12 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00 > https://bugs.openjdk.java.net/browse/JDK-8152590 > > Constant folding of stable field loads only happens during parsing. > During incremental (post-parse) inlining some loads can become foldable, > but they aren't optimized. > > Though the fix is pretty trivial (webrev.00.02), I decided to refactor > relevant code and get rid of redundant parts. > > To ease the review I split the change into 4 parts: > > (1) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.00 > > * extracted all constant-related checks into > ciField::constant_value()/ciField::constant_value_of(); > > * common constant folding logic into > GraphKit::make_constant_from_field()/Type::make_constant_*(): > > Parse::do_get_xxx() / LibraryCallKit::inline_unsafe_access() > GraphKit::make_constant_from_field(ciField*, Node*) > Type::make_constant_from_field(ciField*, ...) > Type::make_from_constant(ciConstant, ...) > > * fold_stable_ary_elem is moved to > Type::make_constant_from_array_element() > > * check_mismatched_access is moved to type.cpp Type::make_constant_from_field() - is_stable_array and stable_dimension are needed only at the end for make_from_constant() call, move them there. check_mismatched_access() result is used only in assert. Should you put the call and assert under #ifdef ASSERT? 
> > > (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 > > Refactored constant folding logic for static final fields and unified > folding logic with instance fields: is_constant() depends only on the > flags and caller should check return value from > ciField::constant_value() for validity (ciConstant.is_valid()) > Good. > > (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 > > Do constant folding for fields (both static and instance) in > LoadNode::Value. Good. > > > (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 > > Mark CallSite::target field as constant. Okay. Thanks, Vladimir > > > Also: > > * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java > > Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. > > Thanks! > > Best regards, > Vladimir Ivanov From igor.veresov at oracle.com Tue Apr 5 19:43:51 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 12:43:51 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> > On Apr 5, 2016, at 12:04 PM, Karen Kinnear wrote: > > I am in agreement with Lois that the JVMS looks good with moving the exception. Thanks! > > With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next > meeting I will check one more time. It might be worth adding a comment. Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. 
> > My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks > if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. > That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 In the nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). igor > I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the > corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, > so that you get the correct behavior depending on the requesting byte code. > > I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so > I could use help studying this a bit more to understand if this really is resolution or is really selection. > > thanks, > Karen > >> On Apr 5, 2016, at 1:34 PM, Lois Foltan wrote: >> >> >> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>> Hi Lois, >>> >>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>> >>> igor >> Hi Igor, >> >> Thanks for waiting on this. A couple of comments: >> >> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >> >> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.
For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >> >> Just curious did you also run the testbase default methods tests? >> Lois >> >>> >>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>> >>>> Hi Igor, >>>> >>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>> >>>> Thanks, >>>> Lois >>>> >>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>> >>>>> Thanks, >>>>> igor >> > From vladimir.x.ivanov at oracle.com Tue Apr 5 19:47:26 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 5 Apr 2016 22:47:26 +0300 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining In-Reply-To: <570410C0.7080105@oracle.com> References: <5703D5D3.7020801@oracle.com> <570410C0.7080105@oracle.com> Message-ID: <5704164E.3040006@oracle.com> Thanks for review, Vladimir!
Updated version: http://cr.openjdk.java.net/~vlivanov/8152590/webrev.01 >> * check_mismatched_access is moved to type.cpp > > Type::make_constant_from_field() - is_stable_array and stable_dimension > are needed only at the end for make_from_constant() call, move them there. Done. > check_mismatched_access() result is used only in assert. Should you put > the call and assert under #ifdef ASSERT? No, it's a bug in the change: con should be used instead of field_value. check_mismatched_access filters out invalid accesses and adjusts the value for unsigned loads. Fixed. Best regards, Vladimir Ivanov PS: I'll enhance the tests to catch unsigned field load case. > >> >> >> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 >> >> Refactored constant folding logic for static final fields and unified >> folding logic with instance fields: is_constant() depends only on the >> flags and caller should check return value from >> ciField::constant_value() for validity (ciConstant.is_valid()) >> > > Good. > >> >> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 >> >> Do constant folding for fields (both static and instance) in >> LoadNode::Value. > > Good. > >> >> >> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 >> >> Mark CallSite::target field as constant. > > Okay. > > Thanks, > Vladimir > >> >> >> Also: >> >> * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java >> >> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. >> >> Thanks! 
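The discussion above describes two rules: callers must check the result of ciField::constant_value() with ciConstant.is_valid(), and check_mismatched_access() rejects invalid accesses and adjusts the value for unsigned loads, so folding must use the adjusted con rather than the raw field_value. The standalone model below is a hypothetical sketch of that logic. ToyConstant, FieldType, LoadKind and fold_load() are illustrative names, not C2's types, and the width-matching rules are simplified assumptions.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for ciConstant: a value plus a validity flag.
struct ToyConstant {
  bool valid;
  int32_t value;  // sign-extended value of a byte/short/int field
  bool is_valid() const { return valid; }
};

enum class FieldType { Byte, Short, Int };
enum class LoadKind { SignedByte, UnsignedByte, SignedShort, UnsignedShort, Int };

// Returns the value a load may be folded to, or nothing when the constant is
// invalid or the load width does not match the field (a mismatched access).
std::optional<int32_t> fold_load(const ToyConstant& field_value, FieldType type,
                                 LoadKind load) {
  if (!field_value.is_valid()) return std::nullopt;  // caller must check validity
  const int32_t v = field_value.value;
  switch (type) {
    case FieldType::Byte:
      if (load == LoadKind::SignedByte) return v;
      if (load == LoadKind::UnsignedByte) return v & 0xFF;  // adjusted value
      return std::nullopt;
    case FieldType::Short:
      if (load == LoadKind::SignedShort) return v;
      if (load == LoadKind::UnsignedShort) return v & 0xFFFF;  // adjusted value
      return std::nullopt;
    case FieldType::Int:
      if (load == LoadKind::Int) return v;
      return std::nullopt;
  }
  return std::nullopt;
}
```

In this model, folding a byte field holding -1 with field_value.value directly would give -1 for an unsigned byte load instead of 255. That illustrates why the adjusted value has to be the one used.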
>> >> Best regards, >> Vladimir Ivanov From lois.foltan at oracle.com Tue Apr 5 19:54:14 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 15:54:14 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: <570417E6.60107@oracle.com> On 4/5/2016 1:56 PM, Igor Veresov wrote: > >> On Apr 5, 2016, at 10:44 AM, Igor Veresov > > wrote: >> >>> >>> On Apr 5, 2016, at 10:34 AM, Lois Foltan >> > wrote: >>> >>> >>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>> Hi Lois, >>>> >>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>> >>>> igor >>> Hi Igor, >>> >>> Thanks for waiting on this. A couple of comments: >>> >>> - Section 6.5 Instructions for invokeinterface does indicate that a >>> "Linking Exceptions" the VM can throw an ICCE if the resolved method >>> is static or private. So I think moving this exception from runtime >>> to linktime is okay. >>> >>> - I'm concerned about the change on line #998, #1030, #1316. I >>> don't think you are necessarily guaranteed to have the bytecodes >>> that you are now passing to resolve_interface_method. For example, >>> line #998 within linktime_resolve_static_method, you may not have an >>> invokestatic here, you may have another invoke* bytecode trying to >>> invoke a static interface method. Passing in >>> Bytecodes::_invokestatic seems wrong, because even if the resolved >>> method is static, "nostatics" was set to false. >>> >> >> I looked at the call graphs of these guys and >> linktime_resolve_X_method() methods actually seem to only called >> within invokeX contexts. But may be I missed something. Can you give >> an example of a path that may cause, say, >> linktime_resolve_static_method() be invoked for non-invokestatic >> instruction? 
> > Actually, the easier way to think about it, would be: The answer > returned by resolve_interface_method() is result of a method > resolution in the interface class "as if" it were invoked by the given > bytecode instruction. It of course doesn't mean that the > invocation is really a result of the said instruction. The context is. > As you may see the logic around "nostatic" did not change and the > logic around resolve_interface_method() being called within the > invokeinterface context is what we want it to be. > > The same effect could have been achieved by adding another bool > argument to resolve_interface_method() to indicate a question within > the invokeinterface context. But passing a bytecode makes it an easier > to read code. Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later. Lois > > igor > >> >>> Just curious did you also run the testbase default methods tests? >> >> Yes, within the context of a closed project. That's actually what >> made these changes necessary. >> >> igor >> >>> Lois >>> >>>> >>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan >>>> > wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> I know you have two reviews for this but could you hold off >>>>> committing until I or Karen Kinnear have a chance to review. We >>>>> both worked in this area a lot to support default methods in JDK >>>>> 8. Also, have you run the hotspot/test/runtime/SelectionResolution >>>>> tests on this? >>>>> >>>>> Thanks, >>>>> Lois >>>>> >>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>> When invoking private interface methods with invokeinterface we >>>>>> throw ICCE.
The check for that happens in the runtime part of the >>>>>> resolution, however, doing it at linktime seems like a better >>>>>> place, since the check doesn't depend on the receiver type. It >>>>>> also allows compiler interfaces that rely on linktime resolution >>>>>> to detect inconsistencies during parsing (see >>>>>> ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() >>>>>> (JVMCI) that are affected). >>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>> >>>>>> >>>>>> Thanks, >>>>>> igor > From vladimir.kozlov at oracle.com Tue Apr 5 20:07:11 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 13:07:11 -0700 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining In-Reply-To: <5704164E.3040006@oracle.com> References: <5703D5D3.7020801@oracle.com> <570410C0.7080105@oracle.com> <5704164E.3040006@oracle.com> Message-ID: <57041AEF.4090907@oracle.com> This looks good. Thanks, Vladimir K On 4/5/16 12:47 PM, Vladimir Ivanov wrote: > Thanks for review, Vladimir! > > Updated version: > http://cr.openjdk.java.net/~vlivanov/8152590/webrev.01 >>> * check_mismatched_access is moved to type.cpp >> >> Type::make_constant_from_field() - is_stable_array and stable_dimension >> are needed only at the end for make_from_constant() call, move them >> there. > > Done. > >> check_mismatched_access() result is used only in assert. Should you put >> the call and assert under #ifdef ASSERT? > No, it's a bug in the change: con should be used instead of field_value. > check_mismatched_access filters out invalid accesses and adjusts the > value for unsigned loads. > > Fixed. > > Best regards, > Vladimir Ivanov > > PS: I'll enhance the tests to catch unsigned field load case.
> >> >>> >>> >>> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 >>> >>> Refactored constant folding logic for static final fields and unified >>> folding logic with instance fields: is_constant() depends only on the >>> flags and caller should check return value from >>> ciField::constant_value() for validity (ciConstant.is_valid()) >>> >> >> Good. >> >>> >>> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 >>> >>> Do constant folding for fields (both static and instance) in >>> LoadNode::Value. >> >> Good. >> >>> >>> >>> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 >>> >>> Mark CallSite::target field as constant. >> >> Okay. >> >> Thanks, >> Vladimir >> >>> >>> >>> Also: >>> >>> * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java >>> >>> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Apr 5 20:33:34 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 13:33:34 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> Message-ID: <5704211E.5090007@oracle.com> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? I will start pre-integration testing. Thanks, Vladimir On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: > Hi Christian > > We have updated the patch as per the suggested changes. > > The webrev for the same is at this location for your review. 
> > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ > > We will soon send another patch for CompilerDirectives changes. > > Regards, > > Vivek > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:* Tuesday, March 29, 2016 11:29 AM > *To:* Rukmannagari, Shravya > *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler > *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 > > On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > > wrote: > > Hi Christian, > > We would add separate files for each intrinsic. By splitting the > CompilerDirectives, do you mean we have to add a separate file. > Sorry I didn't exactly get it. > > Oh, sorry, I wasn't clear enough. Please file a new enhancement for > the CompilerDirectives changes and integrate them separately. > > > > Thanks, > > Shravya Rukmannagari. > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:*Monday, March 28, 2016 5:18 PM > *To:*Deshpande, Vivek R > > *Cc:*hotspot compiler >; Vladimir Kozlov > >; > Rukmannagari, Shravya > > *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I left this comment in the bug: > > I think for the saneness of the macroAssembler_libm_x86_*.cpp files we > should put every intrinsic in its own file, like we did for > macroAssembler_x86_sha.cpp. They are already too big: > > $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp > 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp > 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp > > Also, can we split out the CompilerDirectives changes? > > > > > On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > > > wrote: > > Hi all > > We would like to contribute a patch which optimizes tan and log10 > X86 architecture using Intel LIBM library. > > Could you please review and sponsor this patch.
> > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8152907 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ > > Thanks and regards, > > Vivek > From vivek.r.deshpande at intel.com Tue Apr 5 20:41:48 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 5 Apr 2016 20:41:48 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <5704211E.5090007@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> Hi Vladimir I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. Thank you for the review. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 05, 2016 1:34 PM To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? I will start pre-integration testing. Thanks, Vladimir On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: > Hi Christian > > We have updated the patch as per the suggested changes. > > The webrev for the same is at this location for your review. > > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01 > / > > We will soon send another patch for CompilerDirectives changes. 
> > Regards, > > Vivek > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:* Tuesday, March 29, 2016 11:29 AM > *To:* Rukmannagari, Shravya > *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler > *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 > > On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > > wrote: > > Hi Christian, > > We would add separate files for each intrinsic. By splitting the > CompilerDirectives, do you mean we have to add a separate file. > Sorry I didn't exactly get it. > > Oh, sorry, I wasn't clear enough. Please file a new enhancement for > the CompilerDirectives changes and integrate them separately. > > > > Thanks, > > Shravya Rukmannagari. > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:*Monday, March 28, 2016 5:18 PM > *To:*Deshpande, Vivek R > > *Cc:*hotspot compiler >; Vladimir Kozlov > >; > Rukmannagari, Shravya > > *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I left this comment in the bug: > > I think for the saneness of the macroAssembler_libm_x86_*.cpp files we > should put every intrinsic in its own file, like we did for > macroAssembler_x86_sha.cpp. They are already too big: > > $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp > 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp > 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp > > Also, can we split out the CompilerDirectives changes? > > > > > On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > > > wrote: > > Hi all > > We would like to contribute a patch which optimizes tan and log10 > X86 architecture using Intel LIBM library. > > Could you please review and sponsor this patch.
> > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8152907 > webrev: > > > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00 > / > > Thanks and regards, > > Vivek > From vladimir.kozlov at oracle.com Tue Apr 5 20:47:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 13:47:28 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> Message-ID: <57042460.5070306@oracle.com> I again can't apply changes because of CR at the end of lines in patch file. Vladimir On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: > > Hi Vladimir > > I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. > Thank you for the review. > > Regards, > Vivek > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:34 PM > To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? > > I will start pre-integration testing. > > Thanks, > Vladimir > > On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >> Hi Christian >> >> We have updated the patch as per the suggested changes. >> >> The webrev for the same is at this location for your review. 
>> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01 >> / >> >> We will soon send another patch for CompilerDirectives changes. >> >> Regards, >> >> Vivek >> >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:* Tuesday, March 29, 2016 11:29 AM >> *To:* Rukmannagari, Shravya >> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >> > > wrote: >> >> Hi Christian, >> >> We would add separate files for each intrinsic. By splitting the >> CompilerDirectives, do you mean we have to add a separate file. >> Sorry I didn't exactly get it. >> >> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >> the CompilerDirectives changes and integrate them separately. >> >> >> >> Thanks, >> >> Shravya Rukmannagari. >> >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Monday, March 28, 2016 5:18 PM >> *To:*Deshpande, Vivek R > > >> *Cc:*hotspot compiler > >; Vladimir Kozlov >> >; >> Rukmannagari, Shravya > > >> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I left this comment in the bug: >> >> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we >> should put every intrinsic in its own file, like we did for >> macroAssembler_x86_sha.cpp. They are already too big: >> >> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >> >> Also, can we split out the CompilerDirectives changes? >> >> >> >> >> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >> > >> wrote: >> >> Hi all >> >> We would like to contribute a patch which optimizes tan and log10 >> X86 architecture using Intel LIBM library. >> >> Could you please review and sponsor this patch.
>> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8152907 >> webrev: >> >> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00 >> / >> >> Thanks and regards, >> >> Vivek >> From vivek.r.deshpande at intel.com Tue Apr 5 21:27:00 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 5 Apr 2016 21:27:00 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <57042460.5070306@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> HI Vladimir Sorry about that. Please check this webrev http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ I have updated it. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 05, 2016 1:47 PM To: Deshpande, Vivek R; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 I again can't apply changes because of CR at the end of lines in patch file. Vladimir On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: > > Hi Vladimir > > I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. > Thank you for the review. 
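The patch could not be applied because its lines ended in CR LF instead of LF. One way to normalize such a file before applying it is to drop each carriage return that immediately precedes a newline. The helper below is a minimal sketch. to_unix_line_endings() is a hypothetical name, and it assumes the patch fits in memory as a std::string. A lone CR that is not followed by LF is kept, since it may be real file content.

```cpp
#include <cstddef>
#include <string>

// Converts CR LF line endings to LF. A '\r' that is not immediately
// followed by '\n' is preserved.
std::string to_unix_line_endings(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
      continue;  // skip the CR; the LF is copied on the next iteration
    }
    out.push_back(text[i]);
  }
  return out;
}
```

Reading the patch into a string, passing it through this function and writing the result back yields a file with LF-only line endings.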
> > Regards, > Vivek > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:34 PM > To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? > > I will start pre-integration testing. > > Thanks, > Vladimir > > On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >> Hi Christian >> >> We have updated the patch as per the suggested changes. >> >> The webrev for the same is at this location for your review. >> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.0 >> 1 >> / >> >> We will soon send another patch for CompilerDirectives changes. >> >> Regards, >> >> Vivek >> >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:* Tuesday, March 29, 2016 11:29 AM >> *To:* Rukmannagari, Shravya >> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >> > > wrote: >> >> Hi Christian, >> >> We would add separate files for each intrinsic. By splitting the >> CompilerDirectives, do you mean we have to add a separate file. >> Sorry I didn't exactly get it. >> >> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >> the CompilerDirectives changes and integrate them separately. >> >> >> >> Thanks, >> >> Shravya Rukmannagari.
>> >> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:* Monday, March 28, 2016 5:18 PM >> *To:* Deshpande, Vivek R >> *Cc:* hotspot compiler; Vladimir Kozlov; Rukmannagari, Shravya >> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I left this comment in the bug: >> >> I think for the sanity of the macroAssembler_libm_x86_*.cpp files >> we should put every intrinsic in its own file, like we did for >> macroAssembler_x86_sha.cpp. They are already too big: >> >> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >> >> Also, can we split out the CompilerDirectives changes? >> >> >> >> >> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >> wrote: >> >> Hi all >> >> We would like to contribute a patch which optimizes tan and log10 >> on the x86 architecture using the Intel LIBM library. >> >> Could you please review and sponsor this patch.
>> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8152907 >> webrev: >> >> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >> >> Thanks and regards, >> >> Vivek >> From vladimir.kozlov at oracle.com Tue Apr 5 21:42:04 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 14:42:04 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> Message-ID: <5704312C.9000605@oracle.com> Problem found during build. It looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: error: use of undeclared identifier 'SharedRuntime' __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); Note that templateInterpreterGenerator_x86_32.cpp has that #include. The failure was on macosx, where -DDONT_USE_PRECOMPILED_HEADER is used. Vladimir On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: > Hi Vladimir, > > Sorry about that. > Please check this webrev: > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ > I have updated it.
> > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:47 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I again can't apply the changes because of the CRs at the end of lines in the patch file. > > Vladimir > > On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >> >> Hi Vladimir >> >> I will send you the patch with the macroAssembler_libm_x86_*.cpp files removed. >> Thank you for the review. >> >> Regards, >> Vivek >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:34 PM >> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> It looks good to me, but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them? >> >> I will start pre-integration testing. >> >> Thanks, >> Vladimir >> >> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>> Hi Christian >>> >>> We have updated the patch as per the suggested changes. >>> >>> The webrev for the same is at this location for your review. >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>> >>> We will soon send another patch for CompilerDirectives changes. >>> >>> Regards, >>> >>> Vivek >>> >>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>> *To:* Rukmannagari, Shravya >>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>> wrote: >>> >>> Hi Christian, >>> >>> We would add separate files for each intrinsic.
By splitting the >>> CompilerDirectives, do you mean we have to add a separate file? >>> Sorry, I didn't exactly get it. >>> >>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>> the CompilerDirectives changes and integrate them separately. >>> >>> >>> >>> Thanks, >>> >>> Shravya Rukmannagari. >>> >>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:* Monday, March 28, 2016 5:18 PM >>> *To:* Deshpande, Vivek R >>> *Cc:* hotspot compiler; Vladimir Kozlov; Rukmannagari, Shravya >>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> I left this comment in the bug: >>> >>> I think for the sanity of the macroAssembler_libm_x86_*.cpp files >>> we should put every intrinsic in its own file, like we did for >>> macroAssembler_x86_sha.cpp. They are already too big: >>> >>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>> >>> Also, can we split out the CompilerDirectives changes? >>> >>> >>> >>> >>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>> wrote: >>> >>> Hi all >>> >>> We would like to contribute a patch which optimizes tan and log10 >>> on the x86 architecture using the Intel LIBM library. >>> >>> Could you please review and sponsor this patch.
>>> >>> Bug-id: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>> webrev: >>> >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>> >>> Thanks and regards, >>> >>> Vivek >>> From igor.veresov at oracle.com Tue Apr 5 22:08:03 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 15:08:03 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <57041A5D.4040909@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <570417E6.60107@oracle.com> <5704192A.3010807@oracle.com> <57041A5D.4040909@oracle.com> Message-ID: <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> Coleen and Lois, So, ok to push? igor > On Apr 5, 2016, at 1:04 PM, Coleen Phillimore wrote: > > > > On 4/5/16 3:59 PM, Coleen Phillimore wrote: >> >> >> On 4/5/16 3:54 PM, Lois Foltan wrote: >>> >>> On 4/5/2016 1:56 PM, Igor Veresov wrote: >>>> >>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov wrote: >>>>> >>>>>> >>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: >>>>>> >>>>>> >>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>> Hi Lois, >>>>>>> >>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>> >>>>>>> igor >>>>>> Hi Igor, >>>>>> >>>>>> Thanks for waiting on this. A couple of comments: >>>>>> >>>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>> >>>>>> - I'm concerned about the change on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.
For example, at line #998, within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>> >>>>> >>>>> I looked at the call graphs of these guys, and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction? >>>> >>>> Actually, an easier way to think about it would be: the answer returned by resolve_interface_method() is the result of method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, it doesn't mean that the invocation is really a result of that instruction. The context is. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. >>>> >>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate a query within the invokeinterface context. But passing a bytecode makes the code easier to read. >>> >>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later. >> >> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148 > > No, sorry, this change passes the 'tag'. nvm. > > Coleen >> >> Just waiting for some test changes.
>> >> Coleen >> >>> Lois >>> >>>> >>>> igor >>>> >>>>> >>>>>> Just curious: did you also run the testbase default methods tests? >>>>> >>>>> Yes, within the context of a closed project. That's actually what made these changes necessary. >>>>> >>>>> igor >>>>> >>>>>> Lois >>>>>> >>>>>>> >>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>>>>>> >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> I know you have two reviews for this, but could you hold off committing until Karen Kinnear or I have a chance to review? We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Lois >>>>>>>> >>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, linktime seems like a better place for it, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI), which are affected). >>>>>>>>> >>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> igor
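The idea behind 8153115, as described in this thread, is that the ICCE for a private interface method reached through invokeinterface depends only on the resolved method, not on the receiver type, so it can be raised once at link time; and the bytecode passed to resolve_interface_method() names the context the resolution is performed "as if" for, not necessarily the instruction being executed. The C++ below is a minimal sketch of that idea only. `InvokeContext`, `ResolvedMethod`, and `check_interface_method` are hypothetical stand-ins, not HotSpot's LinkResolver code, and the error strings are illustrative; the rule itself (invokeinterface throws IncompatibleClassChangeError if the resolved method is static or private) is the JVMS "Linking Exceptions" rule Lois cites.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the invoke bytecode whose semantics the
// resolution is performed "as if" for (HotSpot passes a Bytecodes::Code).
enum class InvokeContext { invokestatic, invokespecial, invokevirtual, invokeinterface };

// Hypothetical stand-in for the method found by interface method resolution.
struct ResolvedMethod {
  std::string name;
  bool is_static;
  bool is_private;
};

// Link-time checks on a method resolved in an interface. Returns an
// illustrative IncompatibleClassChangeError message, or std::nullopt if
// linking may proceed. Nothing here inspects a receiver object, which is why
// the check does not have to wait for runtime method selection.
std::optional<std::string> check_interface_method(const ResolvedMethod& m,
                                                  InvokeContext context,
                                                  bool nostatics) {
  if (nostatics && m.is_static) {
    return "ICCE: expected instance, not static, method " + m.name;
  }
  // invokeinterface may not target a private interface method (JVMS SE 8, 6.5).
  if (context == InvokeContext::invokeinterface && m.is_private) {
    return "ICCE: private interface method " + m.name + " invoked with invokeinterface";
  }
  return std::nullopt;
}
```

For example, resolving a private interface method in the invokeinterface context yields an error, while the same method resolved in the invokespecial context does not. Passing the context as an enum rather than as an extra bool is the readability trade-off Igor describes.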
From lois.foltan at oracle.com Tue Apr 5 22:12:37 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 18:12:37 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <570417E6.60107@oracle.com> <5704192A.3010807@oracle.com> <57041A5D.4040909@oracle.com> <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> Message-ID: <57043855.1000405@oracle.com> On 4/5/2016 6:08 PM, Igor Veresov wrote: > Coleen and Lois, > > So, ok to push? Yes, for me. Have you addressed all of Karen's concerns as well? Lois > > igor > > >> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore wrote: >> >> >> >> On 4/5/16 3:59 PM, Coleen Phillimore wrote: >>> >>> On 4/5/16 3:54 PM, Lois Foltan wrote: >>>> On 4/5/2016 1:56 PM, Igor Veresov wrote: >>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov wrote: >>>>>> >>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: >>>>>>> >>>>>>> >>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>>> Hi Lois, >>>>>>>> >>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>>> >>>>>>>> igor >>>>>>> Hi Igor, >>>>>>> >>>>>>> Thanks for waiting on this. A couple of comments: >>>>>>> >>>>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>>> >>>>>>> - I'm concerned about the change on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.
For example, at line #998, within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>>> >>>>>> I looked at the call graphs of these guys, and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction? >>>>> Actually, an easier way to think about it would be: the answer returned by resolve_interface_method() is the result of method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, it doesn't mean that the invocation is really a result of that instruction. The context is. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. >>>>> >>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate a query within the invokeinterface context. But passing a bytecode makes the code easier to read. >>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later. >>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148 >> No, sorry, this change passes the 'tag'. nvm. >> >> Coleen >>> Just waiting for some test changes.
>>> >>> Coleen >>> >>>> Lois >>>> >>>>> igor >>>>> >>>>>>> Just curious: did you also run the testbase default methods tests? >>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary. >>>>>> >>>>>> igor >>>>>> >>>>>>> Lois >>>>>>> >>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>>>>>>> >>>>>>>>> Hi Igor, >>>>>>>>> >>>>>>>>> I know you have two reviews for this, but could you hold off committing until Karen Kinnear or I have a chance to review? We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Lois >>>>>>>>> >>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, linktime seems like a better place for it, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI), which are affected).
>>>>>>>>>> >>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> igor From igor.veresov at oracle.com Tue Apr 5 22:16:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 15:16:48 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <57043855.1000405@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <570417E6.60107@oracle.com> <5704192A.3010807@oracle.com> <57041A5D.4040909@oracle.com> <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> <57043855.1000405@oracle.com> Message-ID: > On Apr 5, 2016, at 3:12 PM, Lois Foltan wrote: > > > On 4/5/2016 6:08 PM, Igor Veresov wrote: >> Coleen and Lois, >> >> So, ok to push? > Yes, for me. Have you addressed all of Karen's concerns as well? Right. Karen, is it all right? igor > Lois > >> >> igor >> >> >>> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore wrote: >>> >>> >>> >>> On 4/5/16 3:59 PM, Coleen Phillimore wrote: >>>> >>>> On 4/5/16 3:54 PM, Lois Foltan wrote: >>>>> On 4/5/2016 1:56 PM, Igor Veresov wrote: >>>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov wrote: >>>>>>> >>>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>>>> Hi Lois, >>>>>>>>> >>>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>>>> >>>>>>>>> igor >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> Thanks for waiting on this. A couple of comments: >>>>>>>> >>>>>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay.
>>>>>>>> >>>>>>>> - I'm concerned about the change on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998, within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>>>> >>>>>>> I looked at the call graphs of these guys, and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction? >>>>>> Actually, an easier way to think about it would be: the answer returned by resolve_interface_method() is the result of method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, it doesn't mean that the invocation is really a result of that instruction. The context is. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. >>>>>> >>>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate a query within the invokeinterface context. But passing a bytecode makes the code easier to read. >>>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later.
>>>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148 >>> No, sorry, this change passes the 'tag'. nvm. >>> >>> Coleen >>>> Just waiting for some test changes. >>>> >>>> Coleen >>>> >>>>> Lois >>>>> >>>>>> igor >>>>>> >>>>>>>> Just curious: did you also run the testbase default methods tests? >>>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary. >>>>>>> >>>>>>> igor >>>>>>> >>>>>>>> Lois >>>>>>>> >>>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>>>>>>>> >>>>>>>>>> Hi Igor, >>>>>>>>>> >>>>>>>>>> I know you have two reviews for this, but could you hold off committing until Karen Kinnear or I have a chance to review? We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Lois >>>>>>>>>> >>>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, linktime seems like a better place for it, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI), which are affected).
>>>>>>>>>>> >>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> igor > From karen.kinnear at oracle.com Tue Apr 5 22:33:17 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 5 Apr 2016 18:33:17 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> Message-ID: <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> Igor, Do you run all the tests with -Xcomp, or whatever flag ensures you test JVMCI vs. the interpreter, for instance? If so, I am ok with checking this in - further notes below. > On Apr 5, 2016, at 3:43 PM, Igor Veresov wrote: > > >> On Apr 5, 2016, at 12:04 PM, Karen Kinnear wrote: >> >> I am in agreement with Lois that the JVMS looks good with moving the exception. > > Thanks! >> >> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >> meeting I will check one more time. It might be worth adding a comment. > > Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ > Again, the bytecode argument here indicates the context, not the actual bytecode. For example, the method may of course be invoked with a method handle. > >> >> My concern is specific to the code in jvmciCompilerToVM.cpp: there is code called resolveMethod that checks >> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >> > > That code needs fixing as well.
We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 > In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolution. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. That is ok with me - I will add a note to the bug. Also: I see a ciMethod::check_call that has a comment - IT appears to fail when applied to an invoke interface call site. FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take the subtleties of invoke interface and invoke special into account. > > igor > >> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >> so that you get the correct behavior depending on the requesting byte code. >> >> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >> I could use help studying this a bit more to understand if this really is resolution or is really selection. >> >> thanks, >> Karen >> >>> On Apr 5, 2016, at 1:34 PM, Lois Foltan wrote: >>> >>> >>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>> Hi Lois, >>>> >>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>> >>>> igor >>> Hi Igor, >>> >>> Thanks for waiting on this.
A couple of comments: >>> >>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>> >>> - I'm concerned about the change on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998, within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>> >>> Just curious: did you also run the testbase default methods tests? >>> Lois >>> >>>> >>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> I know you have two reviews for this, but could you hold off committing until Karen Kinnear or I have a chance to review? We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>> >>>>> Thanks, >>>>> Lois >>>>> >>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, linktime seems like a better place for it, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI), which are affected).
>>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>> >>>>>> Thanks, >>>>>> igor >>> >> > From igor.veresov at oracle.com Tue Apr 5 23:22:37 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 16:22:37 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> Message-ID: <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> > On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: > > Igor, > > Do you run all the tests with -Xcomp, or whatever flag ensures you test JVMCI vs. the interpreter, > for instance? Yes, I ran our RBT round of testing, which covers -Xcomp and -Xmixed. > > If so, I am ok with checking this in - further notes below. > >> On Apr 5, 2016, at 3:43 PM, Igor Veresov wrote: >> >> >>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear wrote: >>> >>> I am in agreement with Lois that the JVMS looks good with moving the exception. >> >> Thanks! >>> >>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>> meeting I will check one more time. It might be worth adding a comment. >> >> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >> Again, the bytecode argument here indicates the context, not the actual bytecode. For example, the method may of course be invoked with a method handle.
>> >>> >>> My concern is specific to the code in jvmciCompilerToVM.cpp: there is code called resolveMethod that checks >>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>> >> >> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolution. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). > > Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match > the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. > That is ok with me - I will add a note to the bug. Could you please explain what the problem is again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null for an interface holder when in reality it was an invokevirtual instruction, and vice versa? > > Also: I see a ciMethod::check_call that has a comment - > IT appears to fail when applied to an invoke interface call site. > FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. > This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? igor > Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take > the subtleties of invoke interface and invoke special into account. >> >> igor >> >>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>> corner cases here.
I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>> so that you get the correct behavior depending on the requesting byte code. >>> >>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>> I could use help studying this a bit more to understand if this really is resolution or is really selection. >>> >>> thanks, >>> Karen >>> >>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan wrote: >>>> >>>> >>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>> Hi Lois, >>>>> >>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>> >>>>> igor >>>> Hi Igor, >>>> >>>> Thanks for waiting on this. A couple of comments: >>>> >>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>> >>>> - I'm concerned about the change on lines #998, #1030, and #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998, within linktime_resolve_static_method, you may not have an invokestatic here; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>> >>>> Just curious: did you also run the testbase default methods tests? >>>> Lois >>>> >>>>> >>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>>>> >>>>>> Hi Igor, >>>>>> >>>>>> I know you have two reviews for this, but could you hold off committing until Karen Kinnear or I have a chance to review? We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>> >>>>>> Thanks, >>>>>> Lois >>>>>> >>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>> >>>>>>> Thanks, >>>>>>> igor From christian.thalinger at oracle.com Tue Apr 5 23:52:57 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 5 Apr 2016 13:52:57 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: > On Apr 5, 2016, at 4:16 AM, Doug Simon wrote: > >> >> On 05 Apr 2016, at 02:00, Christian Thalinger wrote: >> >> >>> On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: >>> >>> No, not good. We are failing a couple JVMCI tests. Looking into it... >> >> Ok, this got a little out of control but for the better: >> >> http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ >> >> The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. >> >> While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone.
Again, stupid jtreg. >> >> Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. > > I agree. Can you modify your derivative webrev for that? Of course, we wouldn't need the cast to HotSpotSpeculationLog in HotSpotCodeCacheProvider once you've made that change. Sure. Here is the new webrev: http://cr.openjdk.java.net/~twisti/8153439/webrev.02/ > > -Doug From christian.thalinger at oracle.com Wed Apr 6 01:37:30 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 5 Apr 2016 15:37:30 -1000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> Message-ID: <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> > On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R wrote: > > Hi Christian > > We have updated the patch as per the suggested changes. > The webrev for the same is at this location for your review. > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ There are: 73 #ifdef _LP64 368 #else 655 #endif in the new files but I don't see them share any code. Maybe it would be better to have dedicated x86_32 and x86_64 files. Then the ifdefs are not required. > > We will soon send another patch for CompilerDirectives changes.
> > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Tuesday, March 29, 2016 11:29 AM > To: Rukmannagari, Shravya > Cc: Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > > On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > wrote: > > Hi Christian, > We would add separate files for each intrinsic. By splitting the CompilerDirectives, do you mean we have to add a separate file. Sorry I didn't exactly get it. > > Oh, sorry, I wasn't clear enough. Please file a new enhancement for the CompilerDirectives changes and integrate them separately. > > > > Thanks, > Shravya Rukmannagari. > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Monday, March 28, 2016 5:18 PM > To: Deshpande, Vivek R > > Cc: hotspot compiler >; Vladimir Kozlov >; Rukmannagari, Shravya > > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I left this comment in the bug: > > I think for the saneness of the macroAssembler_libm_x86_*.cpp files we should put every intrinsic in its own file, like we did for macroAssembler_x86_sha.cpp. They are already too big: > > $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp > 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp > 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp > > Also, can we split out the CompilerDirectives changes? > > > > On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > wrote: > > Hi all > > We would like to contribute a patch which optimizes tan and log10 X86 architecture using Intel LIBM library. > Could you please review and sponsor this patch. > > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8152907 > webrev: > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ > > Thanks and regards, > Vivek -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vladimir.kozlov at oracle.com Wed Apr 6 01:46:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 18:46:44 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> Message-ID: <57046A84.6040707@oracle.com> Multiple files are not always good. May be in a future we can rewrite this code to use shared parts (code or data). I think current split is enough for these changes. Thanks, Vladimir On 4/5/16 6:37 PM, Christian Thalinger wrote: > >> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R >> > wrote: >> >> Hi Christian >> We have updated the patch as per the suggested changes. >> The webrev for the same is at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ > > There are: > > 73 #ifdef _LP64 > > 368 #else > > 655 #endif > > in the new files but I don't see them share any code. Maybe it would be > better to have dedicated x86_32 and x86_64 files. Then the ifdefs are > not required. > >> We will soon send another patch for CompilerDirectives changes. >> Regards, >> Vivek >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Tuesday, March 29, 2016 11:29 AM >> *To:*Rukmannagari, Shravya >> *Cc:*Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >> > > wrote: >> Hi Christian, >> We would add separate files for each intrinsic.
By splitting the >> CompilerDirectives, do you mean we have to add a separate file. >> Sorry I didn't exactly get it. >> >> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >> the CompilerDirectives changes and integrate them separately. >> >> >> Thanks, >> Shravya Rukmannagari. >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Monday, March 28, 2016 5:18 PM >> *To:*Deshpande, Vivek R > > >> *Cc:*hotspot compiler > >; Vladimir Kozlov >> >; >> Rukmannagari, Shravya > > >> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >> I left this comment in the bug: >> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we >> should put every intrinsic in its own file, like we did for >> macroAssembler_x86_sha.cpp. They are already too big: >> >> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >> Also, can we split out the CompilerDirectives changes? >> >> >> >> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >> > >> wrote: >> Hi all >> We would like to contribute a patch which optimizes tan and log10 >> X86 architecture using Intel LIBM library. >> Could you please review and sponsor this patch.
>> Bug-id: >> https://bugs.openjdk.java.net/browse/JDK-8152907 >> webrev: >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >> Thanks and regards, >> Vivek > From jamsheed.c.m at oracle.com Wed Apr 6 08:10:52 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Wed, 6 Apr 2016 13:40:52 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> Message-ID: <5704C48C.2070502@oracle.com> Thanks for the reply. trying to understand stuffs. > void nmethod::add_handler_for_exception_and_pc(Handle exception, > address pc, address handler) { > // There are potential race conditions during exception cache > updates, so we > // must own the ExceptionCache_lock before doing ANY modifications. > Because > // we don't lock during reads, it is possible to have several > threads attempt > // to update the cache with the same data. We need to check for > already inserted > // copies of the current data before adding it. > > MutexLocker ml(ExceptionCache_lock); > ExceptionCache* target_entry = > exception_cache_entry_for_exception(exception); > > if (target_entry == NULL || > !target_entry->add_address_and_handler(pc,handler)) { > target_entry = new ExceptionCache(exception,pc,handler); > add_exception_cache_entry(target_entry); > } > } [1]there is a storestore mem barrier before count is updated in add_address_and_handler this ensure exception pc and handler address are updated before count is incremented and Exception cache entry is updated at ( nm->_exception_cache or in the list ec->_next ). > address nmethod::handler_for_exception_and_pc(Handle exception, > address pc) { > // We never grab a lock to read the exception cache, so we may > // have false negatives. 
This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. > ExceptionCache* ec = exception_cache(); > while (ec != NULL) { > address ret_val; > if ((ret_val = ec->match(exception,pc)) != NULL) { > return ret_val; > } > ec = ec->next(); > } > return NULL; > } and in read logic. we first check ec entry is available (non null check) before proceeding further. if ec is non null and ec_type,excpetion pc, and handler are available by[1]. though count can be reordered and not updated with new value. this fixes the issue. why you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > thanks for pointing me to it. Interesting that you have found such a > problem so shortly before me :) > > My webrev addresses some aspects which are not covered by your fix: > > -add_handler_for_exception_and_pc adds a new ExceptionCache instance > in the other case. They need to get released as well. > > -The readers of the _exception_cache field are not safe, yet. As > Andrew Haley pointed out, optimizers may modify load accesses for > non-volatile fields. > > So I think my change is still needed. > > And after taking a closer look at your change, I think the _count > field which is addressed by your fix needs to be volatile as well. I > can incorporate that in my change if you like. > > Would you agree? > > Best regards, > > Martin > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of > *Jamsheed C m > *Sent:* Montag, 4.
April 2016 08:14 > *To:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > "nmethod's exception cache not multi-thread safe" bug is fixed in b107 > bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 > fix changeset: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 > discussion link: > http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html > > Best Regards, > Jamsheed > > On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's exception > cache. Readers of the cache may read stale data on weak memory > platforms. > > The writers of the cache are synchronized by locks, but there may > be concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache without > locking. > > Therefore, the nmethod's field _exception_cache needs to be > volatile and adding new entries must be done by releasing stores. > (Loading seems to be fine without acquire because there's an > address dependency from the load of the cache to the usage of its > contents which is sufficient to ensure ordering on all openjdk > platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to read > the volatile field _state only once. It is certainly undesired to > force the compiler to load it from memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > > Please review. I will also need a sponsor. > > Best regards, > > Martin > -------------- next part -------------- An HTML attachment was scrubbed...
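Martin's is_alive cleanup comes down to reading a concurrently modified field once and testing that snapshot, instead of reloading it for each comparison. The C++ sketch below illustrates the difference under stated assumptions. `std::atomic` with relaxed loads stands in for HotSpot's volatile field, since plain volatile does not make concurrent access well-defined in standard C++. The class name `StateHolder` and the state values are hypothetical and are not taken from nmethod.hpp.

```cpp
#include <atomic>

// Hypothetical state values; the names and numbering are assumptions.
enum NMethodState : unsigned char { in_use = 0, not_entrant = 1, zombie = 2, unloaded = 3 };

class StateHolder {
 public:
  void set_state(NMethodState s) { _state.store(s, std::memory_order_relaxed); }

  // Two loads: the compiler must emit both, and a concurrent writer may
  // change the state in between, so the two comparisons can see different values.
  bool is_alive_two_loads() const {
    return _state.load(std::memory_order_relaxed) == in_use ||
           _state.load(std::memory_order_relaxed) == not_entrant;
  }

  // One load: both comparisons test the same snapshot of the state.
  bool is_alive() const {
    unsigned char cur_state = _state.load(std::memory_order_relaxed);
    return cur_state == in_use || cur_state == not_entrant;
  }

 private:
  std::atomic<unsigned char> _state{in_use};
};
```

Single-threaded, both versions return the same answer. The one-load form only guarantees that a single memory access is made and that the result is consistent with one observed state.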
URL: From martin.doerr at sap.com Wed Apr 6 09:19:26 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 6 Apr 2016 09:19:26 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <5704C48C.2070502@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> Message-ID: Hi Jamsheed, here are the cases of add_handler_for_exception_and_pc we should talk about: Case 1: A new ExceptionCache instance needs to get added. The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below. The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL. One could argue that this is not critical, but I guess this was not intended? At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do). I think releasing the completely initialized ExceptionCache instance is a much cleaner design. Case 2: An existing ExceptionCache instance gets a new entry. In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths. I have added the acquire barrier for the _count field here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ Does this answer your questions or is anything still unclear? 
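Martin's two cases map onto C++11 atomics as follows. In Case 1, a new entry is published with a release store of the list head, and only after the entry is fully built. In Case 2, a slot added to an existing entry is published by a release store of the count, which readers load with acquire. The sketch below is hedged and is not HotSpot code: `CacheEntry`, `ExceptionCacheSketch`, the capacity of 4 and the use of 0 for "no handler" are assumptions. `std::atomic` and `std::mutex` stand in for HotSpot's volatile fields, OrderAccess and ExceptionCache_lock.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified model of an nmethod exception cache entry.
// Plain fields are written only before the entry or the slot is published.
struct CacheEntry {
  static constexpr int kSize = 4;                 // assumed capacity per entry
  const void* const exception_type;               // immutable after construction
  std::uintptr_t pcs[kSize];
  std::uintptr_t handlers[kSize];
  std::atomic<int> count{0};
  std::atomic<CacheEntry*> next{nullptr};

  CacheEntry(const void* type, std::uintptr_t pc, std::uintptr_t handler)
      : exception_type(type) {
    pcs[0] = pc;
    handlers[0] = handler;
    count.store(1, std::memory_order_relaxed);    // made visible by the head release (Case 1)
  }

  // Writer side, called with the cache lock held (Case 2).
  bool add(std::uintptr_t pc, std::uintptr_t handler) {
    int n = count.load(std::memory_order_relaxed);
    if (n == kSize) return false;
    pcs[n] = pc;
    handlers[n] = handler;
    count.store(n + 1, std::memory_order_release); // slot contents before the new count
    return true;
  }

  // Reader side, lock-free; returns 0 when there is no matching handler.
  std::uintptr_t match(const void* type, std::uintptr_t pc) const {
    if (type != exception_type) return 0;
    int n = count.load(std::memory_order_acquire); // pairs with the release in add()
    for (int i = 0; i < n; i++) {
      if (pcs[i] == pc) return handlers[i];
    }
    return 0;
  }
};

class ExceptionCacheSketch {
 public:
  ~ExceptionCacheSketch() {  // assumes no concurrent readers remain
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next.load(std::memory_order_relaxed);
      delete e;
      e = next;
    }
  }

  // Lock-free lookup; a miss may be a false negative.
  std::uintptr_t lookup(const void* type, std::uintptr_t pc) const {
    for (CacheEntry* e = head_.load(std::memory_order_acquire); e != nullptr;
         e = e->next.load(std::memory_order_acquire)) {
      if (std::uintptr_t h = e->match(type, pc)) return h;
    }
    return 0;
  }

  // Locked insertion, mirroring "find the entry for this type, else add a new one".
  void add(const void* type, std::uintptr_t pc, std::uintptr_t handler) {
    std::lock_guard<std::mutex> guard(lock_);
    CacheEntry* head = head_.load(std::memory_order_relaxed);
    CacheEntry* target = nullptr;
    for (CacheEntry* e = head; e != nullptr; e = e->next.load(std::memory_order_relaxed)) {
      if (e->exception_type == type) { target = e; break; }
    }
    if (target == nullptr || !target->add(pc, handler)) {
      CacheEntry* fresh = new CacheEntry(type, pc, handler);
      fresh->next.store(head, std::memory_order_relaxed);
      head_.store(fresh, std::memory_order_release); // publish a fully built entry (Case 1)
    }
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

Readers never take the lock, so a miss can still be a false negative, for example when an entry is published just after the reader loaded the head. The acquire loads do not prevent that. Their only job is to ensure that a reader which sees a published entry or count also sees the slot contents written before it.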
Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Mittwoch, 6. April 2016 10:11 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thanks for the reply. trying to understand stuffs. void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) { // There are potential race conditions during exception cache updates, so we // must own the ExceptionCache_lock before doing ANY modifications. Because // we don't lock during reads, it is possible to have several threads attempt // to update the cache with the same data. We need to check for already inserted // copies of the current data before adding it. MutexLocker ml(ExceptionCache_lock); ExceptionCache* target_entry = exception_cache_entry_for_exception(exception); if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) { target_entry = new ExceptionCache(exception,pc,handler); add_exception_cache_entry(target_entry); } } [1]there is a storestore mem barrier before count is updated in add_address_and_handler this ensure exception pc and handler address are updated before count is incremented and Exception cache entry is updated at ( nm->_exception_cache or in the list ec->_next ). address nmethod::handler_for_exception_and_pc(Handle exception, address pc) { // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. ExceptionCache* ec = exception_cache(); while (ec != NULL) { address ret_val; if ((ret_val = ec->match(exception,pc)) != NULL) { return ret_val; } ec = ec->next(); } return NULL; } and in read logic. we first check ec entry is available (non null check) before proceeding further. if ec is non null and ec_type,excpetion pc, and handler are available by[1]. 
though count can be reordered and not updated with new value. this fixes the issue. why you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: Hi Jamsheed, thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Montag, 4. April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. 
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin From tobias.hartmann at oracle.com Wed Apr 6 10:53:14 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Apr 2016 12:53:14 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of Message-ID: <5704EA9A.7020202@oracle.com> Hi, please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: https://bugs.openjdk.java.net/browse/JDK-8153514 http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
Thanks, Tobias From zoltan.majo at oracle.com Wed Apr 6 10:59:08 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 6 Apr 2016 12:59:08 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704EA9A.7020202@oracle.com> References: <5704EA9A.7020202@oracle.com> Message-ID: <5704EBFC.5010804@oracle.com> Hi Tobias, that looks good to me! Best regards, Zoltan On 04/06/2016 12:53 PM, Tobias Hartmann wrote: > Hi, > > please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: > > https://bugs.openjdk.java.net/browse/JDK-8153514 > http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ > http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ > > The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. > > I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. > > Thanks, > Tobias From tobias.hartmann at oracle.com Wed Apr 6 11:05:52 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Apr 2016 13:05:52 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704EBFC.5010804@oracle.com> References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com> Message-ID: <5704ED90.6050803@oracle.com> Thanks, Zoltan! Best regards, Tobias On 06.04.2016 12:59, Zoltán Majó wrote: > Hi Tobias, > > > that looks good to me!
> > Best regards, > > > Zoltan > > > On 04/06/2016 12:53 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >> >> https://bugs.openjdk.java.net/browse/JDK-8153514 >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >> >> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >> >> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >> >> Thanks, >> Tobias > From igor.ignatyev at oracle.com Wed Apr 6 11:38:42 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 6 Apr 2016 14:38:42 +0300 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704ED90.6050803@oracle.com> References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com> <5704ED90.6050803@oracle.com> Message-ID: Hi Tobias, looks good to me, thanks for implementing this. Thanks, - Igor > On Apr 6, 2016, at 2:05 PM, Tobias Hartmann wrote: > > Thanks, Zoltan! > > Best regards, > Tobias > > On 06.04.2016 12:59, Zoltán Majó wrote: >> Hi Tobias, >> >> >> that looks good to me!
>> >> Best regards, >> >> >> Zoltan >> >> >> On 04/06/2016 12:53 PM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153514 >>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >>> >>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >>> >>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >>> >>> Thanks, >>> Tobias >> From tobias.hartmann at oracle.com Wed Apr 6 11:39:42 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Apr 2016 13:39:42 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com> <5704ED90.6050803@oracle.com> Message-ID: <5704F57E.5000505@oracle.com> Thanks, Igor! Best regards, Tobias On 06.04.2016 13:38, Igor Ignatyev wrote: > Hi Tobias, > > looks good to me, thanks for implementing this. > > Thanks, > - Igor >> On Apr 6, 2016, at 2:05 PM, Tobias Hartmann wrote: >> >> Thanks, Zoltan! >> >> Best regards, >> Tobias >> >> On 06.04.2016 12:59, Zoltán Majó wrote: >>> Hi Tobias, >>> >>> >>> that looks good to me!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> >>> >>> On 04/06/2016 12:53 PM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153514 >>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >>>> >>>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >>>> >>>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >>>> >>>> Thanks, >>>> Tobias >>> > From jamsheed.c.m at oracle.com Wed Apr 6 11:54:02 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Wed, 6 Apr 2016 17:24:02 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> Message-ID: <5704F8DA.9030000@oracle.com> Hi Martin, On 4/6/2016 2:49 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > here are the cases of add_handler_for_exception_and_pc we should talk > about: > > Case 1: A new ExceptionCache instance needs to get added. > > The storestore barrier you have added is used in the constructor of > the ExceptionCache and it releases the most critical fields of it. I > think this is what you explained in [1] in your email below. > > The new values of _count and _next fields are written afterwards and > hence not covered by this release barrier. Readers of the > _exception_cache may read _count==0 or _next==NULL. 
> > One could argue that this is not critical, but I guess this was not > intended? > > At least the _exception_cache field needs to be volatile to prevent > optimizers from breaking anything. This is always needed for fields > which are accessed concurrently by multiple threads without locks (as > the readers do). > > I think releasing the completely initialized ExceptionCache instance > is a much cleaner design. > Having count < actual entries, or having _next = null is OK (as there is always (locked)slow path to check again). Quoting comment from read path. > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. Weak memory platforms may have a few more false negatives. but isn't that OK ? This helps us, as we can remove volatile from picture, and actually good for read paths. > Case 2: An existing ExceptionCache instance gets a new entry. > > In this case your storestore barrier is good to release all updated > fields. However, we need to consider the readers, too. The _count > field needs to be volatile and the load must acquire. Otherwise, stale > data may get read by processors which perform loads on speculative paths. > storestore mem barrier handles this, as count <= no of real entries. and there is always locked slow path to check again. As said before, there may be a few more false negatives in weak memory platforms than strong ones. Best Regards, Jamsheed > I have added the acquire barrier for the _count field here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ > > > Does this answer your questions or is anything still unclear? > > Best regards, > > Martin > > *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com] > *Sent:* Mittwoch, 6. 
April 2016 10:11 > *To:* Doerr, Martin ; > hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Thanks for the reply. trying to understand stuffs. > > > void nmethod::add_handler_for_exception_and_pc(Handle exception, > address pc, address handler) { > // There are potential race conditions during exception cache > updates, so we > // must own the ExceptionCache_lock before doing ANY > modifications. Because > // we don't lock during reads, it is possible to have several > threads attempt > // to update the cache with the same data. We need to check for > already inserted > // copies of the current data before adding it. > > MutexLocker ml(ExceptionCache_lock); > ExceptionCache* target_entry = > exception_cache_entry_for_exception(exception); > > if (target_entry == NULL || > !target_entry->add_address_and_handler(pc,handler)) { > target_entry = new ExceptionCache(exception,pc,handler); > add_exception_cache_entry(target_entry); > } > } > > > [1]there is a storestore mem barrier before count is updated in > add_address_and_handler > this ensure exception pc and handler address are updated before count > is incremented and Exception cache entry is updated at ( > nm->_exception_cache or in the list ec->_next ). > > > address nmethod::handler_for_exception_and_pc(Handle exception, > address pc) { > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. > ExceptionCache* ec = exception_cache(); > while (ec != NULL) { > address ret_val; > if ((ret_val = ec->match(exception,pc)) != NULL) { > return ret_val; > } > ec = ec->next(); > } > return NULL; > } > > > and in read logic. we first check ec entry is available (non null > check) before proceeding further. > if ec is non null and ec_type,excpetion pc, and handler are available > by[1]. 
though count can be reordered and not updated with new value. > > this fixes the issue. why you think it doesn't? > > Best Regards, > Jamsheed > > On 4/5/2016 3:40 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > thanks for pointing me to it. Interesting that you have found such > a problem so shortly before me :) > > My webrev addresses some aspects which are not covered by your fix: > > -add_handler_for_exception_and_pc adds a new ExceptionCache > instance in the other case. They need to get released as well. > > -The readers of the _exception_cache field are not safe, yet. As > Andrew Haley pointed out, optimizers may modify load accesses for > non-volatile fields. > > So I think my change is still needed. > > And after taking a closer look at your change, I think the _count > field which is addressed by your fix needs to be volatile as well. > I can incorporate that in my change if you like. > > Would you agree? > > Best regards, > > Martin > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf > Of *Jamsheed C m > *Sent:* Montag, 4. April 2016 08:14 > *To:* hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > "nmethod's exception cache not multi-thread safe" bug is fixed in > b107 > bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 > fix changeset: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 > discussion link: > http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html > > Best Regards, > Jamsheed > > On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's > exception cache. Readers of the cache may read stale data on > weak memory platforms.
> > The writers of the cache are synchronized by locks, but there > may be concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache > without locking. > > Therefore, the nmethod's field _exception_cache needs to be > volatile and adding new entries must be done by releasing > stores. (Loading seems to be fine without acquire because > there's an address dependency from the load of the cache to > the usage of its contents which is sufficient to ensure > ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to > read the volatile field _state only once. It is certainly > undesired to force the compiler to load it from memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > > Please review. I will also need a sponsor. > > Best regards, > > Martin > From martin.doerr at sap.com Wed Apr 6 13:24:54 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 6 Apr 2016 13:24:54 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <5704F8DA.9030000@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> Message-ID: Hi Jamsheed and all, thanks for your explanation. About Case 1: I basically agree with that reading _next==NULL or _count==0 only leads to false negatives and is not critical. Yes, we could live with a few more false negatives on weak memory model platforms (even though this is not my preferred design). About Case 2: What I'm missing on the reader's side of the _count field is something which prevents processors from speculatively loading the contents of the ExceptionCache.
In ExceptionCache::test_address, the _count only affects the control flow. PPC and ARM processors can predict branches which depend on the _count field and load speculatively from the pc and handler fields (which may be stale data!). Due to out-of-order execution of the loads, it can actually happen that the new _count value is observed, but stale data is read from pc and handler fields. I guess it is highly unlikely that we will ever observe this, but there's no guarantee. I think my concern about using non-volatile fields for the _exception_cache is also still valid. Nothing prevents C++ compilers from loading the pointer twice from memory. They may expect to get the pointer to the same instance both times but actually get two different ones. For example, this may lead to the situation that handler_for_exception_and_pc uses one ExceptionCache instance for calling the match function and another one (due to a reload of the non-volatile field) for calling next(). Maybe other people would like to comment on this lengthy discussion as well? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Mittwoch, 6. April 2016 13:54 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, On 4/6/2016 2:49 PM, Doerr, Martin wrote: Hi Jamsheed, here are the cases of add_handler_for_exception_and_pc we should talk about: Case 1: A new ExceptionCache instance needs to get added. The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below. The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL. One could argue that this is not critical, but I guess this was not intended?
At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do). I think releasing the completely initialized ExceptionCache instance is a much cleaner design. Having count < actual entries, or having _next = null is OK (as there is always (locked)slow path to check again). Quoting comment from read path. // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. Weak memory platforms may have a few more false negatives. but isn't that OK ? This helps us, as we can remove volatile from picture, and actually good for read paths. Case 2: An existing ExceptionCache instance gets a new entry. In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths. storestore mem barrier handles this, as count <= no of real entries. and there is always locked slow path to check again. As said before, there may be a few more false negatives in weak memory platforms than strong ones. Best Regards, Jamsheed I have added the acquire barrier for the _count field here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ Does this answer your questions or is anything still unclear? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Mittwoch, 6. April 2016 10:11 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thanks for the reply. trying to understand stuffs. 
void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) { // There are potential race conditions during exception cache updates, so we // must own the ExceptionCache_lock before doing ANY modifications. Because // we don't lock during reads, it is possible to have several threads attempt // to update the cache with the same data. We need to check for already inserted // copies of the current data before adding it. MutexLocker ml(ExceptionCache_lock); ExceptionCache* target_entry = exception_cache_entry_for_exception(exception); if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) { target_entry = new ExceptionCache(exception,pc,handler); add_exception_cache_entry(target_entry); } } [1]there is a storestore mem barrier before count is updated in add_address_and_handler this ensure exception pc and handler address are updated before count is incremented and Exception cache entry is updated at ( nm->_exception_cache or in the list ec->_next ). address nmethod::handler_for_exception_and_pc(Handle exception, address pc) { // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. ExceptionCache* ec = exception_cache(); while (ec != NULL) { address ret_val; if ((ret_val = ec->match(exception,pc)) != NULL) { return ret_val; } ec = ec->next(); } return NULL; } and in read logic. we first check ec entry is available (non null check) before proceeding further. if ec is non null and ec_type, exception pc, and handler are available by[1]. though count can be reordered and not updated with new value. this fixes the issue. why you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: Hi Jamsheed, thanks for pointing me to it.
Interesting that you have found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Montag, 4. April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. 
(Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin From aph at redhat.com Wed Apr 6 16:01:19 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Apr 2016 17:01:19 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: Add Arrays.fill stub code In-Reply-To: References: Message-ID: <570532CF.1050903@redhat.com> On 04/06/2016 01:51 PM, Long Chen wrote: > Please review this patch for generating stub code for ArrayFill on aarch64 > platform. > > Performance test case: > http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.java > Testing result: http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.html > Patch: http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.patch > > At same time, refactoring ClearArrayNode's code generation, as it can be > used by Array fill too. Looks good. Minor nit: + // Generate stub for disjoint fill. If "aligned" is true, the + // "to" address is assumed to be heapword aligned. "disjoint" doesn't make any sense here. > Following up I would like to propose a patch to use DC ZVA for large array > zeroing. OK. I haven't seen much advantage to using DC ZVA, but I'm prepared to listen. Andrew.
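A minimal C++11 sketch of the publication discipline argued for in the 8153267 exception-cache thread above, assuming `std::atomic` in place of HotSpot's `OrderAccess` barriers and volatile fields. Writers serialize on a lock, fully initialize an entry before linking it in with a release store, and publish each new (pc, handler) slot with a release store of the count; the lock-free reader pairs both with acquire loads and reads the list head exactly once. `CacheEntry`, `ExceptionCacheSketch`, `TypeId`, `Addr` and the capacity of 4 are hypothetical names chosen for illustration, not the code in either webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins: TypeId plays the role of the exception klass and
// Addr plays the role of HotSpot's `address`. 0 means "no handler found".
using TypeId = int;
using Addr = std::uintptr_t;

// One entry: up to kCapacity (pc, handler) pairs for a single exception type.
class CacheEntry {
 public:
  static constexpr int kCapacity = 4;  // assumed capacity, illustration only

  // Runs before the entry is published, so plain stores suffice here; the
  // release store of the list head in ExceptionCacheSketch::add() orders them.
  CacheEntry(TypeId type, Addr pc, Addr handler, CacheEntry* next)
      : type_(type), next_(next), count_(1) {
    pc_[0] = pc;
    handler_[0] = handler;
  }

  TypeId type() const { return type_; }
  CacheEntry* next() const { return next_; }  // fixed before publication

  // Writer side; the caller holds the cache lock.
  bool add(Addr pc, Addr handler) {
    int n = count_.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    pc_[n] = pc;
    handler_[n] = handler;
    // Release: the slot written above becomes visible no later than the new count.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side, no lock. The acquire load pairs with the release store in
  // add(), so a reader that sees count n also sees slots [0, n) fully written.
  Addr match(Addr pc) const {
    int n = count_.load(std::memory_order_acquire);
    assert(n >= 1 && n <= kCapacity);
    for (int i = 0; i < n; ++i) {
      if (pc_[i] == pc) return handler_[i];
    }
    return 0;
  }

 private:
  const TypeId type_;
  CacheEntry* const next_;
  std::atomic<int> count_;
  Addr pc_[kCapacity];
  Addr handler_[kCapacity];
};

class ExceptionCacheSketch {
 public:
  ExceptionCacheSketch() = default;
  ExceptionCacheSketch(const ExceptionCacheSketch&) = delete;
  ExceptionCacheSketch& operator=(const ExceptionCacheSketch&) = delete;
  ~ExceptionCacheSketch() {
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next();
      delete e;
      e = next;
    }
  }

  // Lock-free lookup in the spirit of handler_for_exception_and_pc. head_ is
  // read by one atomic load into a local, so one traversal cannot mix two heads.
  Addr lookup(TypeId type, Addr pc) const {
    for (CacheEntry* e = head_.load(std::memory_order_acquire); e != nullptr;
         e = e->next()) {
      if (e->type() != type) continue;
      Addr handler = e->match(pc);
      if (handler != 0) return handler;
    }
    return 0;  // possibly a false negative if add() is racing with us
  }

  // Locked insert in the spirit of add_handler_for_exception_and_pc.
  void add(TypeId type, Addr pc, Addr handler) {
    std::lock_guard<std::mutex> guard(lock_);
    CacheEntry* head = head_.load(std::memory_order_relaxed);  // only writers store head_
    for (CacheEntry* e = head; e != nullptr; e = e->next()) {
      if (e->type() == type) {
        if (e->add(pc, handler)) return;
        break;  // newest entry for this type is full: start a new one
      }
    }
    // Fully initialize the new entry, then publish it with a release store.
    CacheEntry* fresh = new CacheEntry(type, pc, handler, head);
    head_.store(fresh, std::memory_order_release);
  }

 private:
  std::mutex lock_;
  std::atomic<CacheEntry*> head_{nullptr};
};
```

A lookup that races with `add` can still return 0 for a handler that is being inserted; as Jamsheed notes, that false negative is tolerable because the caller can fall back to the locked slow path. What the acquire loads rule out is the Case 2 hazard Martin describes: observing the new count (or a new head) while reading stale pc and handler values.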
From vladimir.kozlov at oracle.com Wed Apr 6 16:50:34 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Apr 2016 09:50:34 -0700 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704EA9A.7020202@oracle.com> References: <5704EA9A.7020202@oracle.com> Message-ID: <57053E5A.6050705@oracle.com> Looks good. Thanks, Vladimir On 4/6/16 3:53 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: > > https://bugs.openjdk.java.net/browse/JDK-8153514 > http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ > http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ > > The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. > > I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. 
> > Thanks, > Tobias > From vivek.r.deshpande at intel.com Wed Apr 6 17:14:58 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 6 Apr 2016 17:14:58 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <5704312C.9000605@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> <5704312C.9000605@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> Hi Vladimir Please let me know, if I need to provide an updated patch with this change. Thanks for all your help. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 05, 2016 2:42 PM To: Deshpande, Vivek R; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: error: use of undeclared identifier 'SharedRuntime' __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); Note templateInterpreterGenerator_x86_32.cpp has that #include. It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. Vladimir On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: > HI Vladimir > > Sorry about that. > Please check this webrev > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02 > / > I have updated it. 
> > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:47 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I again can't apply changes because of CR at the end of lines in patch file. > > Vladimir > > On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >> >> Hi Vladimir >> >> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >> Thank you for the review. >> >> Regards, >> Vivek >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:34 PM >> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? >> >> I will start pre-integration testing. >> >> Thanks, >> Vladimir >> >> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>> Hi Christian >>> >>> We have updated the patch as per the suggested changes. >>> >>> The webrev for the same is at this location for your review. >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>> >>> We will soon send another patch for CompilerDirectives changes. >>> >>> Regards, >>> >>> Vivek >>> >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>> *To:* Rukmannagari, Shravya >>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>> >> > wrote: >>> >>> Hi Christian, >>> >>> We would add separate files for each intrinsic.
By splitting the >>> CompilerDirectives, do you mean we have to add a separate file. >>> Sorry I didn't exactly get it. >>> >>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>> the CompilerDirectives changes and integrate them separately. >>> >>> >>> >>> Thanks, >>> >>> Shravya Rukmannagari. >>> >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Monday, March 28, 2016 5:18 PM *To:*Deshpande, Vivek R >>> > >>> *Cc:*hotspot compiler >> >; Vladimir Kozlov >>> >; >>> Rukmannagari, Shravya >> > >>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> I left this comment in the bug: >>> >>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>> we should put every intrinsic in its own file, like we did for >>> macroAssembler_x86_sha.cpp. They are already too big: >>> >>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>> >>> Also, can we split out the CompilerDirectives changes? >>> >>> >>> >>> >>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>> > >>> wrote: >>> >>> Hi all >>> >>> We would like to contribute a patch which optimizes tan and log10 >>> X86 architecture using Intel LIBM library. >>> >>> Could you please review and sponsor this patch. >>> >>> Bug-id: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>> webrev: >>> >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>> >>> Thanks and regards, >>> >>> Vivek >>> From christian.thalinger at oracle.com Wed Apr 6 17:14:56 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 6 Apr 2016 07:14:56 -1000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <57046A84.6040707@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> <57046A84.6040707@oracle.com> Message-ID: > On Apr 5, 2016, at 3:46 PM, Vladimir Kozlov wrote: > > Multiple files are not always good. Maybe in the future we can rewrite this code to use shared parts (code or data). I think current split is enough for these changes. Alright. > > Thanks, > Vladimir > > On 4/5/16 6:37 PM, Christian Thalinger wrote: >> >>> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R >>> > wrote: >>> >>> Hi Christian >>> We have updated the patch as per the suggested changes. >>> The webrev for the same is at this location for your review. >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >> >> There are: >> >> 73 #ifdef _LP64 >> >> 368 #else >> >> 655 #endif >> >> in the new files but I don't see them share any code. Maybe it would be >> better to have dedicated x86_32 and x86_64 files. Then the ifdefs are >> not required. >> >>> We will soon send another patch for CompilerDirectives changes.
>>> Regards, >>> Vivek >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Tuesday, March 29, 2016 11:29 AM >>> *To:*Rukmannagari, Shravya >>> *Cc:*Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>> >> > wrote: >>> Hi Christian, >>> We would add separate files for each intrinsic. By splitting the >>> CompilerDirectives, do you mean we have to add a separate file. >>> Sorry I didn't exactly get it. >>> >>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>> the CompilerDirectives changes and integrate them separately. >>> >>> >>> Thanks, >>> Shravya Rukmannagari. >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Monday, March 28, 2016 5:18 PM >>> *To:*Deshpande, Vivek R >> > >>> *Cc:*hotspot compiler >> >; Vladimir Kozlov >>> >; >>> Rukmannagari, Shravya >> > >>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> I left this comment in the bug: >>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we >>> should put every intrinsic in its own file, like we did for >>> macroAssembler_x86_sha.cpp. They are already too big: >>> >>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>> Also, can we split out the CompilerDirectives changes? >>> >>> >>> >>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>> > >>> wrote: >>> Hi all >>> We would like to contribute a patch which optimizes tan and log10 >>> X86 architecture using Intel LIBM library. >>> Could you please review and sponsor this patch.
>>> Bug-id: >>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>> webrev: >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>> Thanks and regards, >>> Vivek >> From vladimir.kozlov at oracle.com Wed Apr 6 17:17:47 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Apr 2016 10:17:47 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> <5704312C.9000605@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> Message-ID: <570544BB.6040509@oracle.com> I added #include myself and PIT testing passed. I need to know who is author or contributor. Thanks, Vladimir On 4/6/16 10:14 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please let me know, if I need to provide an updated patch with this change. > Thanks for all your help. > > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 2:42 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > Problem found during build. 
Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: > > hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: > error: use of undeclared identifier 'SharedRuntime' > __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); > > Note templateInterpreterGenerator_x86_32.cpp has that #include. > > It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. > > Vladimir > > > > On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: >> HI Vladimir >> >> Sorry about that. >> Please check this webrev >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02 >> / >> I have updated it. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:47 PM >> To: Deshpande, Vivek R; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I again can't apply changes because of CR at the end of lines in patch file. >> >> Vladimir >> >> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >>> >>> Hi Vladimir >>> >>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >>> Thank you for the review. >>> >>> Regards, >>> Vivek >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, April 05, 2016 1:34 PM >>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >>> Cc: hotspot compiler >>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? >>> >>> I will start pre-integration testing. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>>> Hi Christian >>>> >>>> We have updated the patch as per the suggested changes. 
>>>> >>>> The webrev for the same is at this location for your review. >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>>> >>>> We will soon send another patch for CompilerDirectives changes. >>>> >>>> Regards, >>>> >>>> Vivek >>>> >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>>> *To:* Rukmannagari, Shravya >>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>>> >>> > wrote: >>>> >>>> Hi Christian, >>>> >>>> We would add separate files for each intrinsic. By splitting the >>>> CompilerDirectives, do you mean we have to add a separate file. >>>> Sorry I didn't exactly get it. >>>> >>>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>>> the CompilerDirectives changes and integrate them separately. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Shravya Rukmannagari. >>>> >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Monday, March 28, 2016 5:18 PM *To:*Deshpande, Vivek R >>>> > >>>> *Cc:*hotspot compiler >>> >; Vladimir Kozlov >>>> >; >>>> Rukmannagari, Shravya >>> > >>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> I left this comment in the bug: >>>> >>>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>>> we should put every intrinsic in its own file, like we did for >>>> macroAssembler_x86_sha.cpp. They are already too big: >>>> >>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>>> >>>> Also, can we split out the CompilerDirectives changes?
>>>> >>>> >>>> >>>> >>>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>>> > >>>> wrote: >>>> >>>> Hi all >>>> >>>> We would like to contribute a patch which optimizes tan and log10 >>>> X86 architecture using Intel LIBM library. >>>> >>>> Could you please review and sponsor this patch. >>>> >>>> Bug-id: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>>> webrev: >>>> >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>>> >>>> Thanks and regards, >>>> >>>> Vivek >>>> From vivek.r.deshpande at intel.com Wed Apr 6 17:24:01 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 6 Apr 2016 17:24:01 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <570544BB.6040509@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> <5704312C.9000605@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> <570544BB.6040509@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB87@ORSMSX106.amr.corp.intel.com> Thanks Vladimir.
Code contributed by: Shravya Rukmannagari (shravya.rukmannagari at intel.com) and Vivek Deshpande (vivek.r.deshpande at intel.com) Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 06, 2016 10:18 AM To: Deshpande, Vivek R; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 I added #include myself and PIT testing passed. I need to know who is author or contributor. Thanks, Vladimir On 4/6/16 10:14 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please let me know, if I need to provide an updated patch with this change. > Thanks for all your help. > > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 2:42 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: > > hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: > error: use of undeclared identifier 'SharedRuntime' > __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, > SharedRuntime::dexp))); > > Note templateInterpreterGenerator_x86_32.cpp has that #include. > > It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. > > Vladimir > > > > On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: >> HI Vladimir >> >> Sorry about that. >> Please check this webrev >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ >> I have updated it.
>> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:47 PM >> To: Deshpande, Vivek R; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I again can't apply changes because of CR at the end of lines in patch file. >> >> Vladimir >> >> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >>> >>> Hi Vladimir >>> >>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >>> Thank you for the review. >>> >>> Regards, >>> Vivek >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, April 05, 2016 1:34 PM >>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >>> Cc: hotspot compiler >>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you use 'hg remove' or simply remove them? >>> >>> I will start pre-integration testing. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>>> Hi Christian >>>> >>>> We have updated the patch as per the suggested changes. >>>> >>>> The webrev for the same is at this location for your review. >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>>> >>>> We will soon send another patch for CompilerDirectives changes.
>>>> >>>> Regards, >>>> >>>> Vivek >>>> >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>>> *To:* Rukmannagari, Shravya >>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>>> >>> > wrote: >>>> >>>> Hi Christian, >>>> >>>> We would add separate files for each intrinsic. By splitting the >>>> CompilerDirectives, do you mean we have to add a separate file? >>>> Sorry, I didn't exactly get it. >>>> >>>> Oh, sorry, I wasn't clear enough. Please file a new enhancement >>>> for the CompilerDirectives changes and integrate them separately. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Shravya Rukmannagari. >>>> >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:* Monday, March 28, 2016 5:18 PM *To:* Deshpande, Vivek R >>>> > >>>> *Cc:* hotspot compiler >>> >; Vladimir Kozlov >>>> >; >>>> Rukmannagari, Shravya >>> > >>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> I left this comment in the bug: >>>> >>>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>>> we should put every intrinsic in its own file, like we did for >>>> macroAssembler_x86_sha.cpp. They are already too big: >>>> >>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>>> >>>> Also, can we split out the CompilerDirectives changes? >>>> >>>> >>>> >>>> >>>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>>> > >>>> wrote: >>>> >>>> Hi all >>>> >>>> We would like to contribute a patch which optimizes tan and log10 >>>> on the x86 architecture using the Intel LIBM library. >>>> >>>> Could you please review and sponsor this patch.
>>>> >>>> Bug-id: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>>> webrev: >>>> >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>>> >>>> Thanks and regards, >>>> >>>> Vivek >>>> From igor.veresov at oracle.com Wed Apr 6 17:49:31 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 6 Apr 2016 10:49:31 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> Message-ID: <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately. Thanks, igor > On Apr 5, 2016, at 4:22 PM, Igor Veresov wrote: > > >> On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: >> >> Igor, >> >> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter >> for instance? > > Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed. > >> >> If so, I am ok with checking this in - further notes below. >> >>> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >>> >>> >>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>>> >>>> I am in agreement with Lois that the JVMS looks good with moving the exception. >>> >>> Thanks! >>>> >>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>>> meeting I will check one more time. It might be worth adding a comment.
>>> >>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. >>> >>>> >>>> My concern is specific to the code in jvmciCompilerToVM.cpp, where there is code called resolveMethod that checks >>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>>> >>> >>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >>> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). >> >> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match >> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. >> That is ok with me - I will add a note to the bug. > > Could you please explain what is the problem again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa? > >> >> Also: I see a ciMethod::check_call that has a comment - >> IT appears to fail when applied to an invoke interface call site. >> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. >> > > This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? > > igor > >> Could you possibly file a bug on this one?
What I am seeing is a conditional for invoke static vs. invoke virtual that does not take >> the subtleties of invoke interface and invoke special into account. >>> >>> igor >>> >>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>>> so that you get the correct behavior depending on the requesting byte code. >>>> >>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>>> I could use help studying this a bit more to understand if this really is resolution or is really selection. >>>> >>>> thanks, >>>> Karen >>>> >>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>>> >>>>> >>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>> Hi Lois, >>>>>> >>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>> >>>>>> igor >>>>> Hi Igor, >>>>> >>>>> Thanks for waiting on this. A couple of comments: >>>>> >>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>> >>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>> >>>>> Just curious did you also run the testbase default methods tests?
>>>>> Lois >>>>> >>>>>> >>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>> >>>>>>> Hi Igor, >>>>>>> >>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>> >>>>>>> Thanks, >>>>>>> Lois >>>>>>> >>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>>> >>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>> >>>>>>>> Thanks, >>>>>>>> igor > From tobias.hartmann at oracle.com Thu Apr 7 06:13:06 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Apr 2016 08:13:06 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <57053E5A.6050705@oracle.com> References: <5704EA9A.7020202@oracle.com> <57053E5A.6050705@oracle.com> Message-ID: <5705FA72.4030304@oracle.com> Thanks, Vladimir. Best regards, Tobias On 06.04.2016 18:50, Vladimir Kozlov wrote: > Looks good. 
> > Thanks, > Vladimir > > On 4/6/16 3:53 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >> >> https://bugs.openjdk.java.net/browse/JDK-8153514 >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >> >> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >> >> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >> >> Thanks, >> Tobias >> From rahul.v.raghavan at oracle.com Thu Apr 7 06:43:08 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 6 Apr 2016 23:43:08 -0700 (PDT) Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Message-ID: > -----Original Message----- > From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: Tuesday, April 05, 2016 1:35 AM > > FYI > > -----Original Message----- > From: Berg, Michael C > Sent: Monday, April 04, 2016 12:42 PM > To: 'Rahul Raghavan' > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > Looks ok Rahul. Thank you Michael. 
> > Thanks, > Michael > > -----Original Message----- > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > Sent: Monday, April 04, 2016 1:09 AM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Dean Long ; Berg, Michael C ; Tobias Hartmann > ; Vladimir Ivanov > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > Hi, > > Please review the revised fix for JDK- 8149488. > > : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ > > Based on further checking and thanks to clarifications from Michael, it was verified that 8149488 issue can be fixed by just correcting > the bitsInByte size to 256 in 'regmask.cpp', (and that earlier mentioned case of extending bitsInByte table size to 512, is not required). > > Points from Michael for the record - " > > I believe Dean is right, I have debugged this and analyzed the usage model, > > we never made use of the upper components > > and register allocation has been right for VecZ for a good deal of time. > > > > All we need for a change is, > > Regmask.cpp: > > > > uint RegMask::Size() const { > > extern uint8_t bitsInByte[256]; > > > > A one line change. > > > > -Michael. > > > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > > I tested SPECjvm2008 which as tons of EVEX code generated in it on SKX > > where we make use of VecZ and the upper bank of registers." > > So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. > > Confirmed no issues with 'JPRT -testset hotspot' run. > > Thanks, > Rahul > > > -----Original Message----- > > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > > > > Michael, isn't the correct size for this table 256? I missed how VecZ > > relates to the table size. > > > > dl > > > > On 3/31/2016 9:58 AM, Berg, Michael C wrote: > > > Up until now we have gotten along with the size constraint only. 
> > > Let us have both the size and the table though for completeness. > > > I think we can leave the name though. > > > > > > -Michael > > > > > > -----Original Message----- > > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > > > Sent: Thursday, March 31, 2016 9:18 AM > > > To: Dean Long ; > > > hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > > > > > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in > > > regmask.cpp > > > > > > Hi Michael, > > > > > > With respect to below thread, request help with some questions. > > > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet > Size. > > > Also comment got was for requirement to extend bitsInByte table to > > > 512 size, for consistent mapping for VecZ register also, on > > targets that support it. > > > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > > > Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? > > > > > > So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? > > > (without extending current bitsInByte array contents) (Anyhow at > > > present values above 0xFF is never indexed for bitsInByte in > > RegMask::Size()) > > > > > > ----- src/share/vm/libadt/vectset.hpp > > > +#define BITS_IN_BYTE_ARRAY_SIZE 256 > > > + > > > > > > ----- src/share/vm/opto/regmask.cpp > > > - extern uint8_t bitsInByte[512]; > > > + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > > > > > > ----- src/share/vm/libadt/vectset.cpp > > > -uint8_t bitsInByte[256] = { > > > +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > > > > > > I can send revised webrev for above if all okay. Please tell me if I am missing something. 
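For readers following the size question: a bit-count lookup table is indexed by a single byte value, so a minimal sketch shows why 256 entries cover every possible index and how a Size()-style routine sums per-byte counts. This is not HotSpot's vectset.cpp or regmask.cpp code; the table and macro names mirror the thread, but `init_bits_in_byte` and `count_bits` are hypothetical helpers, and 32-bit mask words are an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the names used in this thread; the helper
// functions below are hypothetical and not copied from HotSpot.
#define BITS_IN_BYTE_ARRAY_SIZE 256

// bitsInByte[i] holds the number of set bits in the byte value i.
// Static storage is zero-initialized, so bitsInByte[0] starts at 0.
static uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];

static void init_bits_in_byte() {
  for (int i = 0; i < BITS_IN_BYTE_ARRAY_SIZE; i++) {
    // i >> 1 drops the lowest bit and is smaller than i (for i > 0),
    // so its count has already been computed.
    bitsInByte[i] = static_cast<uint8_t>(bitsInByte[i >> 1] + (i & 1));
  }
}

// Counts set bits across an array of 32-bit words, one byte at a time,
// in the spirit of RegMask::Size() / VectorSet::Size().
static unsigned count_bits(const uint32_t* words, int nwords) {
  unsigned sum = 0;
  for (int w = 0; w < nwords; w++) {
    for (int shift = 0; shift < 32; shift += 8) {
      unsigned index = (words[w] >> shift) & 0xFF;
      // An 8-bit index can never exceed 255, so entries 256..511 would be unused.
      assert(index < BITS_IN_BYTE_ARRAY_SIZE);
      sum += bitsInByte[index];
    }
  }
  return sum;
}
```

After `init_bits_in_byte()`, a mask such as `{0xFFFFFFFF, 0x80000001}` counts 34 set bits. The assertion in the inner loop captures Dean's point: a byte-sized index cannot reach entries beyond 255, regardless of how wide the register (VecZ or otherwise) is, because wider masks simply use more bytes.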
> > > > > > > > > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? > > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > > > > > > Thanks, > > > Rahul > > > > > >> -----Original Message----- > > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > > >> > > >>> -----Original Message----- > > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > > >>> > > >>> When do we access elements 256 .. 511? Wouldn't that mean we have > > >>> 9-bit bytes? > > >> Got your point Dean, Thanks. > > >> I too got some questions here now; will check and reply soon. > > >> > > >> -Rahul > > >> > > >>> dl > > >>> > > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > > >>>> Hi, > > >>>> > > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > > >>>> > > >>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > > >>>> : > > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > > >>>> > > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > > >>>> Confirmed no issues with 'JPRT -testset hotspot' run. > > >>>> > > >>>> Thanks, > > >>>> Rahul > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > > >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > > >>> dev at openjdk.java.net > > >>>>> Should we not extend: > > >>>>> > > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > > >>>>> uint8_t bitsInByte[256] = { // ... > > >>>>> > > >>>>> to 512 > > >>>>> > > >>>>> -----Original Message----- > > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > > >>>>> > > >>>>> So how do we intend to map a VecZ register without 512 bits? 
> > >>>>> > > >>>>> -Michael > > >>>>> > > >>>>> -----Original Message----- > > >>>>> From: hotspot-compiler-dev > > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > > >>>>> Of Vladimir Ivanov > > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > > >>>>> hotspot-compiler-dev at openjdk.java.net > > >>>>> > > >>>>> Rahul, > > >>>>> > > >>>>> Can we define a constant instead and use it in both places? > > >>>>> > > >>>>> Best regards, > > >>>>> Vladimir Ivanov > > >>>>> > > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > > >>>>>> Hi, > > >>>>>> > > >>>>>> Please review the patch for JDK- 8149488. > > >>>>>> > > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > > >>>>>> Webrev: > > >>>>>> http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > > >>>>>> > > >>>>>> Corrected the bitsInByte array size in declaration. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Rahul > > >>>>>> > > From rahul.v.raghavan at oracle.com Thu Apr 7 06:43:42 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 6 Apr 2016 23:43:42 -0700 (PDT) Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: <5702B3C5.8070507@oracle.com> References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> <5702B3C5.8070507@oracle.com> Message-ID: <224945da-cb88-4653-af5d-2865ac42ee08@default> > -----Original Message----- > From: Dean Long > Sent: Tuesday, April 05, 2016 12:05 AM > > Looks OK. Thank you Dean. > > dl > > On 4/4/2016 1:09 AM, Rahul Raghavan wrote: > > Hi, > > > > Please review the revised fix for JDK- 8149488. 
> > > > : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ > > > > Based on further checking and thanks to clarifications from Michael, > > it was verified that 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp', > > (and that earlier mentioned case of extending bitsInByte table size to 512, is not required). > > > > Points from Michael for the record - " > > > I believe Dean is right, I have debugged this and analyzed the usage model, > > > we never made use of the upper components > > > and register allocation has been right for VecZ for a good deal of time. > > > > > > All we need for a change is, > > > Regmask.cpp: > > > > > > uint RegMask::Size() const { > > > extern uint8_t bitsInByte[256]; > > > > > > A one line change. > > > > > > -Michael. > > > > > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > > > I tested SPECjvm2008 which as tons of EVEX code generated in it on SKX > > > where we make use of VecZ and the upper bank of registers." > > > > So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. > > > > Confirmed no issues with 'JPRT -testset hotspot' run. > > > > Thanks, > > Rahul > > > >> -----Original Message----- > >> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > >> > >> Michael, isn't the correct size for this table 256? I missed how VecZ > >> relates to the table size. > >> > >> dl > >> > >> On 3/31/2016 9:58 AM, Berg, Michael C wrote: > >>> Up until now we have gotten along with the size constraint only. > >>> Let us have both the size and the table though for completeness. > >>> I think we can leave the name though. 
> >>> > >>> -Michael > >>> > >>> -----Original Message----- > >>> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > >>> Sent: Thursday, March 31, 2016 9:18 AM > >>> To: Dean Long ; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > >>> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > >>> > >>> Hi Michael, > >>> > >>> With respect to below thread, request help with some questions. > >>> Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet > Size. > >>> Also comment got was for requirement to extend bitsInByte table to 512 size, for consistent mapping for VecZ register also, on > >> targets that support it. > >>> But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > >>> Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? > >>> > >>> So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? > >>> (without extending current bitsInByte array contents) (Anyhow at present values above 0xFF is never indexed for bitsInByte in > >> RegMask::Size()) > >>> ----- src/share/vm/libadt/vectset.hpp > >>> +#define BITS_IN_BYTE_ARRAY_SIZE 256 > >>> + > >>> > >>> ----- src/share/vm/opto/regmask.cpp > >>> - extern uint8_t bitsInByte[512]; > >>> + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > >>> > >>> ----- src/share/vm/libadt/vectset.cpp > >>> -uint8_t bitsInByte[256] = { > >>> +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > >>> > >>> I can send revised webrev for above if all okay. Please tell me if I am missing something. > >>> > >>> > >>> OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? 
> >>> (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > >>> > >>> Thanks, > >>> Rahul > >>> > >>>> -----Original Message----- > >>>> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > >>>> > >>>>> -----Original Message----- > >>>>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > >>>>> > >>>>> When do we access elements 256 .. 511? Wouldn't that mean we have > >>>>> 9-bit bytes? > >>>> Got your point Dean, Thanks. > >>>> I too got some questions here now; will check and reply soon. > >>>> > >>>> -Rahul > >>>> > >>>>> dl > >>>>> > >>>>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > >>>>>> Hi, > >>>>>> > >>>>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > >>>>>> > >>>>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>> : > >>>>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > >>>>>> > >>>>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > >>>>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > >>>>>> Confirmed no issues with 'JPRT -testset hotspot' run. > >>>>>> > >>>>>> Thanks, > >>>>>> Rahul > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > >>>>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > >>>>> dev at openjdk.java.net > >>>>>>> Should we not extend: > >>>>>>> > >>>>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > >>>>>>> uint8_t bitsInByte[256] = { // ... > >>>>>>> > >>>>>>> to 512 > >>>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > >>>>>>> > >>>>>>> So how do we intend to map a VecZ register without 512 bits? 
> >>>>>>> > >>>>>>> -Michael > >>>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: hotspot-compiler-dev > >>>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > >>>>>>> Of Vladimir Ivanov > >>>>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > >>>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>>> > >>>>>>> Rahul, > >>>>>>> > >>>>>>> Can we define a constant instead and use it in both places? > >>>>>>> > >>>>>>> Best regards, > >>>>>>> Vladimir Ivanov > >>>>>>> > >>>>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> Please review the patch for JDK- 8149488. > >>>>>>>> > >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > >>>>>>>> > >>>>>>>> Corrected the bitsInByte array size in declaration. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Rahul > >>>>>>>> > From aleksey.shipilev at oracle.com Thu Apr 7 07:30:36 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 10:30:36 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <56FE87C2.50002@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> Message-ID: <57060C9C.4000000@oracle.com> On 04/01/2016 05:37 PM, Aleksey Shipilev wrote: > On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >> I would like to solicit comments for C1 support for new >> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >> The rest of new Unsafe methods that are not intrinsified by C1 are >> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >> emulated with existing APIs. 
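The distinction this request relies on: a compare-and-exchange returns the *witness* value it found in memory, while a compare-and-swap only reports success or failure. A boolean CAS followed by a separate read cannot recover the witness atomically, because another thread may store in between. The following minimal C++ sketch of the two semantics uses `std::atomic`. It is an analogy only, not the Unsafe, C1, or C2 implementation, and the function names are illustrative.

```cpp
#include <atomic>
#include <cassert>

// Compare-and-exchange: returns the value observed in `cell` (the witness).
// The store of `desired` happened if and only if witness == expected.
int compare_and_exchange(std::atomic<int>& cell, int expected, int desired) {
  int witness = expected;
  // On failure, compare_exchange_strong overwrites `witness` with the current
  // value; on success it leaves `witness` equal to `expected` (the old value).
  bool stored = cell.compare_exchange_strong(witness, desired);
  assert(stored == (witness == expected));
  return witness;
}

// Compare-and-swap: reports only success or failure; the observed value is lost.
bool compare_and_swap(std::atomic<int>& cell, int expected, int desired) {
  return cell.compare_exchange_strong(expected, desired);
}
```

For example, with a cell holding 5, `compare_and_exchange(cell, 5, 7)` returns 5 and stores 7, and a second call `compare_and_exchange(cell, 5, 9)` returns the witness 7 and leaves the cell unchanged.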
>> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8152753 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ > > Update: > http://cr.openjdk.java.net/~shade/8152753/webrev.01/ > > Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some > other cleanups. > > Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT > hs-comp testset (some unrelated timeouts on SPARC). Anyone? Thanks, -Aleksey From jamsheed.c.m at oracle.com Thu Apr 7 08:41:34 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Thu, 7 Apr 2016 14:11:34 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> Message-ID: <57061D3E.8050408@oracle.com> Hi Martin, one comment: the count increment update should use release store + atomic update. ref: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming Best Regards Jamsheed On 4/6/2016 6:54 PM, Doerr, Martin wrote: > > Hi Jamsheed and all, > > thanks for your explanation. > > About Case 1: > > I basically agree with that reading _next==NULL or _count==0 only > leads to false negatives and is not critical. > > Yes, we could live with a few more false negatives on weak memory > model platforms (even though this is not my preferred design). > > About Case 2: > > What I'm missing on the reader's side of the _count field is something > which prevents processors from speculatively loading the contents of > the ExceptionCache. > > In ExceptionCache::test_address, the _count only affects the control flow.
> > PPC and ARM processors can predict branches which depend on the _count > field and load speculatively from the pc and handler fields (which may > be stale data!). > > Due to out-of-order execution of the loads, it can actually happen, > that the new _count value is observed, but stale data is read from pc > and handler fields. > > I guess it is highly unlikely that we will ever observe this, but > there's no guarantee. > > I think my concern about using non-volatile fields for the > _exception_cache is also still valid. > > Nothing prevents C++ compilers from loading the pointer twice from > memory. They may expect to get the pointer to the same instance both > times but actually get two different ones. > > For example, this may lead to the situation that > handler_for_exception_and_pc uses one ExceptionCache instance for > calling the match function and another one (due to a reload of the > non-volatile field) for calling next(). > > Maybe other people would like to comment on this lengthy discussion as well? > > Best regards, > > Martin > > *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com] > *Sent:* Mittwoch, 6. April 2016 13:54 > *To:* Doerr, Martin ; > hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > On 4/6/2016 2:49 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > here are the cases of add_handler_for_exception_and_pc we should > talk about: > > Case 1: A new ExceptionCache instance needs to get added. > > The storestore barrier you have added is used in the constructor > of the ExceptionCache and it releases the most critical fields of > it. I think this is what you explained in [1] in your email below. > > The new values of _count and _next fields are written afterwards > and hence not covered by this release barrier. Readers of the > _exception_cache may read _count==0 or _next==NULL.
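The publication pattern under discussion (writers serialized by a lock, readers lock-free, entry contents made visible before the count that covers them) can be sketched in portable C++ with a release store paired with an acquire load. This is a hypothetical, simplified model with invented names (`HandlerCache`, `lookup`, `add`), not HotSpot's ExceptionCache, and it uses C++11 atomics rather than HotSpot's OrderAccess primitives.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical model: a fixed number of (pc, handler) slots, filled by
// writers that hold a mutex and read by lock-free readers.
class HandlerCache {
 public:
  static constexpr int kCapacity = 16;

  // Lock-free reader. The acquire load pairs with the writer's release store,
  // so every slot below `n` is fully written before it is read. A miss (0)
  // may be a false negative; a caller would then retry on a locked slow path.
  uintptr_t lookup(uintptr_t pc) const {
    int n = count_.load(std::memory_order_acquire);
    assert(n >= 0 && n <= kCapacity);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;
  }

  // Writer, serialized by the mutex. The slot is filled first and only then
  // published by the release store of the incremented count. Published slots
  // are never rewritten, so readers never race with these plain stores.
  bool add(uintptr_t pc, uintptr_t handler) {
    std::lock_guard<std::mutex> guard(lock_);
    int n = count_.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    pcs_[n] = pc;
    handlers_[n] = handler;
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

 private:
  uintptr_t pcs_[kCapacity] = {};
  uintptr_t handlers_[kCapacity] = {};
  std::atomic<int> count_{0};
  std::mutex lock_;
};
```

Because writers in this model are serialized by the mutex, a plain release store of the new count is enough; if writers could run concurrently, the increment would instead need an atomic read-modify-write with release ordering. Reading `count_` through `std::atomic` also rules out the compiler reloading it mid-lookup, which is the concern raised above about non-volatile fields.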
> > One could argue that this is not critical, but I guess this was > not intended? > > At least the _exception_cache field needs to be volatile to > prevent optimizers from breaking anything. This is always needed > for fields which are accessed concurrently by multiple threads > without locks (as the readers do). > > I think releasing the completely initialized ExceptionCache > instance is a much cleaner design. > > Having count < actual entries, or having _next = null is OK (as there > is always a (locked) slow path to check again). > Quoting comment from read path. > > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. > > Weak memory platforms may have a few more false negatives, but isn't > that OK? > This helps us, as we can remove volatile from the picture, which is actually > good for read paths. > > > Case 2: An existing ExceptionCache instance gets a new entry. > > In this case your storestore barrier is good to release all > updated fields. However, we need to consider the readers, too. The > _count field needs to be volatile and the load must acquire. > Otherwise, stale data may get read by processors which perform > loads on speculative paths. > > storestore mem barrier handles this, as count <= no of real entries, > and there is always a locked slow path to check again. > As said before, there may be a few more false negatives in weak memory > platforms than strong ones. > > Best Regards, > Jamsheed > > > I have added the acquire barrier for the _count field here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ > > > Does this answer your questions or is anything still unclear? > > Best regards, > > Martin > > *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com] > *Sent:* Mittwoch, 6.
April 2016 10:11 > *To:* Doerr, Martin > ; > hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Thanks for the reply. trying to understand stuffs. > > > > void nmethod::add_handler_for_exception_and_pc(Handle > exception, address pc, address handler) { > // There are potential race conditions during exception > cache updates, so we > // must own the ExceptionCache_lock before doing ANY > modifications. Because > // we don't lock during reads, it is possible to have > several threads attempt > // to update the cache with the same data. We need to check > for already inserted > // copies of the current data before adding it. > > MutexLocker ml(ExceptionCache_lock); > ExceptionCache* target_entry = > exception_cache_entry_for_exception(exception); > > if (target_entry == NULL || > !target_entry->add_address_and_handler(pc,handler)) { > target_entry = new ExceptionCache(exception,pc,handler); > add_exception_cache_entry(target_entry); > } > } > > > [1]there is a storestore mem barrier before count is updated in > add_address_and_handler > this ensure exception pc and handler address are updated before > count is incremented and Exception cache entry is updated at ( > nm->_exception_cache or in the list ec->_next ). > > > > address nmethod::handler_for_exception_and_pc(Handle > exception, address pc) { > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen > during > // the first few exception lookups for a given nmethod. > ExceptionCache* ec = exception_cache(); > while (ec != NULL) { > address ret_val; > if ((ret_val = ec->match(exception,pc)) != NULL) { > return ret_val; > } > ec = ec->next(); > } > return NULL; > } > > > and in read logic. we first check ec entry is available (non null > check) before proceeding further. > if ec is non null and ec_type,excpetion pc, and handler are > available by[1]. 
Though count can be reordered and not yet updated > with the new value. > > This fixes the issue. Why do you think it doesn't? > > Best Regards, > Jamsheed > > > On 4/5/2016 3:40 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > thanks for pointing me to it. Interesting that you found > such a problem so shortly before me :) > > My webrev addresses some aspects which are not covered by your > fix: > > - add_handler_for_exception_and_pc adds a new ExceptionCache > instance in the other case. They need to get released as well. > > - The readers of the _exception_cache field are not safe yet. > As Andrew Haley pointed out, optimizers may modify load > accesses for non-volatile fields. > > So I think my change is still needed. > > And after taking a closer look at your change, I think the > _count field which is addressed by your fix needs to be > volatile as well. I can incorporate that in my change if you like. > > Would you agree? > > Best regards, > > Martin > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On > Behalf Of *Jamsheed C m > *Sent:* Monday, April 4, 2016 08:14 > *To:* hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > the "nmethod's exception cache not multi-thread safe" bug is > fixed in b107 > bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 > fix changeset: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 > discussion link: > http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html > > Best Regards, > Jamsheed > > On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's > exception cache. Readers of the cache may read stale data > on weak memory platforms.
> > The writers of the cache are synchronized by locks, but > there may be concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache > without locking. > > Therefore, the nmethod's field _exception_cache needs to > be volatile and adding new entries must be done by > releasing stores. (Loading seems to be fine without > acquire because there's an address dependency from the > load of the cache to the usage of its contents which is > sufficient to ensure ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive > to read the volatile field _state only once. It is > certainly undesired to force the compiler to load it from > memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > > Please review. I will also need a sponsor. > > Best regards, > > Martin From aph at redhat.com Thu Apr 7 08:51:34 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 09:51:34 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57060C9C.4000000@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> Message-ID: <57061F96.6080001@redhat.com> I'm very tempted to review this, but there is a rather odd thing: the bug does not explain the motivation for this change. Andrew.
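A minimal sketch of the publication scheme proposed in the original RFR quoted above (writers serialized by a lock, each new ExceptionCache entry linked in with a releasing store, lock-free readers), written with portable C++11 atomics instead of HotSpot's OrderAccess. `CacheNode` and `ExceptionCacheList` are hypothetical stand-ins, and prepending new entries at the head is an assumption about how `add_exception_cache_entry` links them in. The reader uses an acquire load where the RFR relies on the address dependency; in portable C++, `memory_order_consume` is in practice treated as acquire, so acquire is the conservative choice for a sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for HotSpot types; not the real nmethod/ExceptionCache code.
using address = std::uintptr_t;

struct CacheNode {
  const void* exception_type;  // stands in for the exception klass
  address pc;
  address handler;
  CacheNode* next;             // set before publication, never modified afterwards
};

class ExceptionCacheList {
 public:
  ExceptionCacheList() = default;
  ExceptionCacheList(const ExceptionCacheList&) = delete;
  ExceptionCacheList& operator=(const ExceptionCacheList&) = delete;

  ~ExceptionCacheList() {
    CacheNode* n = head_.load(std::memory_order_relaxed);
    while (n != nullptr) {
      CacheNode* next = n->next;
      delete n;
      n = next;
    }
  }

  // Writer: serialized by a lock (the role ExceptionCache_lock plays in HotSpot).
  void add(const void* type, address pc, address handler) {
    assert(handler != 0);  // 0 is reserved to mean "not found"
    std::lock_guard<std::mutex> guard(lock_);
    CacheNode* node = new CacheNode{type, pc, handler, head_.load(std::memory_order_relaxed)};
    // Releasing store: every field of *node, including next, is visible to any
    // reader that observes the new head pointer.
    head_.store(node, std::memory_order_release);
  }

  // Reader: takes no lock. The head is loaded exactly once into a local, so the
  // compiler cannot reload it and mix two different nodes.
  address lookup(const void* type, address pc) const {
    for (const CacheNode* n = head_.load(std::memory_order_acquire); n != nullptr; n = n->next) {
      if (n->exception_type == type && n->pc == pc) return n->handler;
    }
    return 0;  // miss (possibly a false negative): caller takes the locked slow path
  }

 private:
  std::mutex lock_;
  std::atomic<CacheNode*> head_{nullptr};
};
```

A false negative here only means the caller falls back to the locked slow path, which matches the read-path comment quoted in the thread.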
From aleksey.shipilev at oracle.com Thu Apr 7 08:55:32 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 11:55:32 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57061F96.6080001@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> Message-ID: <57062084.4020209@oracle.com> On 04/07/2016 11:51 AM, Andrew Haley wrote: > I'm very tempted to review this, but there is a rather odd thing: > the bug does not explain the motivation for this change. Not following you. What motivation do you need apart from "...the rest of new Unsafe methods that are not intrinsified by C1 are handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? Thanks, -Aleksey From martin.doerr at sap.com Thu Apr 7 09:08:08 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 7 Apr 2016 09:08:08 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57061D3E.8050408@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> Message-ID: Hi Jamsheed, atomic update for the _count would only be required if there were multiple threads which attempt to increment it concurrently. However, updates are under lock, so we only have concurrent readers, which is OK. I still think "volatile" does what we need here. Especially the xlC compiler on AIX tends to reload variables from memory. Exactly this can be prevented by making the field volatile.
People who don't like volatile should come up with a different solution, please. Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Thursday, April 7, 2016 10:42 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, one comment: the count increment update should use a release store + atomic update. ref: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming Best Regards Jamsheed On 4/6/2016 6:54 PM, Doerr, Martin wrote: Hi Jamsheed and all, thanks for your explanation. About Case 1: I basically agree that reading _next==NULL or _count==0 only leads to false negatives and is not critical. Yes, we could live with a few more false negatives on weak memory model platforms (even though this is not my preferred design). About Case 2: What I'm missing on the reader's side of the _count field is something which prevents processors from speculatively loading the contents of the ExceptionCache. In ExceptionCache::test_address, the _count only affects the control flow. PPC and ARM processors can predict branches which depend on the _count field and load speculatively from the pc and handler fields (which may be stale data!). Due to out-of-order execution of the loads, it can actually happen that the new _count value is observed, but stale data is read from pc and handler fields. I guess it is highly unlikely that we will ever observe this, but there's no guarantee. I think my concern about using non-volatile fields for the _exception_cache is also still valid. Nothing prevents C++ compilers from loading the pointer twice from memory. They may expect to get the pointer to the same instance both times but actually get two different ones.
For example, this may lead to the situation that handler_for_exception_and_pc uses one ExceptionCache instance for calling the match function and another one (due to a reload of the non-volatile field) for calling next(). Maybe other people would like to comment on this lengthy discussion as well? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Wednesday, April 6, 2016 13:54 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, On 4/6/2016 2:49 PM, Doerr, Martin wrote: Hi Jamsheed, here are the cases of add_handler_for_exception_and_pc we should talk about: Case 1: A new ExceptionCache instance needs to get added. The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below. The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL. One could argue that this is not critical, but I guess this was not intended? At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do). I think releasing the completely initialized ExceptionCache instance is a much cleaner design. Having count < actual entries, or having _next = null is OK (as there is always the (locked) slow path to check again). Quoting the comment from the read path: // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. Weak memory platforms may have a few more false negatives, but isn't that OK?
This helps us, as we can remove volatile from the picture, which is actually good for read paths. Case 2: An existing ExceptionCache instance gets a new entry. In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths. The storestore memory barrier handles this, as count <= number of real entries, and there is always the locked slow path to check again. As said before, there may be a few more false negatives on weak memory platforms than on strong ones. Best Regards, Jamsheed I have added the acquire barrier for the _count field here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ Does this answer your questions or is anything still unclear? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Wednesday, April 6, 2016 10:11 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thanks for the reply. Trying to understand things.
void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) {
  // There are potential race conditions during exception cache updates, so we
  // must own the ExceptionCache_lock before doing ANY modifications. Because
  // we don't lock during reads, it is possible to have several threads attempt
  // to update the cache with the same data. We need to check for already inserted
  // copies of the current data before adding it.
  MutexLocker ml(ExceptionCache_lock);
  ExceptionCache* target_entry = exception_cache_entry_for_exception(exception);

  if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) {
    target_entry = new ExceptionCache(exception,pc,handler);
    add_exception_cache_entry(target_entry);
  }
}
[1] There is a storestore memory barrier before count is updated in add_address_and_handler; this ensures the exception pc and handler address are updated before count is incremented and the ExceptionCache entry is stored (at nm->_exception_cache or in the list via ec->_next).
address nmethod::handler_for_exception_and_pc(Handle exception, address pc) {
  // We never grab a lock to read the exception cache, so we may
  // have false negatives. This is okay, as it can only happen during
  // the first few exception lookups for a given nmethod.
  ExceptionCache* ec = exception_cache();
  while (ec != NULL) {
    address ret_val;
    if ((ret_val = ec->match(exception,pc)) != NULL) {
      return ret_val;
    }
    ec = ec->next();
  }
  return NULL;
}
And in the read logic, we first check that the ec entry is available (non-null check) before proceeding further. If ec is non-null, the ec_type, exception pc, and handler are available by [1], though count can be reordered and not yet updated with the new value. This fixes the issue. Why do you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: Hi Jamsheed, thanks for pointing me to it. Interesting that you found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well.
I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Monday, April 4, 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, the "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin
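The minor cleanup mentioned in the RFR, reading `_state` once in `nmethod::is_alive`, is the load-once pattern in its simplest form. The sketch below is hypothetical: the `State` values, their order, and `MethodSketch` need not match HotSpot's real nmethod states. Both versions give the same answer while the state is stable; the single-load version reads memory once instead of twice, and both of its comparisons test the same snapshot.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical state values; HotSpot's real nmethod states and their order may differ.
enum State : unsigned char { in_use = 0, not_entrant = 1, zombie = 2, unloaded = 3 };

struct MethodSketch {
  std::atomic<unsigned char> state{in_use};

  // Two loads: memory is read twice, and the two comparisons may observe
  // different values if another thread changes the state in between.
  bool is_alive_two_loads() const {
    return state.load(std::memory_order_relaxed) == in_use ||
           state.load(std::memory_order_relaxed) == not_entrant;
  }

  // One load into a local: both comparisons test the same snapshot.
  bool is_alive() const {
    const unsigned char s = state.load(std::memory_order_relaxed);
    return s == in_use || s == not_entrant;
  }

  void set_state(State s) {
    assert(s >= state.load(std::memory_order_relaxed));  // in this sketch, states only move forward
    state.store(s, std::memory_order_relaxed);
  }
};
```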
From aph at redhat.com Thu Apr 7 09:33:23 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 10:33:23 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57062084.4020209@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> Message-ID: <57062963.5070403@redhat.com> On 07/04/16 09:55, Aleksey Shipilev wrote: > On 04/07/2016 11:51 AM, Andrew Haley wrote: >> I'm very tempted to review this, but there is a rather odd thing: >> the bug does not explain the motivation for this change. > > Not following you. > > What motivation do you need apart from "...the rest of new Unsafe > methods that are not intrinsified by C1 are handled by Java fallbacks in > Unsafe.java.
compareAndExchange cannot be emulated with existing APIs"? As far as I know all of these are handled by native methods. Is that not correct? They seem to be. Andrew. From aleksey.shipilev at oracle.com Thu Apr 7 09:48:48 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 12:48:48 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57062963.5070403@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> Message-ID: <57062D00.5000609@oracle.com> On 04/07/2016 12:33 PM, Andrew Haley wrote: > On 07/04/16 09:55, Aleksey Shipilev wrote: >> On 04/07/2016 11:51 AM, Andrew Haley wrote: >>> I'm very tempted to review this, but there is a rather odd thing: >>> the bug does not explain the motivation for this change. >> >> Not following you. >> >> What motivation do you need apart from "...the rest of new Unsafe >> methods that are not intrinsified by C1 are handled by Java fallbacks in >> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? > > As far as I know all of these are handled by native methods. > Is that not correct? They seem to be. Yes. This is not a question on implementing CompareAndExchange: it is handled by unsafe.cpp bits well, and this is why runtime can work even without getting compilers in picture. This is about getting CompareAndExchange support in C1 in sync with CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native calls, and should do the same for new CompareAndExchange. In other words, out of the realm of VarHandles accessor methods, we are either covered by Unsafe.java shortcuts, or C1/C2 intrinsics that replace unsafe.cpp calls. Except for CompareAndExchange, until this RFE. Thanks, -Aleksey From aph at redhat.com Thu Apr 7 10:14:23 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 11:14:23 +0100 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> Message-ID: <570632FF.7090103@redhat.com> On 07/04/16 10:08, Doerr, Martin wrote: > atomic update for the _count would only be required if there were > multiple threads which attempt to increment it > concurrently. However, updates are under lock, so we only have > concurrent readers, which is OK. > > I still think "volatile" does what we need here. Especially the xlC > compiler on AIX tends to reload variables from memory. Exactly this > can be prevented by making the field volatile. I think your latest patch is OK. Whether volatile is really good enough, I don't know.
The new(ish) C++ memory model treats this as a race, and therefore undefined behaviour. Old C++ didn't have a memory model, so the best we can do with racy code is guess about what our compilers might do. I certainly much prefer a release_store to the storestore fence used in the fix for 8143897. Andrew. From aph at redhat.com Thu Apr 7 10:17:15 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 11:17:15 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57062D00.5000609@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> <57062D00.5000609@oracle.com> Message-ID: <570633AB.6040704@redhat.com> On 07/04/16 10:48, Aleksey Shipilev wrote: > On 04/07/2016 12:33 PM, Andrew Haley wrote: >> On 07/04/16 09:55, Aleksey Shipilev wrote: >>> On 04/07/2016 11:51 AM, Andrew Haley wrote: >>>> I'm very tempted to review this, but there is a rather odd thing: >>>> the bug does not explain the motivation for this change. >>> >>> Not following you. >>> >>> What motivation do you need apart from "...the rest of new Unsafe >>> methods that are not intrinsified by C1 are handled by Java fallbacks in >>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? >> >> As far as I know all of these are handled by native methods. >> Is that not correct? They seem to be. > > Yes. This is not a question on implementing CompareAndExchange: it is > handled by unsafe.cpp bits well, and this is why runtime can work even > without getting compilers in picture. I thought so. > This is about getting CompareAndExchange support in C1 in sync with > CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native > calls, and should do the same for new CompareAndExchange. So, this is entirely about making C1-generated code more efficient, by avoiding native calls. Right? Andrew. 
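Case 2 from the exception-cache thread, where an existing ExceptionCache entry receives a new pc/handler pair, can be sketched with C++11 atomics. `PcHandlerTable` and its capacity are hypothetical and are not HotSpot's ExceptionCache. The sketch shows the release store of the count on the writer side together with the acquire load that Martin's webrev.01 adds on the reader side. Together they stop a reader from pairing a new count with stale pc or handler slots. Slots at or above the published count are never read, and published slots are never rewritten, so the plain arrays are race-free under the C++ memory model Andrew mentions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using address = std::uintptr_t;

// Hypothetical fixed-capacity entry; not HotSpot's ExceptionCache.
class PcHandlerTable {
 public:
  static constexpr int kCapacity = 4;

  // Writer: callers are assumed to hold a lock, so count_ has a single writer.
  bool add(address pc, address handler) {
    assert(handler != 0);  // 0 is reserved to mean "not found"
    const int n = count_.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;  // full: caller would allocate a new entry
    pc_[n] = pc;
    handler_[n] = handler;
    // Releasing store: the slot writes above become visible no later than the new count.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader: lock-free.
  address find(address pc) const {
    // Acquire load: pairs with the release store in add(), so no slot below the
    // observed count is read stale, even if the CPU speculates past the loop branch.
    const int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pc_[i] == pc) return handler_[i];
    }
    return 0;  // miss: caller falls back to the locked slow path
  }

 private:
  address pc_[kCapacity] = {};
  address handler_[kCapacity] = {};
  std::atomic<int> count_{0};
};
```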
From aleksey.shipilev at oracle.com Thu Apr 7 10:26:28 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 13:26:28 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <570633AB.6040704@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> <57062D00.5000609@oracle.com> <570633AB.6040704@redhat.com> Message-ID: <570635D4.8020308@oracle.com> On 04/07/2016 01:17 PM, Andrew Haley wrote: > On 07/04/16 10:48, Aleksey Shipilev wrote: >> On 04/07/2016 12:33 PM, Andrew Haley wrote: >>> On 07/04/16 09:55, Aleksey Shipilev wrote: >>>> On 04/07/2016 11:51 AM, Andrew Haley wrote: >>>>> I'm very tempted to review this, but there is a rather odd thing: >>>>> the bug does not explain the motivation for this change. >>>> >>>> Not following you. >>>> >>>> What motivation do you need apart from "...the rest of new Unsafe >>>> methods that are not intrinsified by C1 are handled by Java fallbacks in >>>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? >>> >>> As far as I know all of these are handled by native methods. >>> Is that not correct? They seem to be. >> >> Yes. This is not a question on implementing CompareAndExchange: it is >> handled by unsafe.cpp bits well, and this is why runtime can work even >> without getting compilers in picture. > > I thought so. > >> This is about getting CompareAndExchange support in C1 in sync with >> CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native >> calls, and should do the same for new CompareAndExchange. > > So, this is entirely about making C1-generated code more efficient, > by avoiding native calls. Right? Yes. The same reason why CompareAndSwap and {get,put}* are intrinsified by C1. -Aleksey
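The difference Aleksey points to between compareAndSet and compareAndExchange can be shown with C++11 `std::atomic`, whose `compare_exchange_strong` already reports the witness value. The helper names below are hypothetical and only mirror the shapes of the two Unsafe operations; they are not the Unsafe or VarHandle API. A CAS-only API cannot recover the witness reliably, because a separate load after a failed CAS may observe a different value than the one the CAS compared against. The other direction is easy: compareAndSet is just "witness == expected".

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helpers mirroring the two Unsafe shapes discussed in the thread.

// compareAndSet: reports only success or failure.
bool compare_and_set(std::atomic<int>& v, int expected, int desired) {
  return v.compare_exchange_strong(expected, desired);
}

// compareAndExchange: returns the witness, i.e. the value the single atomic
// operation found in memory, whether or not the exchange succeeded.
int compare_and_exchange(std::atomic<int>& v, int expected, int desired) {
  // On failure, compare_exchange_strong overwrites `expected` with the current
  // value; on success, `expected` already equals the old value.
  v.compare_exchange_strong(expected, desired);
  return expected;
}

// compareAndSet derived from compareAndExchange: success iff witness == expected.
bool cas_via_cae(std::atomic<int>& v, int expected, int desired) {
  return compare_and_exchange(v, expected, desired) == expected;
}
```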
From martin.doerr at sap.com Thu Apr 7 10:51:41 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 7 Apr 2016 10:51:41 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <570632FF.7090103@redhat.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> Message-ID: <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> Hi Andrew, Jamsheed and all, thank you very much for your input. As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct). My change still contains a releasing store for newly created ExceptionCache instances. As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms. I think having the release doesn't hurt too much and makes the design a little cleaner. I also added comments based on your input. The new webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ Please review. I will also need a sponsor from Oracle, please. Thanks again and best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Thursday, April 7, 2016 12:14 To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On 07/04/16 10:08, Doerr, Martin wrote: > atomic update for the _count would only be required if there were > multiple threads which attempt to increment it > concurrently. However, updates are under lock, so we only have > concurrent readers, which is OK. > > I still think "volatile" does what we need here. Especially the xlC > compiler on AIX tends to reload variables from memory. Exactly this > can be prevented by making the field volatile. I think your latest patch is OK. Whether volatile is really good enough, I don't know. The new(ish) C++ memory model treats this as a race, and therefore undefined behaviour. Old C++ didn't have a memory model, so the best we can do with racy code is guess about what our compilers might do. I certainly much prefer a release_store to the storestore fence used in the fix for 8143897. Andrew. From aph at redhat.com Thu Apr 7 11:34:54 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 12:34:54 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <570635D4.8020308@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> <57062D00.5000609@oracle.com> <570633AB.6040704@redhat.com> <570635D4.8020308@oracle.com> Message-ID: <570645DE.4070708@redhat.com> On 04/07/2016 11:26 AM, Aleksey Shipilev wrote: >> So, this is entirely about making C1-generated code more efficient, >> > by avoiding native calls. Right? > Yes. The same reason why CompareAndSwap and {get,put}* are intrinsified > by C1. OK, I'll look at this today. Andrew.
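The two publication styles compared in the exception-cache thread can be approximated in C++11. One is the storestore barrier from the JDK-8143897 fix, the other is the releasing store in `increment_count()` from webrev.02. This is a hedged model, not HotSpot code: HotSpot uses its own OrderAccess primitives, and a C++ release fence is somewhat stronger than a pure storestore barrier because it also orders earlier loads. `Slot` and the function names are hypothetical. In either form the writer only helps if the reader pairs it with an acquire load of the count.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-shot slot: payload is written once, then the count is published.
struct Slot {
  int payload = 0;
  std::atomic<int> count{0};
};

// Fence style: a barrier first, then an ordinary (relaxed) store of the count.
void publish_with_fence(Slot& s, int value) {
  assert(s.count.load(std::memory_order_relaxed) == 0);  // one-shot publication only
  s.payload = value;
  std::atomic_thread_fence(std::memory_order_release);
  s.count.store(1, std::memory_order_relaxed);
}

// Release-store style: the ordering is attached to the store itself.
void publish_with_release_store(Slot& s, int value) {
  assert(s.count.load(std::memory_order_relaxed) == 0);
  s.payload = value;
  s.count.store(1, std::memory_order_release);
}

// Reader: the acquire load pairs with either writer style.
bool try_read(const Slot& s, int& out) {
  if (s.count.load(std::memory_order_acquire) == 0) return false;
  out = s.payload;
  return true;
}
```

The release-store form keeps the ordering next to the store it protects, which is one reason to prefer it to a free-standing fence, as Andrew says he does.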
From pavel.punegov at oracle.com Thu Apr 7 13:32:53 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Thu, 7 Apr 2016 16:32:53 +0300 Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError Message-ID: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> Hi, please review this fix to unquarantine CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2]. The latter fixed the main issue that caused the OOME in the tests: it disabled generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM. [1] https://bugs.openjdk.java.net/browse/JDK-8140354 [2] https://bugs.openjdk.java.net/browse/JDK-8144621 -- webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/ bug https://bugs.openjdk.java.net/browse/JDK-8153661 -- Pavel. From igor.ignatyev at oracle.com Thu Apr 7 13:43:01 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 7 Apr 2016 16:43:01 +0300 Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError In-Reply-To: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> References: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> Message-ID: <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com> Pavel, looks good to me -- Igor > On Apr 7, 2016, at 4:32 PM, Pavel Punegov wrote: > > Hi, > > please review this fix to unquarantine CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2]. > The latter fixed the main issue that caused the OOME in the tests: it disabled generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM. > > [1] https://bugs.openjdk.java.net/browse/JDK-8140354 > [2] https://bugs.openjdk.java.net/browse/JDK-8144621 > -- > webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/ > bug https://bugs.openjdk.java.net/browse/JDK-8153661 > > -- Pavel.
> From aph at redhat.com Thu Apr 7 14:29:31 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 15:29:31 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57060C9C.4000000@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> Message-ID: <57066ECB.2010502@redhat.com> On 04/07/2016 08:30 AM, Aleksey Shipilev wrote: > On 04/01/2016 05:37 PM, Aleksey Shipilev wrote: >> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >>> I would like to solicit comments for C1 support for new >>> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >>> The rest of new Unsafe methods that are not intrinsified by C1 are >>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >>> emulated with existing APIs. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8152753 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ >> >> Update: >> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ >> >> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some >> other cleanups. >> >> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT >> hs-comp testset (some unrelated timeouts on SPARC). > > Anyone? Reviewed, OK. Andrew. From felix.yang at linaro.org Thu Apr 7 15:01:37 2016 From: felix.yang at linaro.org (Felix Yang) Date: Thu, 7 Apr 2016 23:01:37 +0800 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair Message-ID: Hi, Please review webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.00/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713 Currently, the C2 compiler generates independent stores to clear short arrays whose size is no bigger than the InitArrayShortSize parameter (see ClearArrayNode::Ideal).
For the aarch64 port, we have the store pair instruction, which can zero two memory words at a time; this is good for code size and maybe for performance on some micro-architectures. For Spark Terasort, the patch generates 550 additional stp (xzr, xzr) instructions, which means a code size reduction of about 2KB. Tested with JTreg hotspot, langtools and jdk. Is it OK? Thanks, Felix From vladimir.x.ivanov at oracle.com Thu Apr 7 15:22:39 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 7 Apr 2016 18:22:39 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <56FE87C2.50002@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> Message-ID: <57067B3F.4060403@oracle.com> Aleksey, Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not Compiler::is_intrinsic_supported? vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you effectively completely disable the intrinsics on all non-x86 platforms. I suggest moving the InlineCompareAndExchange flag into c1_globals.hpp and checking it in Compiler::is_intrinsic_supported. Best regards, Vladimir Ivanov On 4/1/16 5:37 PM, Aleksey Shipilev wrote: > On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >> I would like to solicit comments for C1 support for new >> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >> The rest of new Unsafe methods that are not intrinsified by C1 are >> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >> emulated with existing APIs. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8152753 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ > > Update: > http://cr.openjdk.java.net/~shade/8152753/webrev.01/ > > Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some > other cleanups.
> > Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT > hs-comp testset (some unrelated timeouts on SPARC). > > Thanks, > -Aleksey > From aleksey.shipilev at oracle.com Thu Apr 7 15:34:08 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 18:34:08 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57067B3F.4060403@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57067B3F.4060403@oracle.com> Message-ID: <57067DF0.5090707@oracle.com> On 04/07/2016 06:22 PM, Vladimir Ivanov wrote: > Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not > Compiler::is_intrinsic_supported? > > vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you > effectively completely disable the intrinsics on all non-x86 platforms. > > I suggest to move InlineCompareAndExchange flag into c1_globals.hpp and > check it in Compiler::is_intrinsic_supported. I actually did that originally, see: http://cr.openjdk.java.net/~shade/8152753/webrev.00/ ...but then moved that to vmIntrinsics::is_disabled_by_flags by John's suggestion -- all flag sensing is done there. It is a matter of approach, really, and I think current version aligns better with the existing intrinsic flags. Non-x86 platforms have not yet implemented CAE intrinsics, and this forces their hand to implement both C1 and C2 parts before flipping the platform-dependent flag. Which may or may not be a good thing, but I don't have preference either way. -Aleksey > On 4/1/16 5:37 PM, Aleksey Shipilev wrote: >> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >>> I would like to solicit comments for C1 support for new >>> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >>> The rest of new Unsafe methods that are not intrinsified by C1 are >>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >>> emulated with existing APIs. 
>>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8152753 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ >> >> Update: >> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ >> >> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some >> other cleanups. >> >> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT >> hs-comp testset (some unrelated timeouts on SPARC). >> >> Thanks, >> -Aleksey >> From aph at redhat.com Thu Apr 7 15:35:14 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 16:35:14 +0100 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair In-Reply-To: References: Message-ID: <57067E32.3010403@redhat.com> On 04/07/2016 04:01 PM, Felix Yang wrote: > Please review webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.00/ > JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713 > > Currently, C2 compiler generate independent stores to clear > short arrays whose size is no bigger than parameter > InitArrayShortSize (refer to ClearArrayNode::Ideal function). > For the aarch64 port, we have store pair instruction which can > zero two memory words at a time and this will be good for code > size and maybe performance for some micro-archs. > > For Spark Terasort, an extra of 550 stp (xzr, xzr) instructions > are generated with the patch, which mean about 2KB reduction in > codesize. > > Tested with JTreg hotspot, langtools and jdk. Is it OK? It looks reasonable. It's rather a big slab of code for aarch64.ad, and I think that it should be in MacroAssembler. Long Chen created MacroAssembler::zero_words, and you should create an overload of zero_words which takes a constant int as an argument. Andrew.
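A minimal sketch of the pairing that such a constant-count zero_words overload might perform, assuming 8-byte words stored at fixed offsets from a single base register. The function name zero_words_plan, the placeholder register name base, and the textual instruction output are hypothetical and for illustration only; this is not MacroAssembler code. Each stp xzr, xzr stands in for two 8-byte str instructions and so saves one 4-byte instruction, which makes 550 stp instructions worth roughly 2.2KB, in line with the 2KB reduction reported for Spark Terasort.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch (not HotSpot code): the store sequence a constant-count
// zero_words(base, count) overload might emit on AArch64. Each
// "stp xzr, xzr" clears two 8-byte words; an odd count needs one trailing
// "str xzr". Offsets are byte offsets from the placeholder register "base".
std::vector<std::string> zero_words_plan(int count) {
  assert(count >= 0);
  // Assumption: the whole block fits the 64-bit stp immediate, a signed
  // 7-bit value scaled by 8 (-512..504). With at most 64 words the last
  // pair starts at byte 496.
  assert(count * 8 <= 512);
  std::vector<std::string> insns;
  int word = 0;
  // Emit one store-pair per two remaining words.
  for (; word + 1 < count; word += 2) {
    insns.push_back("stp xzr, xzr, [base, #" + std::to_string(word * 8) + "]");
  }
  // An odd word left over gets a single store.
  if (word < count) {
    insns.push_back("str xzr, [base, #" + std::to_string(word * 8) + "]");
  }
  return insns;
}
```

For example, zero_words_plan(5) yields stp instructions at offsets 0 and 16 followed by one str at offset 32: three instructions instead of five.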
From rwestrel at redhat.com Thu Apr 7 15:41:27 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 7 Apr 2016 17:41:27 +0200 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57060C9C.4000000@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> Message-ID: <57067FA7.7030907@redhat.com> >> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ That c1_LIR.cpp change: 931 if (!opCompareAndSwap->_exchange) do_input(opCompareAndSwap->_cmp_value); doesn't seem right. Why do you need it? Roland. From aleksey.shipilev at oracle.com Thu Apr 7 15:46:04 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 18:46:04 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57067FA7.7030907@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> Message-ID: <570680BC.7030305@oracle.com> On 04/07/2016 06:41 PM, Roland Westrelin wrote: > >>> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ > > That c1_LIR.cpp change: > > 931 if (!opCompareAndSwap->_exchange) > do_input(opCompareAndSwap->_cmp_value); > > doesn't seem right. Why do you need it? CompareAndSwap produces boolean result, and kills cmp_value and new_value. CompareAndExchange produces the "old"/null value result, which is stored at the same position as cmp_value. So, if you omit that line, LinearScan asserts when you are trying to use the result of CompareAndExchange. AFAIU, removing the "input" property from cmp_value, but leaving "temp" makes things back in order for CompareAndExchange. Thanks, -Aleksey
From vladimir.x.ivanov at oracle.com Thu Apr 7 16:01:31 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 7 Apr 2016 19:01:31 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57067DF0.5090707@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57067B3F.4060403@oracle.com> <57067DF0.5090707@oracle.com> Message-ID: <5706845B.2080007@oracle.com> Ok, I thought C2 support is already there on non-x86 platforms. I'm fine with both approaches then. Best regards, Vladimir Ivanov On 4/7/16 6:34 PM, Aleksey Shipilev wrote: > On 04/07/2016 06:22 PM, Vladimir Ivanov wrote: >> Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not >> Compiler::is_intrinsic_supported? >> >> vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you >> effectively completely disable the intrinsics on all non-x86 platforms. >> >> I suggest to move InlineCompareAndExchange flag into c1_globals.hpp and >> check it in Compiler::is_intrinsic_supported. > > I actually did that originally, see: > http://cr.openjdk.java.net/~shade/8152753/webrev.00/ > > ...but then moved that to vmIntrinsics::is_disabled_by_flags by John's > suggestion -- all flag sensing is done there. It is a matter of > approach, really, and I think current version aligns better with the > existing intrinsic flags. > > Non-x86 platforms have not yet implemented CAE intrinsics, and this > forces their hand to implement both C1 and C2 parts before flipping the > platform-dependent flag. Which may or may not be a good thing, but I > don't have preference either way. > > -Aleksey > >> On 4/1/16 5:37 PM, Aleksey Shipilev wrote: >>> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >>>> I would like to solicit comments for C1 support for new >>>> Unsafe.compareAndExchange intrinsics (we have support for them in C2).
>>>> The rest of new Unsafe methods that are not intrinsified by C1 are >>>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >>>> emulated with existing APIs. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8152753 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ >>> >>> Update: >>> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ >>> >>> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some >>> other cleanups. >>> >>> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT >>> hs-comp testset (some unrelated timeouts on SPARC). >>> >>> Thanks, >>> -Aleksey >>> > > From doug.simon at oracle.com Thu Apr 7 20:27:32 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 7 Apr 2016 22:27:32 +0200 Subject: RFR: 8153782: update JVMCI sources to Eclipse 4.5.2 format style Message-ID: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8153782 http://cr.openjdk.java.net/~dnsimon/8153782/ -Doug From christian.thalinger at oracle.com Thu Apr 7 20:51:04 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 7 Apr 2016 10:51:04 -1000 Subject: RFR: 8153782: update JVMCI sources to Eclipse 4.5.2 format style In-Reply-To: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com> References: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com> Message-ID: Looks good. > On Apr 7, 2016, at 10:27 AM, Doug Simon wrote: > > https://bugs.openjdk.java.net/browse/JDK-8153782 > http://cr.openjdk.java.net/~dnsimon/8153782/ > > -Doug From bharadwaj.yadavalli at oracle.com Thu Apr 7 23:36:25 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Thu, 7 Apr 2016 19:36:25 -0400 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options Message-ID: <5706EEF9.4030900@oracle.com> Backing out the change [1] that fixed [2]. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ Testing: Ran the tests in bug report successfully using product build. Thanks, Bharadwaj [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b [2] https://bugs.openjdk.java.net/browse/JDK-8145348 From bharadwaj.yadavalli at oracle.com Fri Apr 8 00:01:59 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Thu, 7 Apr 2016 20:01:59 -0400 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options In-Reply-To: <5706F338.9010301@oracle.com> References: <5706EEF9.4030900@oracle.com> <5706F338.9010301@oracle.com> Message-ID: <5706F4F7.2050602@oracle.com> Thanks, Jesper! Bharadwaj On 04/07/2016 07:54 PM, Jesper Wilhelmsson wrote: > Looks good! > /Jesper > > On 8/4/16 at 01:36, S. Bharadwaj Yadavalli wrote: >> Backing out the change [1] that fixed [2]. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 >> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ >> >> Testing: Ran the tests in bug report successfully using product build. >> >> Thanks, >> >> Bharadwaj >> >> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b >> [2] https://bugs.openjdk.java.net/browse/JDK-8145348 >> >> From vladimir.kozlov at oracle.com Fri Apr 8 00:20:30 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Apr 2016 17:20:30 -0700 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options In-Reply-To: <5706EEF9.4030900@oracle.com> References: <5706EEF9.4030900@oracle.com> Message-ID: <5706F94E.2070803@oracle.com> No. This is very wrong change! The bug states that -XX:+UnlockDiagnosticVMOptions flag should be added to tests which miss it and not revert 8145348 changes. Vladimir On 4/7/16 4:36 PM, S. Bharadwaj Yadavalli wrote: > Backing out the change [1] that fixed [2].
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 > webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ > > Testing: Ran the tests in bug report successfully using product build. > > Thanks, > > Bharadwaj > > [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b > [2] https://bugs.openjdk.java.net/browse/JDK-8145348 > > From vladimir.kozlov at oracle.com Fri Apr 8 01:11:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Apr 2016 18:11:16 -0700 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options In-Reply-To: <5706F94E.2070803@oracle.com> References: <5706EEF9.4030900@oracle.com> <5706F94E.2070803@oracle.com> Message-ID: <57070534.80509@oracle.com> I talked with Bharadwaj and we decided to push backout with different bug: 8153816: Backout changes for JDK-8145348 till 8153655 is fixed and use 8153655 for real fix as its synopsis say. Thanks, Vladimir On 4/7/16 5:20 PM, Vladimir Kozlov wrote: > No. This is very wrong change! The bug states that -XX:+UnlockDiagnosticVMOptions flag should be added to tests which miss it and not > revert 8145348 changes. > > Vladimir > > On 4/7/16 4:36 PM, S. Bharadwaj Yadavalli wrote: >> Backing out the change [1] that fixed [2]. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 >> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ >> >> Testing: Ran the tests in bug report successfully using product build. >> >> Thanks, >> >> Bharadwaj >> >> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b >> [2] https://bugs.openjdk.java.net/browse/JDK-8145348 >> >> From bharadwaj.yadavalli at oracle.com Fri Apr 8 02:39:51 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Thu, 7 Apr 2016 22:39:51 -0400 Subject: RFR: 8153816: [BACKOUT] Make intrinsics flags diagnostic Message-ID: <570719F7.7010404@oracle.com> Backing out the change [1] that fixed [2]. This is a sub-task of [3]. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8153816 webrev: http://cr.openjdk.java.net/~bharadwaj/8153816/webrev/ Testing: Ran the tests in bug report successfully using product build. Thanks, Bharadwaj [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b [2] https://bugs.openjdk.java.net/browse/JDK-8145348 [3] https://bugs.openjdk.java.net/browse/JDK-8153655 From vladimir.kozlov at oracle.com Fri Apr 8 02:48:00 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Apr 2016 19:48:00 -0700 Subject: RFR: 8153816: [BACKOUT] Make intrinsics flags diagnostic In-Reply-To: <570719F7.7010404@oracle.com> References: <570719F7.7010404@oracle.com> Message-ID: <57071BE0.5000708@oracle.com> Looks good. Thanks, Vladimir On 4/7/16 7:39 PM, S. Bharadwaj Yadavalli wrote: > Backing out the change [1] that fixed [2]. This is a sub-task of [3]. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153816 > webrev: http://cr.openjdk.java.net/~bharadwaj/8153816/webrev/ > > Testing: Ran the tests in bug report successfully using product build. > > Thanks, > > Bharadwaj > > [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b > [2] https://bugs.openjdk.java.net/browse/JDK-8145348 > [3] https://bugs.openjdk.java.net/browse/JDK-8153655 > From HORII at jp.ibm.com Fri Apr 8 10:53:48 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Fri, 8 Apr 2016 10:53:48 +0000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 Message-ID: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Dear all: Can I please request reviews for the following change? This change was created for JDK 9 and ppc64. Description: This change adds options of compare-and-exchange for POWER architecture. As described in atomic_linux_ppc.inline.hpp, the current implementation of cmpxchg is fence_cmpxchg_acquire. This implementation is useful for general purposes because twice calls of sync before and after cmpxchg will keep consistency. 
However, they sometimes cause overheads because sync instructions are very expensive in the current POWER chip design. With this change, callers can explicitly specify to run fence and acquire with two additional bool parameters. Because their default values are "true", it is not necessary to modify existing cmpxchg calls. In addition, with the new parameters of cmpxchg, this change improves performance of copy_to_survivor in the parallel GC. copy_to_survivor changes forward pointers by using cmpxchg. This operation doesn't require any sync instructions, in my understanding. A pointer is changed at most once in a GC and when cmpxchg fails, the latest pointer is available for the caller. When I evaluated SPECjbb2013 (slightly customized because obsolete grizzly doesn't support new version format of Java 9), pause time of young GC was reduced by 10% to 20%. Summary of source code changes: * src/share/vm/runtime/atomic.hpp * src/share/vm/runtime/atomic.cpp * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp - Add two arguments of fence and acquire to cmpxchg only for PPC64. Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, they are reduced while inlining to callers. * src/share/vm/oops/oop.inline.hpp - Changed cas_set_mark to call cmpxchg without fence and acquire. cas_set_mark is called only by cas_forward_to that is called only by copy_to_survivor_space and oop_promotion_failed in psPromotionManager. Code change: Please see an attached diff file that was generated with "hg diff -g" under the latest hotspot directory. Passed test: SPECjbb2013 (customized) * I believe some other cmpxchg will be optimized by reducing fence or acquire because twice calls of sync are too conservative to implement Java memory model. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo -------------- next part -------------- A non-text attachment was scrubbed...
Name: ppc64_cmpxchg_opt.diff Type: application/octet-stream Size: 8837 bytes Desc: not available URL: From nils.eliasson at oracle.com Fri Apr 8 12:27:27 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 8 Apr 2016 14:27:27 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out Message-ID: <5707A3AF.3040807@oracle.com> Hi, Please review this small fix of the BlockingCompilation test. Summary: A method enqueued for compilation with WB API may be removed from the compile queue as stale. Solution: Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets stale while the test is running. (Also added some extra checks that may spare us from waiting until timeout for failing.) This is a workaround but we should consider fixing something permanent for WB API compiles - like tagging the compile task with info about the origin of the compile. The comment field has this information - but then it needs to be converted to an enum. Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ Best regards, Nils Eliasson From rwestrel at redhat.com Fri Apr 8 13:54:32 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 8 Apr 2016 15:54:32 +0200 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <570680BC.7030305@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> <570680BC.7030305@oracle.com> Message-ID: <5707B818.6070405@redhat.com> > CompareAndSwap produces boolean result, and kills cmp_value and > new_value. CompareAndExchange produces the "old"/null value result, > which is stored at the same position as cmp_value. > > So, if you omit that line, LinearScan asserts when you are trying to use > the result of CompareAndExchange. AFAIU, removing the "input" property > from cmp_value, but leaving "temp" makes things back in order for > CompareAndExchange.
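As an analogy only (C++ std::atomic, not the Unsafe API or the C1 LIR under review), the difference between a boolean-result compare-and-swap and a witness-result compare-and-exchange can be sketched as follows. compare_exchange_strong writes the observed value back into its expected argument on failure, which loosely mirrors the CompareAndExchange result occupying the cmp_value slot. A caller of the exchange form detects success by comparing the returned witness with the value it passed in. The function names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Boolean-result compare-and-swap: the caller learns only whether the
// store happened.
bool compare_and_swap(std::atomic<long>& cell, long expected, long desired) {
  return cell.compare_exchange_strong(expected, desired);
}

// Witness-result compare-and-exchange: returns the value found in the cell.
// On success that is `expected` itself; on failure compare_exchange_strong
// overwrites `expected` with the current value, so the variable that held
// the compare value now holds the result.
long compare_and_exchange(std::atomic<long>& cell, long expected, long desired) {
  cell.compare_exchange_strong(expected, desired);
  return expected;
}

// Usage: a failed CAS reports only "false"; a failed CAE also reports the
// value that made it fail.
void cas_vs_cae_example() {
  std::atomic<long> cell(2);
  assert(!compare_and_swap(cell, 1, 3));          // no store, no witness
  assert(compare_and_exchange(cell, 1, 3) == 2);  // no store, witness is 2
  assert(compare_and_exchange(cell, 2, 3) == 2);  // store happened
  assert(cell.load() == 3);
}
```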
That assert seems too restrictive but making that change to c1_LIR.cpp is asking for trouble, I think. I would suggest moving the final move to the result register to the platform dependent code (see attached patch). Also, I noticed you don't pass the result as the correct argument of cas_*. Roland. -------------- next part -------------- A non-text attachment was scrubbed... Name: cas.patch Type: text/x-patch Size: 2704 bytes Desc: not available URL: From nils.eliasson at oracle.com Fri Apr 8 13:47:22 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 8 Apr 2016 15:47:22 +0200 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56BB7E47.4000703@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> <56A13176.804@oracle.com> <56A230D7.9060606@oracle.com> <56A27B55.6050502@oracle.com> <56BB095A.2090500@oracle.com> <56BB7E47.4000703@oracle.com> Message-ID: <5707B66A.4020006@oracle.com> Hi, Picking up this thread again. On 2016-02-10 19:15, Vladimir Kozlov wrote: > This looks almost good. > > There is " at the end but there is no first ": > That was just the end of mark of the cut out of the hs_err file. > MaxNodeLimit:80000" > > "Compiling with directive:" --> "Compiling with directives:". "No > inline rules in directive.", again "directives". > > Also the list of values is difficult to navigate. To have one per line > is definitely overkill but organizing them in some kind of patter > would be nice (3 per line with the same indent, for example). It could > be hard to do but at least order them alphabetically. The printout is generated by a macro for simplicity. Any sorting or formatting require a hand tuned print function or a macro that builds a list that is sorted and printed by the print function. I don't think it is worth the effort to have in the hs_err file. > > I asked before and I again forgot what it means "Enable:true > Exclude:false". 
This means you need to add more info "Enable > directives:true"? What is "Exclude" again? Enable - Is the directive ok to use (otherwise disabled as in not available) Exclude - this method can not be compiled, as in CompileCommand=exclude I'll add comments to the flag table: #define compilerdirectives_common_flags(cflags) \ cflags(Enable, bool, true, X) /* false -> directive disabled from use */ \ cflags(Exclude, bool, false, X) /* true -> don't compile method */ \ > > DisableIntrinsic: does not have value so it should not be on list. > Similar for others when they don't have values. DisableIntrinsic appear to be empty because it contains the empty list. I can add an "" for clarity. All options have a value. > > Again what * means in "*PrintInlining:true"? Is it because it is > present on command line? A * shows that it was set with a directive. Regards, Nils > > Thanks, > Vladimir > > On 2/10/16 1:56 AM, Nils Eliasson wrote: >> Hi, >> >> New webrev including Vladimirs suggestions: >> >> http://cr.openjdk.java.net/~neliasso/8138756/webrev.04/ >> >> Now it will look like this printing the directive when there are no >> compile commands for inlining: >> >> "--------------- T H R E A D --------------- >> >> Current thread (0x00007f53f0468000): JavaThread "C1 >> CompilerThread10" daemon [_thread_in_native, id=8398, >> stack(0x00007f52e6163000,0x00007f52e6264000)] >> >> Current CompileTask: >> C1: 228 1 3 java.lang.String::isLatin1 (19 bytes) >> >> Compiling with directive: >> No inline rules in directive. 
>> Enable:true Exclude:false BreakAtExecute:false >> BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >> PrintNMethods:false ReplayInline:false DumpReplay:false >> DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false >> PrintIntrinsics:false TraceOptoPipelining:false >> TraceOptoOutput:false TraceSpilling:false Vectorize:false >> VectorizeDebug:false CloneMapDebug:false >> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >> >> >> >> And like this when there are: >> >> >> "--------------- T H R E A D --------------- >> >> Current thread (0x00007feda4468800): JavaThread "C1 >> CompilerThread10" daemon [_thread_in_native, id=8314, >> stack(0x00007fec9a751000,0x00007fec9a852000)] >> >> Current CompileTask: >> C1: 227 1 3 java.lang.String::isLatin1 (19 bytes) >> >> Compiling with directive: >> No inline rules in directive. Following compile commands are in use: >> inline: b.b, a.a >> dontinline: c.c >> exclude: d.d >> compileonly: *.* >> Enable:true Exclude:false BreakAtExecute:false >> BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >> PrintNMethods:false ReplayInline:false DumpReplay:false >> DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false >> PrintIntrinsics:false TraceOptoPipelining:false >> TraceOptoOutput:false TraceSpilling:false Vectorize:false >> VectorizeDebug:false CloneMapDebug:false >> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >> >> Regards, >> Nils >> >> On 2016-01-22 19:56, Vladimir Kozlov wrote: >>> "no inline - compile commands may apply" is confusing to me (and for >>> others who not familiar with directives). What >>> does it mean? :) >>> Does it mean no 'inline' directives were used or opposite: >>> -XX:-Inline flag was specified (or corresponding directive). 
>>> >>> If it is switch off inlining then I think it should be "don't inline". >>> So what "compile commands may apply" means? >>> >>> > I updated the print output to mark all options in the directive >>> that are >>> > not default with a '*'. That makes it quicker to see if any special >>> >>> Yes, it is better but I still did not get this. I see that command >>> line has PrintInlining command and it is in the >>> list: *PrintInlining:true. >>> But I don't see PrintCompilation on the list but it is specified on >>> command line. On other hand PrintIntrinsics:false >>> is there. >>> >>> > It only prints the directive that is used for the current compile >>> task >>> > (that caused the crash). (Thats why I put them together in the >>> hs_err file) >>> >>> What do you mean "is used"? >>> >>> "Print *which* directive (and options) were in use if compiler crash. >>> Print *if* directives were used at some point if other crash?" >>> >>> Should we replace "in use"/"were used" with "were set"? >>> >>> Thanks, >>> Vladimir >>> >>> On 1/22/16 5:38 AM, Nils Eliasson wrote: >>>> Hi, Vladimir >>>> >>>> On 2016-01-21 20:28, Vladimir Kozlov wrote: >>>>> Passing directives through ciEnv is fine. >>>>> My question is about output in hs_err file. How those directives were >>>>> selected in your example? >>>> >>>> It only prints the directive that is used for the current compile task >>>> (that caused the crash). (Thats why I put them together in the >>>> hs_err file) >>>> >>>>> I found it strange to see mixed flags values and oracle commands. >>>>> "Enable:true Exclude:false" - which these correspond to, for example? >>>> >>>> These are all options from the directive - and they are set with >>>> directives (highest priority), compilecommmand or vmflags (lowest >>>> priority). >>>> >>>>> >>>>> Should we not print directives/flags which are not set explicitly? >>>> >>>> I updated the print output to mark all options in the directive >>>> that are >>>> not default with a '*'. 
That makes it quicker to see if any special >>>> options was applied. It will also print if the directive is the >>>> unmodified default directive. >>>> >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/ >>>> Example output: >>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt >>>> >>>> Regards, >>>> Nils >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/21/16 2:31 AM, Nils Eliasson wrote: >>>>>> This is how it looks: >>>>>> >>>>>> [...] >>>>>> >>>>>> --------------- T H R E A D --------------- >>>>>> >>>>>> Current thread (0x00007f071046a000): JavaThread "C1 >>>>>> CompilerThread10" daemon [_thread_in_native, id=20033, >>>>>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)] >>>>>> >>>>>> Current CompileTask: >>>>>> C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) >>>>>> >>>>>> Current compiler directive: >>>>>> inline: - >>>>>> Enable:true Exclude:false BreakAtExecute:false >>>>>> BreakAtCompile:false Log:false PrintAssembly:false >>>>>> PrintInlining:false PrintNMethods:false ReplayInline:false >>>>>> DumpReplay:false DumpInline:false >>>>>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: >>>>>> BlockLayoutByFrequency:true PrintOptoAssembly:false >>>>>> PrintIntrinsics:false TraceOptoPipelining:false >>>>>> TraceOptoOutput:false >>>>>> TraceSpilling:false Vectorize:false VectorizeDebug:false >>>>>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false >>>>>> IGVPrintLevel:0 MaxNodeLimit:80000 >>>>>> >>>>>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], >>>>>> sp=0x00007f05d7bfa5d0, free space=1021k >>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>>>>> C=native code) >>>>>> V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, >>>>>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >>>>>> char const*, int, unsigned long)+0x182 >>>>>> V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char >>>>>> const*, int, char const*, char const*, 
__va_list_tag*)+0x4a >>>>>> V [libjvm.so+0x908cca] report_vm_error(char const*, int, char >>>>>> const*, char const*, ...)+0xea >>>>>> V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, >>>>>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 >>>>>> V [libjvm.so+0x88ec5a] >>>>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a >>>>>> V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 >>>>>> V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 >>>>>> V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 >>>>>> V [libjvm.so+0x10189aa] java_start(Thread*)+0xca >>>>>> C [libpthread.so.0+0x8182] start_thread+0xc2 >>>>>> >>>>>> [...] >>>>>> >>>>>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt >>>>>> >>>>>> Regards, >>>>>> Nils >>>>>> >>>>>> On 2016-01-21 11:25, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small change. The diff looks big but most of the >>>>>>> change is just changing how the directive are >>>>>>> passed to the compilers. Directives are set in the ciEnv and then >>>>>>> passed to the compilers. The compilers can then >>>>>>> choose to add it to any internal compilation object for >>>>>>> convenience. >>>>>>> The hs_err printing routine in vmError.cpp loads >>>>>>> the directive from the ciEnv. 
>>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >>>>>>> >>>>>>> Regards, >>>>>>> Nils >>>>>> >>>> >> From felix.yang at linaro.org Fri Apr 8 14:36:02 2016 From: felix.yang at linaro.org (Felix Yang) Date: Fri, 8 Apr 2016 22:36:02 +0800 Subject: RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode Message-ID: Hi, Please review webrev: http://cr.openjdk.java.net/~fyang/8153837/webrev.00/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153837 This patch handles code generation for the special cases of MaxINode and MinINode where one argument is -1/0/1, eliminating one extra mov instruction. As shown in the JIRA issue, I saw some occurrences of these special cases in SPECjbb2005 and other benchmarks. The patch does something like this:
min(x, 1)  => cmp x, 0; csinc x, x, zr, le
min(x, -1) => cmp x, 0; csinv x, x, zr, lt
max(x, 1)  => cmp x, 0; csinc x, x, zr, gt
max(x, -1) => cmp x, 0; csinv x, x, zr, ge
Tested with JTreg hotspot, langtools. Is it OK? Thanks, Felix From pavel.punegov at oracle.com Fri Apr 8 14:40:12 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Fri, 8 Apr 2016 17:40:12 +0300 Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package Message-ID: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> Hi, please review the following change to JITtester: - rewrite TypeUtil and move to utils package - add javadoc for each method in the class bug: https://bugs.openjdk.java.net/browse/JDK-8153852 webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/ Thanks, Pavel Punegov
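The four csinc/csinv rewrites proposed in the 8153837 review can be sanity-checked with a small C++ model. This is a sketch under stated assumptions: it models the 32-bit (w-register) forms, takes the condition flags from cmp x, 0, and uses hypothetical helper names; it is not the aarch64.ad code. std::min and std::max on int32_t agree with Java's Math.min and Math.max on int.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Plain C++ models of the AArch64 conditional-select instructions
// (32-bit forms, as used for int MinI/MaxI):
//   csinc rd, rn, rm, cond  ->  rd = cond ? rn : rm + 1
//   csinv rd, rn, rm, cond  ->  rd = cond ? rn : ~rm
// With rm = zr (zero), the "else" value is 1 for csinc and -1 for csinv.
int32_t csinc(bool cond, int32_t rn, int32_t rm) { return cond ? rn : rm + 1; }
int32_t csinv(bool cond, int32_t rn, int32_t rm) { return cond ? rn : ~rm; }

// Each rewrite starts with "cmp x, 0". Subtracting zero cannot overflow, so
// le/lt/gt/ge reduce to ordinary signed comparisons of x against 0.
int32_t min_1(int32_t x)  { return csinc(x <= 0, x, 0); }  // min(x, 1):  le
int32_t min_m1(int32_t x) { return csinv(x < 0, x, 0); }   // min(x, -1): lt
int32_t max_1(int32_t x)  { return csinc(x > 0, x, 0); }   // max(x, 1):  gt
int32_t max_m1(int32_t x) { return csinv(x >= 0, x, 0); }  // max(x, -1): ge

// Compares all four rewrites with std::min/std::max for one input.
bool rewrites_match(int32_t x) {
  return min_1(x) == std::min<int32_t>(x, 1) &&
         min_m1(x) == std::min<int32_t>(x, -1) &&
         max_1(x) == std::max<int32_t>(x, 1) &&
         max_m1(x) == std::max<int32_t>(x, -1);
}

// Spot-checks the extremes and the values where the selected branch changes.
void check_boundaries() {
  const int32_t samples[] = {std::numeric_limits<int32_t>::min(), -2, -1, 0, 1, 2,
                             std::numeric_limits<int32_t>::max()};
  for (int32_t x : samples) {
    assert(rewrites_match(x));
  }
}
```

Checking only boundary values is a design choice for speed; each pattern is piecewise (either x or a constant), so the interesting inputs are those next to 0 and the constant.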
URL: From aph at redhat.com Fri Apr 8 14:42:34 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 8 Apr 2016 15:42:34 +0100 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: References: Message-ID: <5707C35A.2060000@redhat.com> On 04/08/2016 03:36 PM, Felix Yang wrote: > Is it OK? It looks good, but I'm surely going to need a jtreg test case which exercises all these patterns in C2. Thanks, Andrew. From vladimir.x.ivanov at oracle.com Fri Apr 8 16:47:49 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 8 Apr 2016 19:47:49 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes Message-ID: <5707E0B5.5080501@oracle.com> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8153540 The Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken. The proposed fix is to perform the necessary checks in Java code before calling the intrinsic. I did some performance measurements [1], and the reflective (non-constant class) case regressed ~5-10% due to the newly added guards. I also experimented with a hotspot-only fix [2], but it looks uglier. So, if you consider the regression in the reflective case non-critical, I'd prefer to go with the JDK checks. Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), microbenchmarks. Thanks! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java Baseline: AllocInstance.testConstant avgt 25 3.736 ± 0.054 ns/op AllocInstance.testReflective avgt 25 5.880 ± 0.080 ns/op JDK fix: AllocInstance.testConstant avgt 25 3.959 ± 0.205 ns/op AllocInstance.testReflective avgt 25 6.274 ± 0.180 ns/op [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path AllocInstance.testConstant avgt 25 3.957 ±
0.159 ns/op AllocInstance.testReflective avgt 25 5.901 ± 0.057 ns/op From vladimir.kozlov at oracle.com Fri Apr 8 17:04:29 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:04:29 -0700 Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package In-Reply-To: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> References: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> Message-ID: <5707E49D.4040400@oracle.com> Looks good. Thanks, Vladimir On 4/8/16 7:40 AM, Pavel Punegov wrote: > Hi, > > please review the following change to JITtester: > - rewrite TypeUtil and move to utils package > - add javadoc for each method in the class > > bug: https://bugs.openjdk.java.net/browse/JDK-8153852 > webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/ > > Thanks, > Pavel Punegov > From vladimir.kozlov at oracle.com Fri Apr 8 17:09:27 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:09:27 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5707A3AF.3040807@oracle.com> References: <5707A3AF.3040807@oracle.com> Message-ID: <5707E5C7.3000000@oracle.com> What do you mean by "stale"? I would prefer to see the real fix, as you suggested, to avoid removing WB compile tasks from the queue. Adding a timeout is not reliable. Thanks, Vladimir On 4/8/16 5:27 AM, Nils Eliasson wrote: > Hi, > > Please review this small fix of the BlockingCompilation test. > > Summary: > A method enqueued for compilation with the WB API may be removed from the compile queue as stale. > > Solution: > Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets stale while the test is running. (Also added some extra > checks that may spare us from waiting for the timeout before failing.) > > This is a workaround, but we should consider a permanent fix for WB API compiles - like tagging the compile > task with info about the origin of the compile.
The comment field has this information - but then it needs to be > converted to an enum. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 > Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ > > Best regards, > Nils Eliasson > > > > From aleksey.shipilev at oracle.com Fri Apr 8 17:28:43 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 8 Apr 2016 20:28:43 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <5707B818.6070405@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> <570680BC.7030305@oracle.com> <5707B818.6070405@redhat.com> Message-ID: <5707EA4B.8030306@oracle.com> On 04/08/2016 04:54 PM, Roland Westrelin wrote: > >> CompareAndSwap produces boolean result, and kills cmp_value and >> new_value. CompareAndExchange produces the "old"/null value result, >> which is stored at the same position as cmp_value. >> >> So, if you omit that line, LinearScan asserts when you are trying to use >> the result of CompareAndExchange. AFAIU, removing the "input" property >> from cmp_value, but leaving "temp" makes things back in order for >> CompareAndExchange. > > That assert seems too restrictive but making that change to c1_LIR.cpp > is asking for trouble, I think. I would suggest moving the final move to > the result register to the platform dependent code (see attached patch). > Also, I noticed you don't pass the result as the correct argument of cas_*. D'oh. Thank you, Roland! Merged your changes here: http://cr.openjdk.java.net/~shade/8152753/webrev.02/ Re-spinning the RBT testing now. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From vladimir.kozlov at oracle.com Fri Apr 8 17:30:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:30:16 -0700 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707E0B5.5080501@oracle.com> References: <5707E0B5.5080501@oracle.com> Message-ID: <5707EAA8.5020005@oracle.com> > Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken. But it should not allocate arrays, right? This is what your Java changes do now. Should it be allocateInstance0?: + // public native Object Unsafe.allocateInstance(Class cls); You removed the following stopped() check. Is it because the Java code will catch the NULL?: Node* cls = null_check(argument(1)); if (stopped()) return true; The test is missing the bug number (@bug). Thanks, Vladimir K On 4/8/16 9:47 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ > > https://bugs.openjdk.java.net/browse/JDK-8153540 > > Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken. > > The proposed fix is to perform necessary checks in Java code before calling the intrinsic. > > I did some performance measurements [1] and reflection (non-constant class) case (non-constant class) regressed ~5-10% > due to new guards added. > > I also experimented with a hotspot-only fix [2], but it looks uglier. So, if you consider the regression in reflective > case non-critical, I'd prefer to go with JDK checks. > > Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), microbenchmarks. > > Thanks! > > Best regards, > Vladimir Ivanov > > [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java > > Baseline: > AllocInstance.testConstant avgt 25 3.736 ±
0.054 ns/op > AllocInstance.testReflective avgt 25 5.880 ± 0.080 ns/op > > JDK fix: > AllocInstance.testConstant avgt 25 3.959 ± 0.205 ns/op > AllocInstance.testReflective avgt 25 6.274 ± 0.180 ns/op > > [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path > > AllocInstance.testConstant avgt 25 3.957 ± 0.159 ns/op > AllocInstance.testReflective avgt 25 5.901 ± 0.057 ns/op > From vladimir.kozlov at oracle.com Fri Apr 8 17:39:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:39:20 -0700 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <5707B66A.4020006@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> <56A13176.804@oracle.com> <56A230D7.9060606@oracle.com> <56A27B55.6050502@oracle.com> <56BB095A.2090500@oracle.com> <56BB7E47.4000703@oracle.com> <5707B66A.4020006@oracle.com> Message-ID: <5707ECC8.7000107@oracle.com> On 4/8/16 6:47 AM, Nils Eliasson wrote: > Hi, > > Picking up this thread again. > > On 2016-02-10 19:15, Vladimir Kozlov wrote: >> This looks almost good. >> >> There is a closing " at the end but no opening ": >> > That was just the end mark of the excerpt from the hs_err file. > >> MaxNodeLimit:80000" >> >> "Compiling with directive:" --> "Compiling with directives:". "No inline rules in directive.", again "directives". >> >> Also the list of values is difficult to navigate. To have one per line is definitely overkill but organizing them in >> some kind of pattern would be nice (3 per line with the same indent, for example). It could be hard to do but at least >> order them alphabetically. > > The printout is generated by a macro for simplicity. Any sorting or formatting requires a hand-tuned print function or a > macro that builds a list that is sorted and printed by the print function. I don't think it is worth the effort to have > in the hs_err file. > This is really unfortunate.
:( >> >> I asked before and again forgot what "Enable:true Exclude:false" means. This means you need to add more info >> "Enable directives:true"? What is "Exclude" again? > > Enable - Is the directive ok to use (otherwise disabled as in not available) > Exclude - this method can not be compiled, as in CompileCommand=exclude > > I'll add comments to the flag table: > > #define compilerdirectives_common_flags(cflags) \ > cflags(Enable, bool, true, X) /* false -> directive disabled from use */ \ > cflags(Exclude, bool, false, X) /* true -> don't compile method */ \ Should we rename these flags to make them clearer: EnableDirective ExcludeCompile > >> >> DisableIntrinsic: does not have a value so it should not be on the list. Similar for others when they don't have values. > DisableIntrinsic appears to be empty because it contains the empty list. I can add an "" for clarity. All options have a > value. Don't create more noise when it is not needed. It is the hs_err file - it is used to understand what happened. Useless information does not help. Even if DisableIntrinsic is specified in a directive it should not be listed if it does not have a value. Actually I think it is a bug - we should not accept a directive or flag with an empty string value. > >> >> Again, what does * mean in "*PrintInlining:true"? Is it because it is present on the command line? > > A * shows that it was set with a directive. Okay.
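Nils's point that sorting or column layout would need "a macro that builds a list that is sorted and printed by the print function" can be made concrete. The C++ sketch below is a minimal illustration under stated assumptions: `SKETCH_FLAGS`, `DirectiveSketch`, `collect_flags` and `format_flags` are hypothetical names loosely modeled on `compilerdirectives_common_flags`, not HotSpot code, and only `bool`/`int` flags are modeled. The X-macro expands once into fields and once into a vector of entries; sorting and the three-per-line layout are then plain code, and a '*' marks values that differ from the table default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical flag table in the style of compilerdirectives_common_flags:
// each entry is cflags(name, type, default). Only bool and int are modeled.
#define SKETCH_FLAGS(cflags)           \
  cflags(Enable,         bool, true)   \
  cflags(Exclude,        bool, false)  \
  cflags(PrintInlining,  bool, false)  \
  cflags(BreakAtCompile, bool, false)  \
  cflags(MaxNodeLimit,   int,  80000)

// One field per flag, initialized to the table default.
struct DirectiveSketch {
#define DECLARE_FLAG(name, type, dflt) type name##Option = dflt;
  SKETCH_FLAGS(DECLARE_FLAG)
#undef DECLARE_FLAG
};

struct FlagEntry {
  const char* name;
  std::string value;
  bool modified;  // value differs from the table default
};

static std::string to_text(bool v) { return v ? "true" : "false"; }
static std::string to_text(int v) { return std::to_string(v); }

// The macro only collects entries; ordering is done in plain code afterwards.
std::vector<FlagEntry> collect_flags(const DirectiveSketch& d) {
  std::vector<FlagEntry> out;
#define COLLECT_FLAG(name, type, dflt) \
  out.push_back({#name, to_text(d.name##Option), d.name##Option != (type)(dflt)});
  SKETCH_FLAGS(COLLECT_FLAG)
#undef COLLECT_FLAG
  std::sort(out.begin(), out.end(), [](const FlagEntry& a, const FlagEntry& b) {
    return std::strcmp(a.name, b.name) < 0;
  });
  return out;
}

// Three entries per line; '*' marks values that differ from the default.
std::string format_flags(const std::vector<FlagEntry>& entries) {
  std::string s;
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (entries[i].modified) s += '*';
    s += entries[i].name;
    s += ':';
    s += entries[i].value;
    s += (i % 3 == 2 || i + 1 == entries.size()) ? '\n' : ' ';
  }
  return s;
}
```

With this split, changing the layout (for example, to one flag per line) would touch only `format_flags`, not the flag table.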
Thanks, Vladimir > > Regards, > Nils > >> >> Thanks, >> Vladimir >> >> On 2/10/16 1:56 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev including Vladimir's suggestions: >>> >>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.04/ >>> >>> Now it will look like this when printing the directive and there are no compile commands for inlining: >>> >>> "--------------- T H R E A D --------------- >>> >>> Current thread (0x00007f53f0468000): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=8398, >>> stack(0x00007f52e6163000,0x00007f52e6264000)] >>> >>> Current CompileTask: >>> C1: 228 1 3 java.lang.String::isLatin1 (19 bytes) >>> >>> Compiling with directive: >>> No inline rules in directive. >>> Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >>> PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false >>> TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false >>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >>> >>> >>> >>> And like this when there are: >>> >>> >>> "--------------- T H R E A D --------------- >>> >>> Current thread (0x00007feda4468800): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=8314, >>> stack(0x00007fec9a751000,0x00007fec9a852000)] >>> >>> Current CompileTask: >>> C1: 227 1 3 java.lang.String::isLatin1 (19 bytes) >>> >>> Compiling with directive: >>> No inline rules in directive.
Following compile commands are in use: >>> inline: b.b, a.a >>> dontinline: c.c >>> exclude: d.d >>> compileonly: *.* >>> Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >>> PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false >>> TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false >>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >>> >>> Regards, >>> Nils >>> >>> On 2016-01-22 19:56, Vladimir Kozlov wrote: >>>> "no inline - compile commands may apply" is confusing to me (and to others who are not familiar with directives). What >>>> does it mean? :) >>>> Does it mean no 'inline' directives were used, or the opposite: the -XX:-Inline flag was specified (or the corresponding directive)? >>>> >>>> If it switches off inlining then I think it should be "don't inline". >>>> So what does "compile commands may apply" mean? >>>> >>>> > I updated the print output to mark all options in the directive that are >>>> > not default with a '*'. That makes it quicker to see if any special >>>> >>>> Yes, it is better but I still did not get this. I see that the command line has the PrintInlining command and it is in the >>>> list: *PrintInlining:true. >>>> But I don't see PrintCompilation on the list, although it is specified on the command line. On the other hand, PrintIntrinsics:false >>>> is there. >>>> >>>> > It only prints the directive that is used for the current compile task >>>> > (that caused the crash). (That's why I put them together in the hs_err file) >>>> >>>> What do you mean "is used"? >>>> >>>> "Print *which* directive (and options) were in use if compiler crash. >>>> Print *if* directives were used at some point if other crash?"
>>>> >>>> Should we replace "in use"/"were used" with "were set"? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/22/16 5:38 AM, Nils Eliasson wrote: >>>>> Hi, Vladimir >>>>> >>>>> On 2016-01-21 20:28, Vladimir Kozlov wrote: >>>>>> Passing directives through ciEnv is fine. >>>>>> My question is about the output in the hs_err file. How were those directives >>>>>> selected in your example? >>>>> >>>>> It only prints the directive that is used for the current compile task >>>>> (that caused the crash). (That's why I put them together in the hs_err file) >>>>> >>>>>> I found it strange to see mixed flag values and oracle commands. >>>>>> "Enable:true Exclude:false" - which do these correspond to, for example? >>>>> >>>>> These are all options from the directive - and they are set with >>>>> directives (highest priority), CompileCommand or VM flags (lowest >>>>> priority). >>>>> >>>>>> >>>>>> Should we not print directives/flags which are not set explicitly? >>>>> >>>>> I updated the print output to mark all options in the directive that are >>>>> not default with a '*'. That makes it quicker to see if any special >>>>> options were applied. It will also print if the directive is the >>>>> unmodified default directive. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/ >>>>> Example output: >>>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 1/21/16 2:31 AM, Nils Eliasson wrote: >>>>>>> This is how it looks: >>>>>>> >>>>>>> [...]
>>>>>>> >>>>>>> --------------- T H R E A D --------------- >>>>>>> >>>>>>> Current thread (0x00007f071046a000): JavaThread "C1 >>>>>>> CompilerThread10" daemon [_thread_in_native, id=20033, >>>>>>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)] >>>>>>> >>>>>>> Current CompileTask: >>>>>>> C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) >>>>>>> >>>>>>> Current compiler directive: >>>>>>> inline: - >>>>>>> Enable:true Exclude:false BreakAtExecute:false >>>>>>> BreakAtCompile:false Log:false PrintAssembly:false >>>>>>> PrintInlining:false PrintNMethods:false ReplayInline:false >>>>>>> DumpReplay:false DumpInline:false >>>>>>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: >>>>>>> BlockLayoutByFrequency:true PrintOptoAssembly:false >>>>>>> PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false >>>>>>> TraceSpilling:false Vectorize:false VectorizeDebug:false >>>>>>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false >>>>>>> IGVPrintLevel:0 MaxNodeLimit:80000 >>>>>>> >>>>>>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], >>>>>>> sp=0x00007f05d7bfa5d0, free space=1021k >>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>>>>>> C=native code) >>>>>>> V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, >>>>>>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >>>>>>> char const*, int, unsigned long)+0x182 >>>>>>> V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char >>>>>>> const*, int, char const*, char const*, __va_list_tag*)+0x4a >>>>>>> V [libjvm.so+0x908cca] report_vm_error(char const*, int, char >>>>>>> const*, char const*, ...)+0xea >>>>>>> V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, >>>>>>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 >>>>>>> V [libjvm.so+0x88ec5a] >>>>>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a >>>>>>> V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 >>>>>>> V 
[libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 >>>>>>> V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 >>>>>>> V [libjvm.so+0x10189aa] java_start(Thread*)+0xca >>>>>>> C [libpthread.so.0+0x8182] start_thread+0xc2 >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt >>>>>>> >>>>>>> Regards, >>>>>>> Nils >>>>>>> >>>>>>> On 2016-01-21 11:25, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review this small change. The diff looks big but most of the >>>>>>>> change is just changing how the directive are >>>>>>>> passed to the compilers. Directives are set in the ciEnv and then >>>>>>>> passed to the compilers. The compilers can then >>>>>>>> choose to add it to any internal compilation object for convenience. >>>>>>>> The hs_err printing routine in vmError.cpp loads >>>>>>>> the directive from the ciEnv. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils >>>>>>> >>>>> >>> > From vladimir.kozlov at oracle.com Fri Apr 8 18:37:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 11:37:40 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. Message-ID: <5707FA74.5060207@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8153818 webrev: http://cr.openjdk.java.net/~kvn/8153818/ Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. Regular testing. 
Thanks, Vladimir From aleksey.shipilev at oracle.com Fri Apr 8 18:42:14 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 8 Apr 2016 21:42:14 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707E0B5.5080501@oracle.com> References: <5707E0B5.5080501@oracle.com> Message-ID: <5707FB86.1020408@oracle.com> On 04/08/2016 07:47 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ > > https://bugs.openjdk.java.net/browse/JDK-8153540 > > Unsafe.allocateInstance intrinsic can instantiate arrays, but the > allocation logic is broken. > > The proposed fix is to perform necessary checks in Java code before > calling the intrinsic. I like the Java-side fix better. > I did some performance measurements [1] and reflection (non-constant > class) case (non-constant class) regressed ~5-10% due to new guards added. My quick perfasm runs seem to show this is because of a subtle difference: http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm http://cr.openjdk.java.net/~shade/8153540/patched.perfasm If you compare these, the difference seems to be the instruction scheduling and a branch in the guards code.
Baseline:
 0.60%  0.53%  0x00007fafafd34dac: mov    0xc(%rsi),%r10d
 3.43%  4.47%  0x00007fafafd34db0: mov    0x50(%r10),%rsi
 0.30%  0.24%  0x00007fafafd34db4: movzbl 0x172(%rsi),%r8d
 1.40%  2.09%  0x00007fafafd34dbc: mov    0x8(%rsi),%r10d
 0.98%  1.48%  0x00007fafafd34dc0: add    $0xfffffffc,%r8d
 2.71%  4.28%  0x00007fafafd34dc4: mov    %r10d,%r11d
 0.03%  0.04%  0x00007fafafd34dc7: and    $0x1,%r11d
 1.02%  1.41%  0x00007fafafd34dcb: or     %r11d,%r8d
 2.51%  2.49%  0x00007fafafd34dce: test   %r8d,%r8d

Patched:
 0.59%  0.76%  0x00007fd1c4de1c2c: mov    0xc(%rsi),%r11d
 3.48%  3.83%  0x00007fd1c4de1c30: mov    0x50(%r11),%rsi
 0.35%  0.36%  0x00007fd1c4de1c34: mov    0x8(%rsi),%r10d
 1.47%  1.69%  0x00007fd1c4de1c38: test   %r10d,%r10d
               0x00007fd1c4de1c3b: jl     0x00007fd1c4de1ce5
 1.18%  1.48%  0x00007fd1c4de1c41: movzbl 0x172(%rsi),%r11d
 2.82%  3.93%  0x00007fd1c4de1c49: mov    %r10d,%r9d
 0.01%  0.03%  0x00007fd1c4de1c4c: and    $0x1,%r9d
 0.33%  0.59%  0x00007fd1c4de1c50: add    $0xfffffffc,%r11d
 0.93%  0.92%  0x00007fd1c4de1c54: or     %r9d,%r11d
 2.65%  2.47%  0x00007fd1c4de1c57: test   %r11d,%r11d

Unfortunately, a simple fix of replacing "||" with "|" explodes the generated code. Maybe something else is doable there.
> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java
Suggestions to improve fidelity:
 * Run allocation benchmarks with -Xmx1g -Xms1g; this reduces run-to-run variance
 * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on @Benchmarks if you want to use -prof perfasm
Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From igor.veresov at oracle.com Fri Apr 8 19:18:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 8 Apr 2016 12:18:48 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code.
In-Reply-To: <5707FA74.5060207@oracle.com> References: <5707FA74.5060207@oracle.com> Message-ID: <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com> Looks good. igor > On Apr 8, 2016, at 11:37 AM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8153818 > webrev: > http://cr.openjdk.java.net/~kvn/8153818/ > > Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. > > Regular testing. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Fri Apr 8 19:21:31 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 12:21:31 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. In-Reply-To: <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com> References: <5707FA74.5060207@oracle.com> <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com> Message-ID: <570804BB.5080001@oracle.com> Thank you Igor for reviews. Vladimir On 4/8/16 12:18 PM, Igor Veresov wrote: > Looks good. > > igor > >> On Apr 8, 2016, at 11:37 AM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8153818 >> webrev: >> http://cr.openjdk.java.net/~kvn/8153818/ >> >> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. >> >> Regular testing. >> >> Thanks, >> Vladimir > From christian.thalinger at oracle.com Fri Apr 8 20:14:30 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 8 Apr 2016 10:14:30 -1000 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. In-Reply-To: <5707FA74.5060207@oracle.com> References: <5707FA74.5060207@oracle.com> Message-ID: <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com> Looks good. 
> On Apr 8, 2016, at 8:37 AM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8153818 > webrev: > http://cr.openjdk.java.net/~kvn/8153818/ > > Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. > > Regular testing. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Fri Apr 8 20:17:46 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 13:17:46 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. In-Reply-To: <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com> References: <5707FA74.5060207@oracle.com> <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com> Message-ID: <570811EA.3040203@oracle.com> Thank you, Chris, for reviews Vladimir On 4/8/16 1:14 PM, Christian Thalinger wrote: > Looks good. > >> On Apr 8, 2016, at 8:37 AM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8153818 >> webrev: >> http://cr.openjdk.java.net/~kvn/8153818/ >> >> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. >> >> Regular testing. >> >> Thanks, >> Vladimir > From rwestrel at redhat.com Mon Apr 11 07:57:10 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 11 Apr 2016 09:57:10 +0200 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <5707EA4B.8030306@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> <570680BC.7030305@oracle.com> <5707B818.6070405@redhat.com> <5707EA4B.8030306@oracle.com> Message-ID: <570B58D6.90408@redhat.com> > Merged your changes here: > http://cr.openjdk.java.net/~shade/8152753/webrev.02/ That looks good to me. Roland. 
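The result-shape difference discussed in the 8152753 thread (CompareAndSwap yields a boolean, while CompareAndExchange yields the old value) has a close C++ analogue in `std::atomic<T>::compare_exchange_strong`, which returns a `bool` and, on failure, writes the observed value back into its `expected` argument. The sketch below is only an analogy for the two result kinds; `cas_bool` and `cae_witness` are hypothetical helpers, not the C1 LIR operations or the `Unsafe` methods.

```cpp
#include <atomic>
#include <cassert>

// CompareAndSwap-like: the caller only learns whether the store happened.
bool cas_bool(std::atomic<int>& cell, int expected, int desired) {
  return cell.compare_exchange_strong(expected, desired);
}

// CompareAndExchange-like: the caller gets the value that was in memory
// (the "witness"). On success it equals 'expected'; on failure
// compare_exchange_strong overwrites 'expected' with the current value.
int cae_witness(std::atomic<int>& cell, int expected, int desired) {
  cell.compare_exchange_strong(expected, desired);
  return expected;
}
```

Because the witness equals `expected` exactly when the exchange succeeded, `cae_witness(cell, e, d) == e` recovers the boolean result, so the exchange form carries strictly more information than the swap form.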
From vladimir.x.ivanov at oracle.com Mon Apr 11 11:07:26 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 11 Apr 2016 14:07:26 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707EAA8.5020005@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707EAA8.5020005@oracle.com> Message-ID: <570B856E.3000206@oracle.com> Thanks for the feedback, Vladimir. > > Unsafe.allocateInstance intrinsic can instantiate arrays, but the > allocation logic is broken. > > But it should not allocate arrays. Right? This is what your java changes > do now. Yes, instance allocation logic doesn't work for arrays. It allocates broken array instances. That's why I decided to filter out arrays. > > Should it be allocateInstance0 ?: > > + // public native Object Unsafe.allocateInstance(Class cls); It is: @HotSpotIntrinsicCandidate - public native Object allocateInstance(Class cls) - throws InstantiationException; + private native Object allocateInstance0(Class cls) throws InstantiationException; > > You removed next stop check. Is it because java code will cat the NULL?: > Node* cls = null_check(argument(1)); > if (stopped()) return true; Yes, cls.isPrimitive() does null checks on both cls and klass. > > The test misses bug number @bug Ok, will add. Best regards, Vladimir Ivanov > > Thanks, > Vladimir K > > On 4/8/16 9:47 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ >> >> https://bugs.openjdk.java.net/browse/JDK-8153540 >> >> Unsafe.allocateInstance intrinsic can instantiate arrays, but the >> allocation logic is broken. >> >> The proposed fix is to perform necessary checks in Java code before >> calling the intrinsic. >> >> I did some performance measurements [1] and reflection (non-constant >> class) case (non-constant class) regressed ~5-10% >> due to new guards added. 
>> >> I also experimented with a hotspot-only fix [2], but it looks uglier. >> So, if you consider the regression in reflective >> case non-critical, I'd prefer to go with JDK checks. >> >> Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), >> microbenchmarks. >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >> >> Baseline: >> AllocInstance.testConstant avgt 25 3.736 ± 0.054 ns/op >> AllocInstance.testReflective avgt 25 5.880 ± 0.080 ns/op >> >> JDK fix: >> AllocInstance.testConstant avgt 25 3.959 ± 0.205 ns/op >> AllocInstance.testReflective avgt 25 6.274 ± 0.180 ns/op >> >> [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path >> >> AllocInstance.testConstant avgt 25 3.957 ± 0.159 ns/op >> AllocInstance.testReflective avgt 25 5.901 ± 0.057 ns/op >> From felix.yang at linaro.org Mon Apr 11 11:47:32 2016 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 11 Apr 2016 19:47:32 +0800 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: <5707C35A.2060000@redhat.com> References: <5707C35A.2060000@redhat.com> Message-ID: Hi, Thanks for reviewing the patch. I found that the cases this patch targets are produced by loop transformations, and it is hard to write a test case for them simply by calling the Math.min/max API (C2 does not seem to create a MaxINode/MinINode for calls to these APIs). However, I did notice some JTReg hotspot test cases that already generate these patterns.
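Before writing a jtreg test for these shapes, the four rewrites from the original patch can at least be checked for arithmetic soundness. The C++ sketch below models `csinc`/`csinv` with `zr` as the second source (when the condition fails, CSINC yields `zr + 1 = 1` and CSINV yields `~zr = -1`) and compares each rewrite with `std::min`/`std::max`. The helper names are hypothetical. The sketch checks only the identities, not the code C2 actually emits, so it does not replace the jtreg test Andrew asked for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models of AArch64 conditional selects with zr as the second source:
// CSINC Rd, Rn, zr, cond  =>  Rd = cond ? Rn : zr + 1 (= 1)
// CSINV Rd, Rn, zr, cond  =>  Rd = cond ? Rn : ~zr    (= -1)
static int32_t csinc_zr(bool cond, int32_t rn) { return cond ? rn : 1; }
static int32_t csinv_zr(bool cond, int32_t rn) { return cond ? rn : -1; }

// Each rewrite is "cmp x, 0" followed by one conditional select.
int32_t min_1(int32_t x)  { return csinc_zr(x <= 0, x); }  // csinc x, x, zr, le
int32_t min_m1(int32_t x) { return csinv_zr(x < 0, x); }   // csinv x, x, zr, lt
int32_t max_1(int32_t x)  { return csinc_zr(x > 0, x); }   // csinc x, x, zr, gt
int32_t max_m1(int32_t x) { return csinv_zr(x >= 0, x); }  // csinv x, x, zr, ge

// Compares all four rewrites with std::min/std::max for one input.
bool rewrites_match(int32_t x) {
  return min_1(x) == std::min<int32_t>(x, 1) &&
         min_m1(x) == std::min<int32_t>(x, -1) &&
         max_1(x) == std::max<int32_t>(x, 1) &&
         max_m1(x) == std::max<int32_t>(x, -1);
}

// Checks a contiguous range plus the int32 extremes.
bool rewrites_match_range(int32_t lo, int32_t hi) {
  for (int64_t x = lo; x <= hi; ++x) {  // int64_t avoids overflow at hi == INT32_MAX
    if (!rewrites_match(static_cast<int32_t>(x))) return false;
  }
  return rewrites_match(INT32_MIN) && rewrites_match(INT32_MAX);
}
```

Comparing against 0 cannot overflow, so the `le`/`lt`/`gt`/`ge` conditions after `cmp x, 0` reduce to plain signed comparisons of `x` with zero, which is what the model uses.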
Example JTReg test cases: hotspot/test/compiler/rangechecks/TestExplicitRangeChecks.java hotspot/test/compiler/rangechecks/TestBadFoldCompare.java hotspot/test/compiler/rangechecks/PowerOf2SizedArraysChecks.java hotspot/test/compiler/rangechecks/TestRangeCheckSmearing.java For the first test, I saw the following instructions in C1 JIT code: $ grep "csel" JTwork/compiler/rangechecks/TestExplicitRangeChecks.jtr | grep zr | grep gt 0x0000007f9e127e8c: csel w14, w10, wzr, gt 0x0000007f9e129568: csel w11, w12, wzr, gt ;*aload_1 {reexecute=0 rethrow=0 return_oop=0} 0x0000007f9e147c78: csel w12, w11, wzr, gt 0x0000007f9e15c75c: csel w12, w13, wzr, gt 0x0000007f9e1689e4: csel w16, w22, wzr, gt 0x0000007f9e18a570: csel w13, w11, wzr, gt 0x0000007f7e13e278: csel w12, w11, wzr, gt 0x0000007f7e14f55c: csel w13, w13, wzr, gt $ grep "csinc" JTwork/compiler/rangechecks/TestExplicitRangeChecks.jtr 0x0000007f9e112860: csinc w12, w12, wzr, gt 0x0000007f9e120e40: csinc w11, w11, wzr, gt 0x0000007f7e114de0: csinc w12, w12, wzr, gt 0x0000007f7e123440: csinc w11, w11, wzr, gt I also searched the C2 JIT code of specJBB2005 & Spark Terasort and I saw the following csel/csinc/csinv generated with the patch: 1. 
Spark Terasort:
0x0000007f990ec898: csinc w14, w0, wzr, le
0x0000007f990f3a40: csinv w13, w11, wzr, ge
0x0000007f990f3a94: csinv w11, w13, wzr, ge
0x0000007f9912c1a8: csinc w14, w13, wzr, le
0x0000007f9912afe8: csinc w12, w10, wzr, le ;*aload_1
0x0000007f99137f90: csinv w12, w12, wzr, ge
0x0000007f99137fe4: csinv w13, w12, wzr, ge
0x0000007f99132ff8: csinc w11, w10, wzr, le ;*aload_1
0x0000007f9917fdfc: csinc w12, w15, wzr, le
0x0000007f991a5e3c: csinv w11, w11, wzr, ge
0x0000007f991a5e90: csinv w12, w11, wzr, ge
0x0000007f991133bc: csinc w0, w12, wzr, le
0x0000007f9918e548: csinc w12, w18, wzr, le ;*aload_0
0x0000007f991639f8: csinc w16, w12, wzr, le
0x0000007f99115508: csinc w3, w13, wzr, le
0x0000007f991d7e38: csinc w13, w14, wzr, le ;*aload_1
0x0000007f992f7e48: csinc w12, w13, wzr, le ;*aload_0
0x0000007f992dd578: csinv w13, w10, wzr, ge
0x0000007f992e7370: csinv w17, w14, wzr, ge
0x0000007f99222ec8: csinc w12, w13, wzr, le ;*aload_0
0x0000007f993ae208: csinc w10, w15, wzr, le
0x0000007f99405604: csinc w10, w13, wzr, le
0x0000007f9931da84: csinc w11, w13, wzr, le
0x0000007f9941eb04: csinv w15, w11, wzr, ge ;*lload_0
0x0000007f9941ec4c: csinv w15, w14, wzr, ge ;*iload
0x0000007f994a3110: csinc w11, w13, wzr, le
0x0000007f990e78d8: csel w11, w11, wzr, gt
0x0000007f990dc8e0: csel w11, w12, wzr, gt
0x0000007f990dc8f0: csel w11, w11, wzr, gt
0x0000007f990f5f00: csel w12, w12, wzr, gt
0x0000007f990ff83c: csel w11, w10, wzr, gt
0x0000007f9914e0dc: csel w13, w12, wzr, gt
0x0000007f991504f0: csel w10, w11, wzr, gt
0x0000007f9918cbac: csel w20, w20, wzr, gt
0x0000007f991a32dc: csel w10, w10, wzr, gt
0x0000007f990f3a64: csel w13, w13, wzr, gt
0x0000007f990f3ad0: csel w10, w10, wzr, gt
0x0000007f99114d98: csel w16, w14, wzr, gt
0x0000007f991f0434: csel w0, w10, wzr, gt
0x0000007f9920753c: csel w10, w11, wzr, gt
0x0000007f9920754c: csel w10, w10, wzr, gt
0x0000007f99213270: csel w12, w12, wzr, gt
0x0000007f9923a9f8: csel w18, w15, wzr, gt
0x0000007f99210ad4: csel w11, w11, wzr, gt
0x0000007f9926a524: csel w12, w11, wzr, gt
0x0000007f9929e3d0: csel w10, w11, wzr, gt
0x0000007f9929e3ec: csel w11, w11, wzr, gt
0x0000007f992a6c90: csel w11, w12, wzr, gt
0x0000007f99214044: csel w13, w11, wzr, gt
0x0000007f99242d04: csel w11, w12, wzr, gt
0x0000007f99420260: csel w10, w11, wzr, gt ;*checkcast
0x0000007f993b1d14: csel w12, w12, wzr, gt
0x0000007f993fe15c: csel w11, w11, wzr, gt ;*checkcast
0x0000007f99460688: csel w10, w11, wzr, gt

2. specJBB2005:
0x0000007f7e1233e0: csinc w12, w12, wzr, gt
0x0000007f7e1632f8: csinc w10, w10, wzr, gt
0x0000007f7e1c8a1c: csinv w10, w2, wzr, ge
0x0000007f7e14d2cc: csel w0, w12, wzr, gt
0x0000007f7e19982c: csel w10, w10, wzr, gt ;*baload {reexecute=0 rethrow=0 return_oop=0}
0x0000007f7e1bad4c: csel w12, w3, wzr, gt
0x0000007f7e27e25c: csel w10, w10, wzr, gt ;*baload {reexecute=0 rethrow=0 return_oop=0}

So the patch got tested for the most part and is not causing us trouble. On 8 April 2016 at 22:42, Andrew Haley wrote: > On 04/08/2016 03:36 PM, Felix Yang wrote: > > Is it OK? > > It looks good, but I'm surely going to need a jtreg test case > which exercises all these patterns in C2. > > Thanks, > > Andrew.
From vladimir.x.ivanov at oracle.com Mon Apr 11 11:50:03 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 11 Apr 2016 14:50:03 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707FB86.1020408@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> Message-ID: <570B8F6B.2030206@oracle.com> Thanks for the feedback, Aleksey. >> I did some performance measurements [1] and reflection (non-constant >> class) case (non-constant class) regressed ~5-10% due to new guards added.
> > My quick perfasm runs seems to show this is because a subtle difference: > http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm > http://cr.openjdk.java.net/~shade/8153540/patched.perfasm > > If you compare these, then the difference seems to be the instruction > scheduling and a branch in the guards code. > > Baseline: ... > Patched: ... > > Unfortunately, a simple fix of replacing "||" with "|" explodes the > generated code. Maybe something else is doable there.
Yes, C2 can't fuse Class.isArray with slow bit check from Klass::layout_helper. (Partly, because they dispatch to different places: !Class.isArray() case dispatches to explicit exception instantiation and slow path calls into runtime). Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort. Ideally, something like [1] (which requires 2 new intrinsics):
* isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers;
* allocateInstanceSlow() handles all cases the intrinsic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object.
Best regards, Vladimir Ivanov
[1]
@ForceInline
public Object allocateInstance(Class cls) throws InstantiationException {
    // Interfaces and abstract classes are handled by the intrinsic.
    if (isFastAllocatable(cls)) {
        return allocateInstance0(cls);
    } else {
        return allocateInstanceSlow(cls);
    }
}

@HotSpotIntrinsicCandidate
private native boolean isFastAllocatable(Class cls);

@HotSpotIntrinsicCandidate
private native Object allocateInstance0(Class cls) throws InstantiationException;

// Calls into modified OptoRuntime::new_instance_C
@HotSpotIntrinsicCandidate
private native Object allocateInstanceSlow(Class cls) throws InstantiationException;

> >> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java > > Suggestions to improve fidelity: > * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance > * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on > @Benchmarks if you want to use -prof perfasm > > Thanks, > -Aleksey > >
From aph at redhat.com Mon Apr 11 11:56:02 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Apr 2016 12:56:02 +0100 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: References: <5707C35A.2060000@redhat.com> Message-ID: <570B90D2.5020900@redhat.com> Hi, On 04/11/2016 12:47 PM, Felix Yang wrote: > > Thanks for reviewing the patch. > I find that the cases the patch tries to catch here are the > result of loop transformations. > And it's hard to produce a test case to for it simply using the > Math.min/max API. (Seems C2 will not create a MaxINode/MinINode > for a call for these APIs) But I do noticed some JTReg hotspot > test cases that already generates the pattern. > > So the patch got tested for the most part and is not causing us trouble. Sure, but that doesn't necessarily give us test coverage of the changes you've made. Is the problem that you don't know how to write test cases to cover your changes? Andrew.
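The special cases in the listings above all select between a register and a constant derived from wzr. As a minimal sketch, assuming each instruction is preceded by a signed compare of the source register against zero (the compares are not shown in the listings), the Java model below spells out what each conditional-select form computes and checks it against Math.min/Math.max. The class and method names are hypothetical; this is a reference model for choosing expected values in a jtreg test, not the matcher rules from the patch.

```java
// Reference model of the AArch64 conditional-select idioms in the listings.
// Assumption (not shown in the listings): each instruction follows
// "cmp wN, #0", so the condition is the signed comparison of x with zero.
final class MinMaxIdioms {

    // csel  wd, wN, wzr, gt  ->  x > 0  ? x : 0          == Math.max(x, 0)
    static int cselGt(int x) {
        return x > 0 ? x : 0;
    }

    // csinc wd, wN, wzr, le  ->  x <= 0 ? x : wzr + 1    == Math.min(x, 1)
    static int csincLe(int x) {
        return x <= 0 ? x : 1;
    }

    // csinc wd, wN, wzr, gt  ->  x > 0  ? x : wzr + 1    == Math.max(x, 1)
    static int csincGt(int x) {
        return x > 0 ? x : 1;
    }

    // csinv wd, wN, wzr, ge  ->  x >= 0 ? x : ~wzr (-1)  == Math.max(x, -1)
    static int csinvGe(int x) {
        return x >= 0 ? x : ~0;
    }

    // True when every modelled idiom matches the corresponding Math call.
    static boolean agreesWithMath(int x) {
        return cselGt(x) == Math.max(x, 0)
            && csincLe(x) == Math.min(x, 1)
            && csincGt(x) == Math.max(x, 1)
            && csinvGe(x) == Math.max(x, -1);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -2, -1, 0, 1, 2, Integer.MAX_VALUE};
        for (int x : samples) {
            if (!agreesWithMath(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("all idioms agree with Math.min/Math.max");
    }
}
```

A real jtreg test would still have to get C2 to produce MinI/MaxI nodes in the first place, which, according to Felix's observation, happens through loop transformations rather than direct Math.min/Math.max calls; the model only pins down the values such a test should expect.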
From vladimir.kempik at oracle.com Mon Apr 11 12:00:28 2016 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Mon, 11 Apr 2016 15:00:28 +0300 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space Message-ID: <570B91DC.2040904@oracle.com> Hello Please review this backport of 8130309 to jdk8u. Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope. Testing: jprt, failing test. Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ Thanks -Vladimir From nils.eliasson at oracle.com Mon Apr 11 12:09:29 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 11 Apr 2016 14:09:29 +0200 Subject: RFR(S): 8153885: [TESTBUG] few regression tests failed after 8151880 changes Message-ID: <570B93F9.5030005@oracle.com> Hi, Please review this fix. Summary: 1) Add -XX:-UseCounterDecay to three tests that were using the compile()-method through the getBCI method. 2) Fix CompileCommand patterns and comments still using the old SimpleTestCase$Helper pattern. Testing: Running all compiler JTREG tests on linux-x64. Bug: https://bugs.openjdk.java.net/browse/JDK-8153885 Webrev: http://cr.openjdk.java.net/~neliasso/8153885/webrev.01/ Regards, Nils Eliasson From tobias.hartmann at oracle.com Mon Apr 11 12:26:43 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Apr 2016 14:26:43 +0200 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <570B91DC.2040904@oracle.com> References: <570B91DC.2040904@oracle.com> Message-ID: <570B9803.2030509@oracle.com> Hi Vladimir, On 11.04.2016 14:00, Vladimir Kempik wrote: > Hello > > Please review this backport of 8130309 to jdk8u. > > Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope. > > Testing: jprt, failing test.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 > Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ Looks good to me. Thanks for backporting this! Best regards, Tobias > > Thanks > -Vladimir > From felix.yang at linaro.org Mon Apr 11 12:48:02 2016 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 11 Apr 2016 20:48:02 +0800 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: <570B90D2.5020900@redhat.com> References: <5707C35A.2060000@redhat.com> <570B90D2.5020900@redhat.com> Message-ID: Hi, Yes, I haven't looked into the details of the loop transformation code and find it hard to produce a test case at least for now. And it seems to me necessary to do so in order to produce a good JTReg test case for the patch. But I am not quite sure if I can come up with a test case which covers all the templates added. Any suggestions? Thanks, Felix On 11 April 2016 at 19:56, Andrew Haley wrote: > Hi, > > On 04/11/2016 12:47 PM, Felix Yang wrote: > > > > Thanks for reviewing the patch. > > > I find that the cases the patch tries to catch here are the > > result of loop transformations. > > And it's hard to produce a test case to for it simply using the > > Math.min/max API. (Seems C2 will not create a MaxINode/MinINode > > for a call for these APIs) But I do noticed some JTReg hotspot > > test cases that already generates the pattern. > > > > So the patch got tested for the most part and is not causing us > trouble. > > Sure, but that doesn't necessarily give us test coverage of the > changes you've made. Is the problem that you don't know how to write > test cases to cover your changes? > > Andrew. > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pavel.punegov at oracle.com Mon Apr 11 14:02:06 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Mon, 11 Apr 2016 17:02:06 +0300 Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package In-Reply-To: <5707E49D.4040400@oracle.com> References: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> <5707E49D.4040400@oracle.com> Message-ID: <35C50286-0D4C-4F3E-8862-FFDA32334390@oracle.com> Thanks for review, Vladimir > On 08 Apr 2016, at 20:04, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 4/8/16 7:40 AM, Pavel Punegov wrote: >> Hi, >> >> please review the following change to JITtester: >> - rewrite TypeUtil and move to utils package >> - add javadoc for each method in the class >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8153852 >> webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/ >> >> Thanks, >> Pavel Punegov >> From pavel.punegov at oracle.com Mon Apr 11 14:02:57 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Mon, 11 Apr 2016 17:02:57 +0300 Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError In-Reply-To: <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com> References: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com> Message-ID: <1940774F-D052-4995-97A7-3E2485229CA3@oracle.com> Thanks for review, Igor > On 07 Apr 2016, at 16:43, Igor Ignatyev wrote: > > Pavel, > > looks good to me > > Igor >> On Apr 7, 2016, at 4:32 PM, Pavel Punegov wrote: >> >> Hi, >> >> please review this fix to unquarantine CompilerControl tests after the JDK-8140354 [1] is closed as a duplicate of the JDK-8144621 [2] >> The second one has fixed main issue that caused OOME in tests. It disabled generation of patterns *.* for compile commands like "print" that made a lot of output from the tests VM.
>> >> [1] https://bugs.openjdk.java.net/browse/JDK-8140354 >> [2] https://bugs.openjdk.java.net/browse/JDK-8144621 >> -- >> webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/ >> bug https://bugs.openjdk.java.net/browse/JDK-8153661 >> >> Pavel. >> > From vladimir.kozlov at oracle.com Mon Apr 11 19:13:42 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Apr 2016 12:13:42 -0700 Subject: RFR(S): 8153885: [TESTBUG] few regression tests failed after 8151880 changes In-Reply-To: <570B93F9.5030005@oracle.com> References: <570B93F9.5030005@oracle.com> Message-ID: <570BF766.9020300@oracle.com> Looks good. thanks, Vladimir On 4/11/16 5:09 AM, Nils Eliasson wrote: > Hi, > > Please review this fix. > > Summary: > 1) Add -XX:-UseCounterDecay to three tests that where using the > compile()-method through the getBCI method. > 2) Fix CompileCommand patterns and comments still using the old > SimpleTestCase$Helper pattern. > > Testing: > Running all compiler JTREG tests on linux-x64. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153885 > Webrev: http://cr.openjdk.java.net/~neliasso/8153885/webrev.01/ > > Regards, > Nils Eliasson From christian.thalinger at oracle.com Mon Apr 11 22:09:24 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Apr 2016 12:09:24 -1000 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570B8F6B.2030206@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> Message-ID: <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> > On Apr 11, 2016, at 1:50 AM, Vladimir Ivanov wrote: > > Thanks for the feedback, Aleksey. > >>> I did some performance measurements [1] and reflection (non-constant >>> class) case (non-constant class) regressed ~5-10% due to new guards added.
>> >> My quick perfasm runs seems to show this is because a subtle difference: >> http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm >> http://cr.openjdk.java.net/~shade/8153540/patched.perfasm >> >> If you compare these, then the difference seems to be the instruction >> scheduling and a branch in the guards code. >> >> Baseline: > ... >> Patched: > ... >> >> Unfortunately, a simple fix of replacing "||" with "|" explodes the >> generated code. Maybe something else is doable there. > > Yes, C2 can't fuse Class.isArray with slow bit check from Klass::layout_helper. (Partly, because they dispatch to different places: !Class.isArray() case dispatches to explicit exception instantiation and slow path calls into runtime). > > Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort. > > Ideally, something like [1] (which requires 2 new intrinsics): I would advise against that. We are fixing a long-standing bug here and although we see a regression the code we produced before was just wrong. Comparing against something that was wrong in the first place is moot. Take the hit; I doubt it will show up at customer applications. > > * isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers; > > * allocateInstanceSlow() handles all cases the intrisic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object. > > Best regards, > Vladimir Ivanov > > [1] > @ForceInline > public Object allocateInstance(Class cls) throws InstantiationException { > // Interfaces and abstract classes are handled by the intrinsic. 
> if (isFastAllocatable(cls)) { > return allocateInstance0(cls); > } else { > return allocateInstanceSlow(cls); > } > } > > @HotSpotIntrinsicCandidate > private native boolean isFastAllocatable(Class cls); > > @HotSpotIntrinsicCandidate > private native Object allocateInstance0(Class cls) throws InstantiationException; > > // Calls into modified OptoRuntime::new_instance_C > @HotSpotIntrinsicCandidate > private native Object allocateInstanceSlow(Class cls) throws InstantiationException; > > >> >>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >> >> Suggestions to improve fidelity: >> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance >> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >> @Benchmarks if you want to use -prof perfasm >> >> Thanks, >> -Aleksey >> >> From igor.ignatyev at oracle.com Tue Apr 12 00:03:30 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 11 Apr 2016 17:03:30 -0700 Subject: RFR(XS) : 8152376 : [TESTBUG] compiler/floatingpoint/Test15FloatJNIArgs should use run main/othervm/native Message-ID: <53A08B91-81FD-4C36-9FF2-780AAE3D99CB@oracle.com> http://cr.openjdk.java.net/~iignatyev/8152376/webrev.00/ > 3 lines changed: 0 ins; 0 del; 3 mod; Hi all, could you please review this small fix which adds '/native' option to all main actions? the test uses native library, all such tests should be marked w/ '/native' option so jtreg would be able to handle them accordingly. JBS: https://bugs.openjdk.java.net/browse/JDK-8152376 webrev: http://cr.openjdk.java.net/~iignatyev/8152376/webrev.00/ Thanks,
Igor From martin.doerr at sap.com Tue Apr 12 09:45:54 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 12 Apr 2016 09:45:54 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> Message-ID: <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> Hi, I think we have come to a common understanding and there was no complaint about my latest webrev: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ Can I consider it reviewed? Can somebody sponsor, please? Thanks and best regards, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Donnerstag, 7. April 2016 12:52 To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Andrew, Jamsheed and all, thank you very much for your input. As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct). My change still contains a releasing store for newly created ExceptionCache instances. As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms. I think having the release doesn't hurt too much and makes the design a little cleaner. I also added comments based on your input. 
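The releasing store discussed in this thread can be illustrated with a Java analog. This is a sketch under stated assumptions, not the HotSpot code (which is C++): the class and field names are hypothetical, a single writer at a time is serialized by a lock, and VarHandle.setRelease/getAcquire (Java 9+) stand in for the releasing store of the count and the reader's matching load. What it shows is the ordering: an entry's fields are written first and only then is the count published, so a lock-free reader that observes the new count also observes the entry.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical Java analog of publishing exception-cache entries with a
// releasing store. One writer at a time (synchronized add); readers take no lock.
final class ExceptionCacheSketch {
    private static final int CAPACITY = 16;
    private static final VarHandle COUNT;

    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(ExceptionCacheSketch.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final long[] pcs = new long[CAPACITY];
    private final long[] handlers = new long[CAPACITY];
    private int count; // stored with setRelease, loaded by readers with getAcquire

    // Writer: fill the entry first, then publish it by bumping the count.
    synchronized boolean add(long pc, long handler) {
        int n = count;                 // only writers store count, and they hold the lock
        if (n >= CAPACITY) {
            return false;
        }
        pcs[n] = pc;
        handlers[n] = handler;
        COUNT.setRelease(this, n + 1); // the entry stores above cannot move after this
        return true;
    }

    // Reader: acquire-load the count, then scan only published entries.
    // Returns 0 when no handler is cached for pc.
    long lookup(long pc) {
        int n = (int) COUNT.getAcquire(this); // pairs with setRelease in add()
        for (int i = 0; i < n; i++) {
            if (pcs[i] == pc) {
                return handlers[i];
            }
        }
        return 0L;
    }
}
```

Without the release/acquire pair, a weakly ordered CPU could make the new count visible to a reader before the entry it covers.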
The new webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ Please review. I will also need a sponsor from Oracle, please. Thanks again and best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Donnerstag, 7. April 2016 12:14 To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On 07/04/16 10:08, Doerr, Martin wrote: > atomic update for the _count would only be required if there were > multiply threads which attempt to increment it > concurrently. However, updates are under lock, so we only have > concurrent readers which is ok. > > I still think "volatile" does what we need here. Especially the xlC > compiler on AIX tends to reload variables from memory. Exactly this > can be prevented by making the field volatile. I think your latest patch is OK. Whether volatile is really good enough, I don't know. The new(ish) C++ memory model treats this as a race, and therefore undefined behaviour. Old C++ didn't have a memory model, so the best we can do with racy code is guess about what our compilers might do. I certainly much prefer a release_store to the storestore fence used in the fix for 8143897. Andrew. From felix.yang at linaro.org Tue Apr 12 10:49:33 2016 From: felix.yang at linaro.org (Felix Yang) Date: Tue, 12 Apr 2016 18:49:33 +0800 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair In-Reply-To: <57067E32.3010403@redhat.com> References: <57067E32.3010403@redhat.com> Message-ID: Done. New webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.01 Tested with JTreg hotspot, langtools and jdk. 
Thanks, Felix On 7 April 2016 at 23:35, Andrew Haley wrote: > On 04/07/2016 04:01 PM, Felix Yang wrote: > > Please review webrev: > http://cr.openjdk.java.net/~fyang/8153713/webrev.00/ > > JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713 > > > > Currently, C2 compiler generate independent stores to clear > > short arrays whose size is no bigger than parameter > > InitArrayShortSize (refer to ClearArrayNode::Ideal function). > > For the aarch64 port, we have store pair instruction which can > > zero two memory words at a time and this will be good for code > > size and maybe performance for some micro-archs. > > > > For Spark Terasort, an extra of 550 stp (xzr, xzr) instructions > > are generated with the patch, which mean about 2KB reduction in > > codesize. > > > > Tested with JTreg hotspot, langtools and jdk. Is it OK? > > It looks reasonable. It's rather a big slab of code for aarch64.ad, > and I think that it should be in MacroAssembler. Long Chen created > MacroAssembler::zero_words, and you should create an overload of > zero_words which takes a constant int as an argument. > > Andrew. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Tue Apr 12 11:07:46 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Apr 2016 14:07:46 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> Message-ID: <570CD702.4070909@oracle.com> >> Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort. >> >> Ideally, something like [1] (which requires 2 new intrinsics): > > I would advise against that. 
We are fixing a long-standing bug here and although we see a regression the code we produced before was just wrong. Comparing against something that was wrong in the first place is moot. It wasn't intended as a call for action, but more like a backup plan if there's a need to speed up the reflection case. I'd like to keep the fix simple and current version looks good enough: http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00 Any Reviews, please? Best regards, Vladimir Ivanov > > Take the hit; I doubt it will show up at customer applications. > >> >> * isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers; >> >> * allocateInstanceSlow() handles all cases the intrisic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object. >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> @ForceInline >> public Object allocateInstance(Class cls) throws InstantiationException { >> // Interfaces and abstract classes are handled by the intrinsic. 
>> if (isFastAllocatable(cls)) { >> return allocateInstance0(cls); >> } else { >> return allocateInstanceSlow(cls); >> } >> } >> >> @HotSpotIntrinsicCandidate >> private native boolean isFastAllocatable(Class cls); >> >> @HotSpotIntrinsicCandidate >> private native Object allocateInstance0(Class cls) throws InstantiationException; >> >> // Calls into modified OptoRuntime::new_instance_C >> @HotSpotIntrinsicCandidate >> private native Object allocateInstanceSlow(Class cls) throws InstantiationException; >> >> >>> >>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >>> >>> Suggestions to improve fidelity: >>> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance >>> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >>> @Benchmarks if you want to use -prof perfasm >>> >>> Thanks, >>> -Aleksey >>> >>> > From aph at redhat.com Tue Apr 12 12:30:22 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 12 Apr 2016 13:30:22 +0100 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair In-Reply-To: References: <57067E32.3010403@redhat.com> Message-ID: <570CEA5E.2040208@redhat.com> On 04/12/2016 11:49 AM, Felix Yang wrote: > Done. New webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.01 That looks fine. Thanks. Andrew. From nils.eliasson at oracle.com Tue Apr 12 13:30:19 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 12 Apr 2016 15:30:19 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5707E5C7.3000000@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> Message-ID: <570CF86B.3050804@oracle.com> Tasks get evicted from the compile_queue if their invocation counter hasn't increased during TieredCompileTaskTimeout. (AdvancedThresholdPolicy::is_stale(...)). I'll do a proper fix, it is the right thing to do and should be pretty quick. 
I'll change the comment to an enum that represents who submitted the compile, and add a table for the comments. This could be useful in other settings too. Regards, Nils On 2016-04-08 19:09, Vladimir Kozlov wrote: > What do you mean "stale"? > I would prefer to see the real fix as you suggested to avoid removing > WB comp tasks from queue. Adding timeout is not reliable. > > Thanks, > Vladimir > > On 4/8/16 5:27 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this small fix of the BlockingCompilation test. >> >> Summary: >> Add method enqueued for compilation with WB API may be removed from >> the compile queue as stale. >> >> Solution: >> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >> stale while the test is running. (Also added some extra >> checks that may spare us from waiting until timeout for failing.) >> >> This is an workaround but we should consider fixing something >> permanent for WB API compiles - like tagging the compile >> task with info about the origin of the compile. The comment field has >> this information - but then it needs to be >> converted to an enum.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >> >> Best regards, >> Nils Eliasson >> >> >> >> From vladimir.kozlov at oracle.com Tue Apr 12 16:33:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Apr 2016 09:33:12 -0700 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570CD702.4070909@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> Message-ID: <570D2348.5000204@oracle.com> You did not fix the comment: + // public native Object Unsafe.allocateInstance(Class cls); should be: + // private native Object allocateInstance0(Class cls) throws InstantiationException; Another question: does it really throw InstantiationException? Thanks, Vladimir On 4/12/16 4:07 AM, Vladimir Ivanov wrote: > >>> Additional flag in a mirror (j.l.Class) which marks instance klasses >>> could help here, but I'm still not sure it's worth the effort. >>> >>> Ideally, something like [1] (which requires 2 new intrinsics): >> >> I would advise against that. We are fixing a long-standing bug here >> and although we see a regression the code we produced before was just >> wrong. Comparing against something that was wrong in the first place >> is moot. > > It wasn't intended as a call for action, but more like a backup plan if > there's a need to speed up the reflection case. > > I'd like to keep the fix simple and current version looks good enough: > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00 > > Any Reviews, please? > > Best regards, > Vladimir Ivanov > >> >> Take the hit; I doubt it will show up at customer applications.
>> >>> >>> * isFastAllocatable() performs all necessary checks: null checks on >>> cls, not primitive, not array, not interface, not abstract, fully >>> initialized, no finalizers; >>> >>> * allocateInstanceSlow() handles all cases the intrisic doesn't >>> handle: either throws IE or does necessary operations (e.g., >>> initialize the class or register a finalizer) when instantiating an >>> object. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] >>> @ForceInline >>> public Object allocateInstance(Class cls) throws >>> InstantiationException { >>> // Interfaces and abstract classes are handled by the intrinsic. >>> if (isFastAllocatable(cls)) { >>> return allocateInstance0(cls); >>> } else { >>> return allocateInstanceSlow(cls); >>> } >>> } >>> >>> @HotSpotIntrinsicCandidate >>> private native boolean isFastAllocatable(Class cls); >>> >>> @HotSpotIntrinsicCandidate >>> private native Object allocateInstance0(Class cls) throws >>> InstantiationException; >>> >>> // Calls into modified OptoRuntime::new_instance_C >>> @HotSpotIntrinsicCandidate >>> private native Object allocateInstanceSlow(Class cls) throws >>> InstantiationException; >>> >>> >>>> >>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >>>> >>>> Suggestions to improve fidelity: >>>> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves >>>> variance >>>> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >>>> @Benchmarks if you want to use -prof perfasm >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>> >> From vladimir.kozlov at oracle.com Tue Apr 12 16:55:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Apr 2016 09:55:05 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570CF86B.3050804@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> Message-ID: <570D2869.5030206@oracle.com> On 4/12/16 6:30 AM, Nils Eliasson wrote: > Tasks get evicted from the 
compile_queue if their invocation counter > hasn't increased during TieredCompileTaskTimeout. > (AdvancedThresholdPolicy::is_stale(...)). > > I'll do a proper fix, it is the right thing to do and should be pretty > quick. I'll change the comment to an enum that represent who submitted > the compile, and add a table for the comments. This could be useful in > other settings to. Sounds good. Thanks, Vladimir > > Regards, > Nils > > On 2016-04-08 19:09, Vladimir Kozlov wrote: >> What do you mean "stale"? >> I would prefer to see the real fix as you suggested to avoid removing >> WB comp tasks from queue. Adding timeout is not reliable. >> >> Thanks, >> Vladimir >> >> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this small fix of the BlockingCompilation test. >>> >>> Summary: >>> Add method enqueued for compilation with WB API may be removed from >>> the compile queue as stale. >>> >>> Solution: >>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>> stale while the test is running. (Also added some extra >>> checks that may spare us from waiting until timeout for failing.) >>> >>> This is an workaround but we should consider fixing something >>> permanent for WB API compiles - like tagging the compile >>> task with info about the origin of the compile. The comment field has >>> this information - but then it needs to be >>> converted to an enum. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>> >>> Best regards, >>> Nils Eliasson >>> >>> >>> >>> > From vladimir.x.ivanov at oracle.com Tue Apr 12 16:55:41 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Apr 2016 19:55:41 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570D2348.5000204@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> Message-ID: <570D288D.8020106@oracle.com> On 4/12/16 7:33 PM, Vladimir Kozlov wrote: > You did not fix comment: > > + // public native Object Unsafe.allocateInstance(Class cls); > > should be: > > + // private native Object allocateInstance0(Class cls) throws > InstantiationException; Ok, finally found where it is :-) Incorporated (will update the webrev shortly). > An other question: does it really throw InstantiationException? Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper). I didn't move the check into Java, because I didn't want to add yet another guard on fast path. Best regards, Vladimir Ivanov > > On 4/12/16 4:07 AM, Vladimir Ivanov wrote: >> >>>> Additional flag in a mirror (j.l.Class) which marks instance klasses >>>> could help here, but I'm still not sure it's worth the effort. >>>> >>>> Ideally, something like [1] (which requires 2 new intrinsics): >>> >>> I would advise against that. We are fixing a long-standing bug here >>> and although we see a regression the code we produced before was just >>> wrong. Comparing against something that was wrong in the first place >>> is moot. 
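The fast-path conditions listed for isFastAllocatable() can be approximated with plain reflection to see which classes would bypass the slow path. This is a hypothetical sketch, not the intrinsic: canTakeFastPath is a made-up name, the finalizer test (any class below Object in the hierarchy declaring a no-argument finalize()) is only an approximation of what the VM tracks, and class initialization state cannot be observed from Java, so that check is left out.

```java
import java.lang.reflect.Modifier;

// Hypothetical approximation of the isFastAllocatable() checks using reflection.
// Class initialization state is not observable here, so it is not checked.
final class AllocGuardSketch {

    static boolean canTakeFastPath(Class<?> cls) {
        if (cls == null || cls.isPrimitive() || cls.isArray()) {
            return false;
        }
        if (cls.isInterface() || Modifier.isAbstract(cls.getModifiers())) {
            return false; // per the thread, the runtime slow path throws InstantiationException here
        }
        return !declaresFinalizer(cls);
    }

    // Approximation: true if cls or any superclass below Object declares a
    // no-argument finalize() method.
    private static boolean declaresFinalizer(Class<?> cls) {
        for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("finalize");
                return true;
            } catch (NoSuchMethodException e) {
                // not declared at this level; keep walking up
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Class<?>[] samples = {int.class, int[].class, Runnable.class,
                              java.util.AbstractList.class, java.util.ArrayList.class};
        for (Class<?> c : samples) {
            System.out.println(c.getName() + " -> " + canTakeFastPath(c));
        }
    }
}
```

Interfaces and abstract classes return false here, matching the slow-bit case that, as explained above, makes the runtime call throw InstantiationException.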
>> >> It wasn't intended as a call for action, but more like a backup plan if >> there's a need to speed up the reflection case. >> >> I'd like to keep the fix simple and current version looks good enough: >> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00 >> >> Any Reviews, please? >> >> Best regards, >> Vladimir Ivanov >> >>> >>> Take the hit; I doubt it will show up at customer applications. >>> >>>> >>>> * isFastAllocatable() performs all necessary checks: null checks on >>>> cls, not primitive, not array, not interface, not abstract, fully >>>> initialized, no finalizers; >>>> >>>> * allocateInstanceSlow() handles all cases the intrinsic doesn't >>>> handle: either throws IE or does necessary operations (e.g., >>>> initialize the class or register a finalizer) when instantiating an >>>> object. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] >>>> @ForceInline >>>> public Object allocateInstance(Class cls) throws >>>> InstantiationException { >>>> // Interfaces and abstract classes are handled by the >>>> intrinsic.
>>>> if (isFastAllocatable(cls)) { >>>> return allocateInstance0(cls); >>>> } else { >>>> return allocateInstanceSlow(cls); >>>> } >>>> } >>>> >>>> @HotSpotIntrinsicCandidate >>>> private native boolean isFastAllocatable(Class cls); >>>> >>>> @HotSpotIntrinsicCandidate >>>> private native Object allocateInstance0(Class cls) throws >>>> InstantiationException; >>>> >>>> // Calls into modified OptoRuntime::new_instance_C >>>> @HotSpotIntrinsicCandidate >>>> private native Object allocateInstanceSlow(Class cls) throws >>>> InstantiationException; >>>> >>>> >>>>> >>>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >>>>> >>>>> Suggestions to improve fidelity: >>>>> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves >>>>> variance >>>>> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >>>>> @Benchmarks if you want to use -prof perfasm >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>>>> >>> From karen.kinnear at oracle.com Tue Apr 12 17:18:32 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 12 Apr 2016 13:18:32 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> Message-ID: <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com> Igor, My apologies, I thought you had already decided to push. Yes, I am good with the changes. Sorry to keep you waiting. thanks, Karen > On Apr 6, 2016, at 1:49 PM, Igor Veresov wrote: > > Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it.
We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately. > > Thanks, > igor > >> On Apr 5, 2016, at 4:22 PM, Igor Veresov wrote: >> >> >>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: >>> >>> Igor, >>> >>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter >>> for instance? >> >> Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed. >> >>> >>> If so, I am ok with checking this in - further notes below. >>> >>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >>>> >>>> >>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>>>> >>>>> I am in agreement with Lois that the JVMS looks good with moving the exception. >>>> >>>> Thanks! >>>>> >>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>>>> meeting I will check one more time. It might be worth adding a comment. >>>> >>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. >>>> >>>>> >>>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks >>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>>>> >>>> >>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >>>> In the nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). 
>>> >>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match >>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. >>> That is ok with me - I will add a note to the bug. >> >> Could you please explain what is the problem again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa? >> >>> >>> Also: I see a ciMethod::check_call that has a comment - >>> IT appears to fail when applied to an invoke interface call site. >>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. >>> >> >> This comment is odd. I don?t see why it would fail for invokeinterface. The code certainly seems to account for it. May be the comment is wrong? Any ideas? >> >> igor >> >>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take >>> the subtleties of invoke interface and invoke special into account. >>>> >>>> igor >>>> >>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>>>> corner cases here. I don?t know the code paths, but I would suggest following this approach of passing in the byte code, >>>>> so that you get the correct behavior depending on the requesting byte code. >>>>> >>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection. 
>>>>> >>>>> thanks, >>>>> Karen >>>>> >>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>>>> >>>>>> >>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>> Hi Lois, >>>>>>> >>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>> >>>>>>> igor >>>>>> Hi Igor, >>>>>> >>>>>> Thanks for waiting on this. A couple of comments: >>>>>> >>>>>> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>> >>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>> >>>>>> Just curious did you also run the testbase default methods tests? >>>>>> Lois >>>>>> >>>>>>> >>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>> >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Lois >>>>>>>> >>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. 
It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>>>> >>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> igor >> > From igor.veresov at oracle.com Tue Apr 12 18:54:33 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 12 Apr 2016 11:54:33 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com> Message-ID: Thanks, Karen! igor > On Apr 12, 2016, at 10:18 AM, Karen Kinnear wrote: > > Igor, > > My apologies, I thought you had already decided to push. Yes, I am good with the changes. > Sorry to keep you waiting. > > thanks, > Karen > >> On Apr 6, 2016, at 1:49 PM, Igor Veresov wrote: >> >> Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately. >> >> Thanks, >> igor >> >>> On Apr 5, 2016, at 4:22 PM, Igor Veresov wrote: >>> >>> >>>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: >>>> >>>> Igor, >>>> >>>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter >>>> for instance? >>> >>> Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed.
>>> >>>> >>>> If so, I am ok with checking this in - further notes below. >>>> >>>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >>>>> >>>>> >>>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>>>>> >>>>>> I am in agreement with Lois that the JVMS looks good with moving the exception. >>>>> >>>>> Thanks! >>>>>> >>>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>>>>> meeting I will check one more time. It might be worth adding a comment. >>>>> >>>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >>>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. >>>>> >>>>>> >>>>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks >>>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>>>>> >>>>> >>>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >>>>> In the nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). >>>> >>>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match >>>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. >>>> That is ok with me - I will add a note to the bug. >>> >>> Could you please explain what is the problem again? 
Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa? >>> >>>> >>>> Also: I see a ciMethod::check_call that has a comment - >>>> IT appears to fail when applied to an invoke interface call site. >>>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. >>>> >>> >>> This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? >>> >>> igor >>> >>>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take >>>> the subtleties of invoke interface and invoke special into account. >>>>> >>>>> igor >>>>> >>>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>>>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>>>>> so that you get the correct behavior depending on the requesting byte code. >>>>>> >>>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection. >>>>>> >>>>>> thanks, >>>>>> Karen >>>>>> >>>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>>>>> >>>>>>> >>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>>> Hi Lois, >>>>>>>> >>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>>> >>>>>>>> igor >>>>>>> Hi Igor, >>>>>>> >>>>>>> Thanks for waiting on this.
A couple of comments: >>>>>>> >>>>>>> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>>> >>>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>>> >>>>>>> Just curious did you also run the testbase default methods tests? >>>>>>> Lois >>>>>>> >>>>>>>> >>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>>> >>>>>>>>> Hi Igor, >>>>>>>>> >>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Lois >>>>>>>>> >>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). 
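A small model can illustrate the move from runtime to link-time checking. This is a hedged sketch, not `LinkResolver` code. The `Bytecode`, `ResolvedMethod`, `passes_linktime_check`, and `linktime_check_interface_call` names are hypothetical. The sketch encodes the rule Lois cites from the invokeinterface entry in JVMS section 6.5, as it stood when this thread was written: invoking a resolved method that is `static` or `private` throws `IncompatibleClassChangeError`. Later Java versions relaxed the private case as part of nestmate support. The check reads only the resolved method and the requesting bytecode, never the receiver's class, so it can run at link time.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of the link-time rule discussed above; names are
// illustrative and do not match HotSpot's LinkResolver API.
enum class Bytecode { invokestatic, invokespecial, invokevirtual, invokeinterface };

struct ResolvedMethod {
  bool is_static;
  bool is_private;
};

struct IncompatibleClassChangeError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Pure predicate: depends only on the requesting bytecode (the context) and
// the resolved method, never on the runtime receiver, so it is valid at
// link time.
inline bool passes_linktime_check(Bytecode bc, const ResolvedMethod& m) {
  return !(bc == Bytecode::invokeinterface && (m.is_static || m.is_private));
}

// Throwing wrapper, mirroring how a resolver would report the failure.
inline void linktime_check_interface_call(Bytecode bc, const ResolvedMethod& m) {
  if (!passes_linktime_check(bc, m)) {
    throw IncompatibleClassChangeError(
        "invokeinterface resolved to a static or private method");
  }
}
```

Because the predicate takes the bytecode as an argument, it also shows why Lois's concern matters. If the wrong context bytecode is passed, the wrong rule is applied.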
>>>>>>>>>> >>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> igor >>> >> > From john.r.rose at oracle.com Tue Apr 12 20:09:23 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 12 Apr 2016 13:09:23 -0700 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570D288D.8020106@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> <570D288D.8020106@oracle.com> Message-ID: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov wrote: > > On 4/12/16 7:33 PM, Vladimir Kozlov wrote: >> You did not fix comment: >> >> + // public native Object Unsafe.allocateInstance(Class cls); >> >> should be: >> >> + // private native Object allocateInstance0(Class cls) throws >> InstantiationException; > > Ok, finally found where it is :-) > Incorporated (will update the webrev shortly). > >> An other question: does it really throw InstantiationException? > > Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper). > > I didn't move the check into Java, because I didn't want to add yet another guard on fast path. A fix is necessary, but I'm not comfortable with the shape of the checking logic. The C-coded JNI function (not the intrinsic) just surfaces the function JNIEnv::AllocObject. This function calls some complicated C++ logic in Klass::check_valid_for_instantiation to check for various things, including arrays and abstracts. (There's also a primitive check.) So the problem is that the JIT intrinsic doesn't mimic all these checks. 
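The sketch below shows the single-source-of-truth shape John argues for. One predicate lists every case that `allocateInstance` must reject, and both the slow path and a compiled fast path consult it. `ClassInfo`, `check_valid_for_instantiation_model`, and `can_use_fast_path` are hypothetical simplifications. The rejected cases (primitive, array, interface, abstract, and `java.lang.Class` itself) are the ones named in this thread. Reporting all of them as `InstantiationException` is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical summary of a class mirror; stands in for Klass queries.
struct ClassInfo {
  bool is_primitive;        // e.g. int.class
  bool is_array;            // e.g. Object[].class
  bool is_interface;
  bool is_abstract;
  bool is_java_lang_Class;  // Class itself may not be instantiated this way
};

enum class AllocCheck { Ok, InstantiationException };

// One predicate listing every rejected case, so the JNI path and a compiled
// fast path cannot drift apart.
inline AllocCheck check_valid_for_instantiation_model(const ClassInfo& c) {
  if (c.is_primitive || c.is_array || c.is_interface || c.is_abstract ||
      c.is_java_lang_Class) {
    return AllocCheck::InstantiationException;
  }
  return AllocCheck::Ok;
}

// Hypothetical fast-path guard built on the same predicate. A real fast path
// would also require an initialized class without a finalizer.
inline bool can_use_fast_path(const ClassInfo& c) {
  return check_valid_for_instantiation_model(c) == AllocCheck::Ok;
}
```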
And a good tactic is to lift such checks into Java code, since Java is maintainable. But, there is still a maintenance problem: The checks in the proposed change overlap with, but do not cover, the checks performed by JNIEnv::AllocObject. Thus, it is difficult to prove that they are correct. Some additional checks are performed (in an ad hoc manner) by the JIT intrinsic. Thus, the checking for a valid class is now in three places: 1. JNIEnv::AllocObject (when the intrinsic is not used), 2. the new Java code (whether the intrinsic is used or not), and 3. the partial checks in the intrinsic code (library_call.cpp). The unit test will prevent regressions, but the code is still messy and hard to work with. Can we make it better at this point? Maybe not; maybe this is the least-bad point fix. But it seems to me that a less-bad fix would put the required logic in two places rather than three. Two ways to do that are 1. push the prim and array checks from Java down into the JIT intrinsic, next to the pre-existing checks, or 2. pull the pre-existing JIT intrinsic tests up into Java. Option 2 seems to require a new intrinsic to capture the pre-existing intrinsic tests. On the whole, since this Unsafe API point simply exposes JNIEnv::AllocObject, I suggest doing the necessary work in library_call.cpp to make the intrinsic accurately reflect that JNI function. That will make the checks easier to verify and maintain. I don't think (AM I REALLY SAYING THIS?) the Java-based checks help much in this particular case. -- John
From christian.thalinger at oracle.com Tue Apr 12 20:40:14 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 12 Apr 2016 10:40:14 -1000 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> <570D288D.8020106@oracle.com> <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> Message-ID: <4D7414C9-9D41-4939-A7FE-95CDAC9E865C@oracle.com> > On Apr 12, 2016, at 10:09 AM, John Rose wrote: > > On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov > wrote: >> >> On 4/12/16 7:33 PM, Vladimir Kozlov wrote: >>> You did not fix comment: >>> >>> + // public native Object Unsafe.allocateInstance(Class cls); >>> >>> should be: >>> >>> + // private native Object allocateInstance0(Class cls) throws >>> InstantiationException; >> >> Ok, finally found where it is :-) >> Incorporated (will update the webrev shortly). >> >>> An other question: does it really throw InstantiationException? >> >> Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper). >> >> I didn't move the check into Java, because I didn't want to add yet another guard on fast path. > > A fix is necessary, but I'm not comfortable with the shape of the checking logic. > The C-coded JNI function (not the intrinsic) just surfaces the function JNIEnv::AllocObject. > This function calls some complicated C++ logic in Klass::check_valid_for_instantiation > to check for various things, including arrays and abstracts. (There's also a primitive check.) > > So the problem is that the JIT intrinsic doesn't mimic all these checks.
> And a good tactic is to lift such checks into Java code, since Java is maintainable. > But, there is still a maintenance problem: The checks in the proposed change > overlap with, but do not cover, the checks performed by JNIEnv::AllocObject. > Thus, it is difficult to prove that they are correct. Some additional checks are > performed (in an ad hoc manner) by the JIT intrinsic. > > Thus, the checking for a valid class is now in three places: 1. JNIEnv::AllocObject > (when the intrinsic is not used), 2. the new Java code (whether the intrinsic is used > or not), and 3. the partial checks in the intrinsic code (library_call.cpp). > > The unit test will prevent regressions, but the code is still messy and hard to work with. > > Can we make it better at this point? Maybe not; maybe this is the least-bad point fix. > But it seems to me that a less-bad fix would put the required logic in two places > rather than three. Two ways to do that are 1. push the prim and array checks from > Java down into the JIT intrinsic, next to the pre-existing checks, or 2. pull the > pre-existing JIT intrinsic tests up into Java. Option 2 seems to require a new > intrinsic to capture the pre-existing intrinsic tests. > > On the whole, since this Unsafe API point simply exposes JNIEnv::AllocObject, > I suggest doing the necessary work in library_call.cpp to make the intrinsic > accurately reflect that JNI function. That will make the checks easier to verify > and maintain. I don't think (AM I REALLY SAYING THIS?) the Java-based checks > help much in this particular case. -1 (for obvious reasons)
From michael.c.berg at intel.com Wed Apr 13 06:26:24 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 13 Apr 2016 06:26:24 +0000 Subject: CR for RFR 8153998 Message-ID: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.
See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ Thanks, Michael
From tobias.hartmann at oracle.com Wed Apr 13 08:53:01 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 13 Apr 2016 10:53:01 +0200 Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed with C1 only Message-ID: <570E08ED.4010207@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8154073 http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/ TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests. TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message. Tested with RBT (running).
Thanks, Tobias From nils.eliasson at oracle.com Wed Apr 13 12:59:30 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 13 Apr 2016 14:59:30 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570D2869.5030206@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> Message-ID: <570E42B2.2090306@oracle.com> Hi, New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ Summary: Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change. The only new logic is the CompileTask::can_become_stale() method. Testing: Running Testset hotspot on all platforms and hotspot_all on one platform Regards, Nils Eliasson On 2016-04-12 18:55, Vladimir Kozlov wrote: > On 4/12/16 6:30 AM, Nils Eliasson wrote: >> Tasks get evicted from the compile_queue if their invocation counter >> hasn't increased during TieredCompileTaskTimeout. >> (AdvancedThresholdPolicy::is_stale(...)). >> >> I'll do a proper fix, it is the right thing to do and should be pretty >> quick. I'll change the comment to an enum that represents who submitted >> the compile, and add a table for the comments. This could be useful in >> other settings too. > > Sounds good. > > Thanks, > Vladimir > >> >> Regards, >> Nils >> >> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>> What do you mean "stale"? >>> I would prefer to see the real fix as you suggested to avoid removing >>> WB comp tasks from queue. Adding timeout is not reliable. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this small fix of the BlockingCompilation test.
>>>> >>>> Summary: >>>> A method enqueued for compilation with the WB API may be removed from >>>> the compile queue as stale. >>>> >>>> Solution: >>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>> stale while the test is running. (Also added some extra >>>> checks that may spare us from waiting until timeout for failing.) >>>> >>>> This is a workaround but we should consider fixing something >>>> permanent for WB API compiles - like tagging the compile >>>> task with info about the origin of the compile. The comment field has >>>> this information - but then it needs to be >>>> converted to an enum. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>> >>>> Best regards, >>>> Nils Eliasson >>>> >>>> >>>> >>>> >> From vladimir.x.ivanov at oracle.com Wed Apr 13 16:01:52 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Apr 2016 19:01:52 +0300 Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when invoking nonexistent method Message-ID: <570E6D70.40904@oracle.com> http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8154172 C1 unconditionally inserts a null check before doing a call, even if the call throws an error during linkage. This contradicts the JVMS, which requires that linking errors precede run-time errors. The fix is to detect non-resolvable cases and avoid null checks / profiling altogether, letting the runtime throw a linkage error. Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck). Some clarifications: - klass->is_loaded() && !target->is_loaded() is true when method resolution fails; - static vs non-static checks aren't needed because stream()->get_method already returns an unloaded method in such case; Thanks!
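The ordering rule behind this fix can be stated as a small decision function. This is a hypothetical model, not C1's code. `CallSiteModel`, `should_emit_null_check`, and `observed_outcome` are illustrative names. The `holder_loaded && !target_loaded` test mirrors the `klass->is_loaded() && !target->is_loaded()` condition above. When that condition holds, no receiver null check is emitted, so the linkage error is what the program observes.

```cpp
#include <cassert>

enum class Outcome { LinkageError, NullPointerException, Call };

// Hypothetical summary of what the compiler knows at an invoke site.
struct CallSiteModel {
  bool holder_loaded;     // klass->is_loaded()
  bool target_loaded;     // target->is_loaded(); false if resolution failed
  bool receiver_is_null;  // runtime property of the receiver
};

// The call can never link: the holder is known but the method is not.
inline bool target_unresolvable(const CallSiteModel& s) {
  return s.holder_loaded && !s.target_loaded;
}

// Emit an explicit receiver null check only for calls that can link.
inline bool should_emit_null_check(const CallSiteModel& s) {
  return !target_unresolvable(s);
}

// What the program observes: an emitted null check runs first; an
// unresolvable call then reaches the runtime, which throws the linkage error.
// (The case where the holder class itself is not loaded is outside this
// sketch.)
inline Outcome observed_outcome(const CallSiteModel& s) {
  if (should_emit_null_check(s) && s.receiver_is_null) {
    return Outcome::NullPointerException;
  }
  if (target_unresolvable(s)) return Outcome::LinkageError;
  return Outcome::Call;
}
```

Before the fix, the null check was emitted unconditionally. In this model, that corresponds to `should_emit_null_check` always returning true. A null receiver at an unresolvable call would then produce the NPE from the bug title instead of the linkage error.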
Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Apr 13 16:47:19 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Apr 2016 19:47:19 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> <570D288D.8020106@oracle.com> <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> Message-ID: <570E7817.1040106@oracle.com> Thanks for the feedback, John. I see your point. Actually, after looking at check_valid_for_instantiation more carefully, I found one more missing check in the intrinsic: Class instantiation is forbidden in InstanceKlass::check_valid_for_instantiation, but the intrinsic allows it. So I agree it would be desirable to minimize duplication in the code. Am I right that you are in favor of the following approach? http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path/ I'll experiment to see how does it shape out in both cases. Best regards, Vladimir Ivanov On 4/12/16 11:09 PM, John Rose wrote: > On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov > > wrote: >> >> On 4/12/16 7:33 PM, Vladimir Kozlov wrote: >>> You did not fix comment: >>> >>> + // public native Object Unsafe.allocateInstance(Class cls); >>> >>> should be: >>> >>> + // private native Object allocateInstance0(Class cls) throws >>> InstantiationException; >> >> Ok, finally found where it is :-) >> Incorporated (will update the webrev shortly). >> >>> An other question: does it really throw InstantiationException? >> >> Yes, it does throw IE from runtime call on slow path for abstract >> classes & interfaces (they have slow bit set in layout_helper). 
>> >> I didn't move the check into Java, because I didn't want to add yet >> another guard on fast path. > > A fix is necessary, but I'm not comfortable with the shape of the > checking logic. > The C-coded JNI function (not the intrinsic) just surfaces the function > JNIEnv::AllocObject. > This function calls some complicated C++ logic > in Klass::check_valid_for_instantiation > to check for various things, including arrays and abstracts. (There's > also a primitive check.) > > So the problem is that the JIT intrinsic doesn't mimic all these checks. > And a good tactic is to lift such checks into Java code, since Java is > maintainable. > But, there is still a maintenance problem: The checks in the proposed > chance > overlap with, but do not cover, the checks performed by JNIEnv::AllocObject. > Thus, it is difficult to prove that they are correct. Some additional > checks are > performed (in an ad hoc manner) by the JIT intrinsic. > > Thus, the checking for a valid class is now in three places: 1. > JNIEnv::AllocObject > (when the intrinsic is not used), 2. the new Java code (whether the > intrinsic is used > or not), and 3. the partial checks in the intrinsic code (library_call.cpp). > > The unit test will prevent regressions, but the code is still messy and > hard to work with. > > Can we make it better at this point? Maybe not; maybe this is the > least-bad point fix. > But it seems to me that a less-bad fix would put the required logic in > two places > rather than three. Two ways to do that are 1. push the prim and array > checks from > Java down into the JIT intrinsic, next to the pre-existing checks, or 2. > pull the > pre-existing JIT intrinsic tests up into Java. Option 2 seems to > require a new > intrinsic to capture the pre-existing intrinsic tests. > > On the whole, since this Unsafe API point simply exposes > JNIEnv::AllocObject, > I suggest doing the necessary work in library_call.cpp to make the intrinsic > accurately reflect that JNI function. 
That will make the checks easier > to verify > and maintain. I don't think (AM I REALLY SAYING THIS?) the Java-based > checks > help much in this particular case. > > -- John From vladimir.kozlov at oracle.com Wed Apr 13 21:02:07 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 14:02:07 -0700 Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed with C1 only In-Reply-To: <570E08ED.4010207@oracle.com> References: <570E08ED.4010207@oracle.com> Message-ID: <570EB3CF.5010408@oracle.com> Looks good. Thanks, Vladimir On 4/13/16 1:53 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8154073 > http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/ > > TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests. > > TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message. > > Tested with RBT (running). > > Thanks, > Tobias > From christian.thalinger at oracle.com Wed Apr 13 21:08:02 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 13 Apr 2016 11:08:02 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: > On Apr 12, 2016, at 8:26 PM, Berg, Michael C wrote: > > Hi Folks, > > I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. > Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. > > This code was tested as follows (see jbs entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } > > Thanks, > Michael From vladimir.kozlov at oracle.com Wed Apr 13 21:34:49 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 14:34:49 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570E42B2.2090306@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> Message-ID: <570EBB79.7060805@oracle.com> Very nice, I like it. One note. CompileReason (and its names) should be in the CompileTask class where it is recorded.
Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms. Thanks, Vladimir On 4/13/16 5:59 AM, Nils Eliasson wrote: > Hi, > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ > > Summary > Introduced an enum CompileReason with members matching all the old > variants, and a table containing all the unchanged strings. I see the > possibility of removing/changing/simplifying some CompileReasons but > have chosen not to do so in this change. > > Only new logic is the CompileTask::can_become_stale() method. > > Testing: > Running Testset hotspot on all platforms and hotspot_all on one platform > > Regards, > Nils Eliasson > > On 2016-04-12 18:55, Vladimir Kozlov wrote: >> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>> Tasks get evicted from the compile_queue if their invocation counter >>> hasn't increased during TieredCompileTaskTimeout. >>> (AdvancedThresholdPolicy::is_stale(...)). >>> >>> I'll do a proper fix, it is the right thing to do and should be pretty >>> quick. I'll change the comment to an enum that represents who submitted >>> the compile, and add a table for the comments. This could be useful in >>> other settings too. >> >> Sounds good. >> >> Thanks, >> Vladimir >> >>> >>> Regards, >>> Nils >>> >>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>> What do you mean "stale"? >>>> I would prefer to see the real fix as you suggested to avoid removing >>>> WB comp tasks from queue. Adding timeout is not reliable. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this small fix of the BlockingCompilation test. >>>>> >>>>> Summary: >>>>> A method enqueued for compilation with the WB API may be removed from >>>>> the compile queue as stale. >>>>> >>>>> Solution: >>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>> stale while the test is running.
(Also added some extra >>>>> checks that may spare us from waiting until timeout for failing.) >>>>> >>>>> This is a workaround but we should consider fixing something >>>>> permanent for WB API compiles - like tagging the compile >>>>> task with info about the origin of the compile. The comment field has >>>>> this information - but then it needs to be >>>>> converted to an enum. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>> >>>>> Best regards, >>>>> Nils Eliasson >>>>> >>>>> >>>>> >>>>> >>> > From michael.c.berg at intel.com Wed Apr 13 21:35:39 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 13 Apr 2016 21:35:39 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: See below for context. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Wednesday, April 13, 2016 2:08 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.
This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. Ok, that's easy enough. Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } The reason: Currently k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects. The case with set_mask is to take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop. The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions. Thanks, Michael From vladimir.kozlov at oracle.com Wed Apr 13 21:52:36 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 14:52:36 -0700 Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when invoking nonexistent method In-Reply-To: <570E6D70.40904@oracle.com> References: <570E6D70.40904@oracle.com> Message-ID: <570EBFA4.4060005@oracle.com> Looks good to me. The ciEnv.cpp changes look empty. If they are only spacing changes we don't need to include them in the bug fix.
thanks, Vladimir K On 4/13/16 9:01 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8154172 > > C1 unconditionally inserts a null check before doing a call, even if it > throws an error during linkage. It contradicts the JVMS, which requires that > linking errors precede run-time errors. > > The fix is to detect non-resolvable cases and avoid null checks / > profiling altogether, letting the runtime throw a linkage error. > > Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck). > > Some clarifications: > > - klass->is_loaded() && !target->is_loaded() is true when method > resolution fails; > > - static vs non-static checks aren't needed because > stream()->get_method already returns an unloaded method in such case; > > Thanks! > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Thu Apr 14 00:40:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 17:40:44 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: <570EE70C.8000906@oracle.com> Hi Michael, Please split the changes. _rex_vex_w_reverted (and other assembler) changes can be pushed first. evmovdqul -> evmovdquq and Vectors element_size() changes could be pushed separately too. You don't need MachMskNode placeholder methods in the other platforms' .ad files. I think Matcher::has_predicated_vectors() will be enough since MachMskNode is generated only when has_predicated_vectors() is true. This is how we usually do it. macroAssembler_x86.cpp Why do you use a table and not instructions to generate the mask value? Looking at the table, it is very easy to generate (you would need an additional instruction but it is better than a load from memory I think): (1 << src) - 1; src == 0 could be treated specially. You can leave the table as a comment to see which values are expected. x86.ad names should be consistent: MaskCreateINode -> CreateMaskINode, set_mask -> createMask.
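Vladimir's alternative to the lookup table can be illustrated with a minimal C++ sketch (plain C++, not the x86 assembly in macroAssembler_x86.cpp): for n remaining post-loop iterations the mask is (1 << n) - 1. The helper name lane_mask and the 64-lane upper bound are assumptions made for illustration; the full-width case is special-cased because shifting a value by its own width is undefined in C++. The static_asserts spell out the values such a table would hold for 16 lanes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot code): returns a lane mask with the low `n`
// bits set, one bit per vector lane that still has work, assuming n <= 64.
// Requires C++14 (multi-statement constexpr function).
constexpr std::uint64_t lane_mask(unsigned n) {
  assert(n <= 64 && "a 64-bit mask covers at most 64 lanes");
  // Shifting a 64-bit value by 64 is undefined behaviour in C++, so the
  // "all lanes" case is handled separately. n == 0 yields 0 (no active lanes).
  return n == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// The entries a 16-lane lookup table would hold, checked at compile time.
static_assert(lane_mask(0) == 0x0000, "no lanes");
static_assert(lane_mask(1) == 0x0001, "one lane");
static_assert(lane_mask(5) == 0x001F, "five lanes");
static_assert(lane_mask(16) == 0xFFFF, "all sixteen lanes");
```

Computing the mask costs a shift and a subtract in registers, which is the trade-off Vladimir describes against a load from a memory table.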
You can also use Matcher::has_predicated_vectors() in the predicate: +instruct createMask(rRegI dst, rRegI src) %{ + predicate(Matcher::has_predicated_vectors()); + match(Set dst (CreateMaskI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} Maybe it should be setMask, as the reverse of restoreMask. And more precisely setvectmask/restorevectmask. MaskCreateINode or SetVectMaskINode should be defined in vector.hpp and not in subnode.hpp. block.cpp Matcher::has_predicated_vectors() should be checked with if (found_fixup_loops) to avoid useless looping. I don't like how you inject MachMskNode. It should be generated on exit from the loop where you created MaskCreateINode. It will need additional review after you address the comments above. Thanks, Vladimir On 4/12/16 11:26 PM, Berg, Michael C wrote: > Hi Folks, > > I would like to contribute Programmable SIMD as implemented on > multi-versioned post loops. See: > https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of > the implementation. > > This component delivers mask programmed post loops which execute in a > single iteration in place of fixup scalar loops which used to take many > iterations to complete work for user loops. > > Currently I have enabled this optimization for x86 only, specifically > for machines with masked data predication implemented as per fully > enabled EVEX targets. It delivers up to 2x performance and has been > modeled over a large number of loop lengths and forms of loops.
> > This code was tested as follows (see jbs entry below): > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ > > Thanks, > > Michael > From tobias.hartmann at oracle.com Thu Apr 14 06:31:10 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Apr 2016 08:31:10 +0200 Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed with C1 only In-Reply-To: <570EB3CF.5010408@oracle.com> References: <570E08ED.4010207@oracle.com> <570EB3CF.5010408@oracle.com> Message-ID: <570F392E.40608@oracle.com> Thanks, Vladimir! Best regards, Tobias On 13.04.2016 23:02, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 4/13/16 1:53 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8154073 >> http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/ >> >> TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests. >> >> TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message. >> >> Tested with RBT (running).
>> >> Thanks, >> Tobias >> From rwestrel at redhat.com Thu Apr 14 06:46:49 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 14 Apr 2016 08:46:49 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body Message-ID: When running scimark on aarch64: ;; B16: # B17 <- B21 top-of-loop Freq: 2305.21 0x000003ffa126f710: add w17, w11, w12 ;*iadd {reexecute=0 rethrow=0 return_oop=0} ; - jnt.scimark2.FFT::transform_internal at 243 (line 129) 0x000003ffa126f714: nop 0x000003ffa126f718: nop 0x000003ffa126f71c: nop ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0} ; - jnt.scimark2.FFT::transform_internal at 238 (line 129) ;; B17: # B32 B18 <- B25 B16 Loop: B17-B16 inner Freq: 3056.06 0x000003ffa126f720: lsl w16, w17, #1 ;*imul {reexecute=0 rethrow=0 return_oop=0} ; - jnt.scimark2.FFT::transform_internal at 244 (line 129) The 3 nops are added by the code that aligns loop entries: the top of loop block is first encountered and its alignment is set, the loop head is later encountered through the backbranch of an outer loop and its alignment is set. I propose that the code that aligns loop entries verifies that a loop top doesn't exist before it sets the alignment: http://cr.openjdk.java.net/~roland/8154135/webrev.00/ Roland. From igor.veresov at oracle.com Thu Apr 14 07:01:28 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 14 Apr 2016 00:01:28 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570E42B2.2090306@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> Message-ID: Sorry for nitpicking, but can't compile_reason argument be of type CompileReason instead of int everywhere? It'd be also nice to place reason_name close to the enum.
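Igor's two suggestions, together with Vladimir's request to record the reason in CompileTask and inline can_become_stale(), can be combined in a compact C++ sketch. Every name below (the class, the enumerators, the strings) is hypothetical, and the choice of which reasons may become stale is an assumption based on this thread's goal of keeping WhiteBox compiles in the queue; the actual list is in the webrev.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the idea discussed in this thread; the
// class name, enumerators and strings are illustrative only.
class CompileTaskSketch {
 public:
  enum CompileReason {
    Reason_None,
    Reason_Tiered,          // submitted by the tiered compilation policy
    Reason_Whitebox,        // submitted explicitly through the WhiteBox API
    Reason_MustBeCompiled,  // methods that must be compiled regardless of counters
    Reason_Count            // number of reasons; keep last
  };

  // Name table placed right next to the enum so the two stay in sync.
  static const char* reason_name(CompileReason r) {
    static const char* const names[] = {
      "no_reason", "tiered", "whitebox", "must_be_compiled"
    };
    static_assert(sizeof(names) / sizeof(names[0]) ==
                      static_cast<std::size_t>(Reason_Count),
                  "one name per CompileReason");
    assert(r >= Reason_None && r < Reason_Count);
    return names[r];
  }

  // Typed parameter instead of a plain int.
  explicit CompileTaskSketch(CompileReason reason) : _reason(reason) {}

  CompileReason reason() const { return _reason; }

  // Defined in the class body (i.e. in the header), so it can be inlined
  // on every platform. Assumption: explicit requests are never evicted.
  bool can_become_stale() const {
    switch (_reason) {
      case Reason_Whitebox:
      case Reason_MustBeCompiled:
        return false;
      default:
        return true;
    }
  }

 private:
  CompileReason _reason;  // recorded where the task is created
};
```

The static_assert turns "forgot to add a name for a new reason" into a compile error, which is the practical benefit of keeping the table beside the enum.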
igor > On Apr 13, 2016, at 5:59 AM, Nils Eliasson wrote: > > Hi, > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ > > Summary > Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change. > > Only new logic is the CompileTask::can_become_stale() method. > > Testing: > Running Testset hotspot on all platforms and hotspot_all on one platform > > Regards, > Nils Eliasson > > On 2016-04-12 18:55, Vladimir Kozlov wrote: >> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>> Tasks get evicted from the compile_queue if their invocation counter >>> hasn't increased during TieredCompileTaskTimeout. >>> (AdvancedThresholdPolicy::is_stale(...)). >>> >>> I'll do a proper fix, it is the right thing to do and should be pretty >>> quick. I'll change the comment to an enum that represents who submitted >>> the compile, and add a table for the comments. This could be useful in >>> other settings too. >> >> Sounds good. >> >> Thanks, >> Vladimir >> >>> >>> Regards, >>> Nils >>> >>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>> What do you mean "stale"? >>>> I would prefer to see the real fix as you suggested to avoid removing >>>> WB comp tasks from queue. Adding timeout is not reliable. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this small fix of the BlockingCompilation test. >>>>> >>>>> Summary: >>>>> A method enqueued for compilation with the WB API may be removed from >>>>> the compile queue as stale. >>>>> >>>>> Solution: >>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>> stale while the test is running. (Also added some extra >>>>> checks that may spare us from waiting until timeout for failing.)
>>>>> >>>>> This is a workaround but we should consider fixing something >>>>> permanent for WB API compiles - like tagging the compile >>>>> task with info about the origin of the compile. The comment field has >>>>> this information - but then it needs to be >>>>> converted to an enum. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>> >>>>> Best regards, >>>>> Nils Eliasson >>>>> >>>>> >>>>> >>>>> >>> > From zoltan.majo at oracle.com Thu Apr 14 11:47:28 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 13:47:28 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap Message-ID: <570F8350.5080209@oracle.com> Hi, please review the patch for 8151708. https://bugs.openjdk.java.net/browse/JDK-8151708 Problem: On solaris_sparc, the VM can set the TLAB's top pointer to a value past the end of the Java heap. The problem appears with large values of MinTLABSize. The reason for the problem is that the 'brcs' instruction at http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3260 http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3265 checks the condition codes in 'icc' (32-bit), but not in 'xcc' (64-bit). Solution: As the VM is handling addresses at the above-mentioned locations, the 64-bit condition codes must be checked. Use 'BPcc' instead of 'Bicc' at these locations. Webrev: http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ Testing: - JPRT - reproducer on solaris_sparc. Thank you!
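Why testing only 'icc' is wrong for addresses can be modeled in plain C++. This is a simplified sketch that assumes the check is an unsigned "new top <= end" comparison of 64-bit pointers; the exact instruction sequence in macroAssembler_sparc.cpp may differ, and the function names are hypothetical. A new top that crosses a 4 GB boundary past the end still passes a comparison of the low 32 bits, but fails the full 64-bit comparison.

```cpp
#include <cstdint>

// Simplified model (not SPARC assembly) of an unsigned bounds check on
// 64-bit addresses, evaluated two ways.

// Like a branch on the 32-bit integer condition codes (icc): only the low
// 32 bits of each operand take part in the comparison.
constexpr bool fits_icc(std::uint64_t new_top, std::uint64_t end) {
  return static_cast<std::uint32_t>(new_top) <= static_cast<std::uint32_t>(end);
}

// Like a branch on the 64-bit extended condition codes (xcc): the full
// addresses are compared.
constexpr bool fits_xcc(std::uint64_t new_top, std::uint64_t end) {
  return new_top <= end;
}

// Models 'brx' as Tobias describes it: xcc on a 64-bit VM, icc otherwise.
constexpr bool fits_brx(std::uint64_t new_top, std::uint64_t end, bool is_64bit_vm) {
  return is_64bit_vm ? fits_xcc(new_top, end) : fits_icc(new_top, end);
}

// A new top just past a 4 GB boundary, beyond the end pointer:
static_assert(fits_icc(0x100000010ULL, 0xFFFFFFF0ULL),
              "low halves only: 0x10 <= 0xFFFFFFF0, so the check wrongly passes");
static_assert(!fits_xcc(0x100000010ULL, 0xFFFFFFF0ULL),
              "full addresses: the allocation is correctly rejected");
```

On a 32-bit VM addresses fit in 32 bits, so both forms agree there; this is why a width-selecting branch such as 'brx' is the natural fix.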
Best regards, Zoltan From tobias.hartmann at oracle.com Thu Apr 14 12:15:19 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Apr 2016 14:15:19 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8350.5080209@oracle.com> References: <570F8350.5080209@oracle.com> Message-ID: <570F89D7.7080809@oracle.com> Hi Zoltan, On 14.04.2016 13:47, Zoltán Majó wrote: > Hi, > > > please review the patch for 8151708. > > https://bugs.openjdk.java.net/browse/JDK-8151708 > > Problem: On solaris_sparc, the VM can set the TLAB's top pointer to a value past the end of the Java heap. The problem appears with large values of MinTLABSize. The reason for the problem is that the 'brcs' instruction at > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3260 > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3265 > > checks the condition codes in 'icc' (32-bit), but not in 'xcc' (64-bit). I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. Best regards, Tobias > Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ > > Testing: > - JPRT > - reproducer on solaris_sparc. > > Thank you!
> > Best regards, > > > Zoltan > From zoltan.majo at oracle.com Thu Apr 14 12:26:55 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 14:26:55 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F89D7.7080809@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> Message-ID: <570F8C8F.4080003@oracle.com> Hi Tobias, thank you for the feedback! On 04/14/2016 02:15 PM, Tobias Hartmann wrote: > [...] > I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. Yes, that simplifies the code a bit. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ Tests are running. Thank you! Best regards, Zoltan > > Best regards, > Tobias > >> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >> >> Testing: >> - JPRT >> - reproducer on solaris_sparc. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> From nils.eliasson at oracle.com Thu Apr 14 12:32:47 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Apr 2016 14:32:47 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> Message-ID: <570F8DEF.7000504@oracle.com> Yes, good feedback - New webrev including your and Vladimir's suggestions: http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ Thanks for having a look! Nils On 2016-04-14 09:01, Igor Veresov wrote: > Sorry for nitpicking, but can't compile_reason argument be of type CompileReason instead of int everywhere?
It'd be also nice to place reason_name close to the enum. > > igor > > >> On Apr 13, 2016, at 5:59 AM, Nils Eliasson wrote: >> >> Hi, >> >> New webrev: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >> >> Summary >> Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change. >> >> Only new logic is the CompileTask::can_become_stale() method. >> >> Testing: >> Running Testset hotspot on all platforms and hotspot_all on one platform >> >> Regards, >> Nils Eliasson >> >> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>> Tasks get evicted from the compile_queue if their invocation counter >>>> hasn't increased during TieredCompileTaskTimeout. >>>> (AdvancedThresholdPolicy::is_stale(...)). >>>> >>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>> quick. I'll change the comment to an enum that represents who submitted >>>> the compile, and add a table for the comments. This could be useful in >>>> other settings too. >>> Sounds good. >>> >>> Thanks, >>> Vladimir >>> >>>> Regards, >>>> Nils >>>> >>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>> What do you mean "stale"? >>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this small fix of the BlockingCompilation test. >>>>>> >>>>>> Summary: >>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>> the compile queue as stale. >>>>>> >>>>>> Solution: >>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>> stale while the test is running.
(Also added some extra >>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>> >>>>>> This is a workaround but we should consider fixing something >>>>>> permanent for WB API compiles - like tagging the compile >>>>>> task with info about the origin of the compile. The comment field has >>>>>> this information - but then it needs to be >>>>>> converted to an enum. >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>> >>>>>> Best regards, >>>>>> Nils Eliasson >>>>>> >>>>>> >>>>>> >>>>>> From tobias.hartmann at oracle.com Thu Apr 14 12:33:48 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Apr 2016 14:33:48 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8C8F.4080003@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> Message-ID: <570F8E2C.20605@oracle.com> Hi Zoltan, On 14.04.2016 14:26, Zoltán Majó wrote: > Hi Tobias, > > > thank you for the feedback! > > On 04/14/2016 02:15 PM, Tobias Hartmann wrote: >> [...] >> I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. > > Yes, that simplifies the code a bit. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ Looks good! Best regards, Tobias > > Tests are running. > > Thank you! > > Best regards, > > > Zoltan > >> >> Best regards, >> Tobias >> >>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>> >>> Testing: >>> - JPRT >>> - reproducer on solaris_sparc. >>> >>> Thank you!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> > From nils.eliasson at oracle.com Thu Apr 14 12:43:06 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Apr 2016 14:43:06 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570EBB79.7060805@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> Message-ID: <570F905A.4050202@oracle.com> I moved the reasons to CompileTask.hpp and put them together with the names list. I also changed the type from int to CompileReason as Igor suggested. It gets verbose in the method declarations in compileBroker, and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason, so it makes sense too. New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ Thanks! Nils On 2016-04-13 23:34, Vladimir Kozlov wrote: > Very nice, I like it. > > One note. CompileReason (and its names) should be in the CompileTask class > where it is recorded. Then CompileTask::can_become_stale() can be in the > header file so it is inlined on all platforms. > > Thanks, > Vladimir > > On 4/13/16 5:59 AM, Nils Eliasson wrote: >> Hi, >> >> New webrev: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >> >> Summary >> Introduced an enum CompileReason with members matching all the old >> variants, and a table containing all the unchanged strings. I see the >> possibility of removing/changing/simplifying some CompileReasons but >> have chosen not to do so in this change. >> >> Only new logic is the CompileTask::can_become_stale() method.
>> >> Testing: >> Running Testset hotspot on all platforms and hotspot_all on one platform >> >> Regards, >> Nils Eliasson >> >> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>> Tasks get evicted from the compile_queue if their invocation counter >>>> hasn't increased during TieredCompileTaskTimeout. >>>> (AdvancedThresholdPolicy::is_stale(...)). >>>> >>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>> quick. I'll change the comment to an enum that represents who submitted >>>> the compile, and add a table for the comments. This could be useful in >>>> other settings too. >>> >>> Sounds good. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Regards, >>>> Nils >>>> >>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>> What do you mean "stale"? >>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this small fix of the BlockingCompilation test. >>>>>> >>>>>> Summary: >>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>> the compile queue as stale. >>>>>> >>>>>> Solution: >>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>> stale while the test is running. (Also added some extra >>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>> >>>>>> This is a workaround but we should consider fixing something >>>>>> permanent for WB API compiles - like tagging the compile >>>>>> task with info about the origin of the compile. The comment field >>>>>> has >>>>>> this information - but then it needs to be >>>>>> converted to an enum.
>>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>> >>>>>> Best regards, >>>>>> Nils Eliasson >>>>>> >>>>>> >>>>>> >>>>>> >>>> >> From nils.eliasson at oracle.com Thu Apr 14 13:17:47 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Apr 2016 15:17:47 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" Message-ID: <570F987B.2070202@oracle.com> Hi, Please review this fix. Summary: In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Previously there was a return there that just ignored the compile. Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available) and then some essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. Solution: We could discuss whether it should be allowed to submit compiles at level 0, a change that would be a bit larger. This time I chose to extend the _initialized check in compile_method. I didn't add any logging or warning because this is really a corner case. Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ (Ignore the extra tags in the webrev) Best regards, Nils Eliasson From zoltan.majo at oracle.com Thu Apr 14 13:21:00 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 15:21:00 +0200 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input Message-ID: <570F993C.3040509@oracle.com> Hi, please review the patch for 8153357.
https://bugs.openjdk.java.net/browse/JDK-8153357 Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its unique input. http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 Then (if the phi indeed has a unique input), the C2 compiler attempts to replace the phi with a cast node. The new cast node takes the unique input as its input. To be able to remove the phi node, the C2 compiler must determine which type of cast to add in place of the phi node (CastII, CastPP, or CheckCastPP). http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 The failure in the bug report appears because the C2 compiler adds a cast node of an unexpected type to the graph (a CheckCastPP instead of a CastPP when casting between two klass pointers). Please find more details about the cause of the failure in the bug description: https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 Solution: Refine C2's logic for determining the type of cast node to add. Webrev: http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ Testing: - JPRT; - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); - 500 non-failing runs with the reproducer (without the fix, the problem reproduces in fewer than 100 runs). Thank you and best regards, Zoltan From zoltan.majo at oracle.com Thu Apr 14 13:24:41 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 15:24:41 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8E2C.20605@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> <570F8E2C.20605@oracle.com> Message-ID: <570F9A19.4090500@oracle.com> Hi Tobias, On 04/14/2016 02:33 PM, Tobias Hartmann wrote: > [...] > Looks good! Thank you!
For the record: Testing with the reproducer was successful. Best regards, Zoltan > > Best regards, > Tobias > >> Tests are running. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> Best regards, >>> Tobias >>> >>>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>>> >>>> Testing: >>>> - JPRT >>>> - reproducer on solaris_sparc. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> From anton.ivanov at oracle.com Thu Apr 14 13:30:48 2016 From: anton.ivanov at oracle.com (Anton Ivanov) Date: Thu, 14 Apr 2016 16:30:48 +0300 Subject: RFR(XS): 8154174: improve JitTester performance Message-ID: <570F9B88.2040102@oracle.com> Hi, Please review a small patch that improves JitTester performance. In the current implementation, JitTester has exception-based logic, which is not good in itself, but changing this is quite expensive, and there is a simple way to decrease the exception overhead: turn off the stack trace in the ProductionFailedException constructor (this exception is created very often and its stack trace is never needed, as it is only used to control program flow). Also, a small improvement was made in the code that does a deep copy of a SymbolTable element (Map iteration was rewritten to get rid of multiple redundant Map.get() calls, which cost O(1) only in the average case and could potentially be worse). Testing: local webrev: http://cr.openjdk.java.net/~aaivanov/8154174/webrev bug: https://bugs.openjdk.java.net/browse/JDK-8154174 -- Best regards, Anton Ivanov From michael.c.berg at intel.com Thu Apr 14 15:23:31 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 14 Apr 2016 15:23:31 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: The code has been updated with the change from below: webrev:
http://cr.openjdk.java.net/~mcberg/8153998/webrev.02a/ Regards, Michael From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Wednesday, April 13, 2016 2:36 PM To: Christian Thalinger Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8153998 See below for context. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Wednesday, April 13, 2016 2:08 PM To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops. Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. This code was tested as follows (see the JBS entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. OK, that's easy enough.
Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } The reason: currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects. In the case of set_mask, we take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop. The subsequent restore puts the default value back into k1. The set_mask rule is posed in such a way that it ensures the side-effect value will survive optimization. The restore is purely a side effect with no definition produced in rule space, as mask registers are not formal definitions. Thanks, Michael From vladimir.x.ivanov at oracle.com Thu Apr 14 16:53:22 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 14 Apr 2016 19:53:22 +0300 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses Message-ID: <570FCB02.6000507@oracle.com> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8134918 Type speculation can produce mismatched unsafe accesses. It injects a guard based on profile data and then propagates type info down to the users. If there's an unsafe access, it can become mismatched w.r.t. the profile data being used. This happens even for valid usages: if an unsafe access always matches the memory location at runtime, the code produced by type speculation in that case is effectively dead. What causes problems are unsafe OOP accesses (U.putObject()/getObject() on non-OOP locations).
The fix is to avoid intrinsification of problematic accesses. Type speculation injects precise type information, which is available during intrinsification. We could try to support mismatched unsafe object accesses instead, but I don't see any value in that. Testing: JPRT, pit-hs-comp (in progress). Thanks! Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Apr 14 16:54:09 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 14 Apr 2016 19:54:09 +0300 Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when invoking nonexistent method In-Reply-To: <570EBFA4.4060005@oracle.com> References: <570E6D70.40904@oracle.com> <570EBFA4.4060005@oracle.com> Message-ID: <570FCB31.5080709@oracle.com> Thanks, Vladimir. Best regards, Vladimir Ivanov On 4/14/16 12:52 AM, Vladimir Kozlov wrote: > Looks good to me. The ciEnv.cpp changes look empty. If they are only spacing > changes, we don't need to include them in the bug fix. > > Thanks, > Vladimir K > > On 4/13/16 9:01 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8154172 >> >> C1 unconditionally inserts a null check before doing a call, even if the call >> throws an error during linkage. This contradicts the JVMS, which requires that >> linking errors precede run-time errors. >> >> The fix is to detect non-resolvable cases and avoid null checks / >> profiling altogether, letting the runtime throw a linkage error. >> >> Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck). >> >> Some clarifications: >> >> - klass->is_loaded() && !target->is_loaded() is true when method >> resolution fails; >> >> - static vs. non-static checks aren't needed because >> stream()->get_method already returns an unloaded method in such a case; >> >> Thanks!
>> >> Best regards, >> Vladimir Ivanov From christian.thalinger at oracle.com Mon Apr 11 17:59:42 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Apr 2016 07:59:42 -1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Message-ID: [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: > > Dear all: > > Can I please request reviews for the following change? > This change was created for JDK 9 and ppc64. > > Description: > This change adds options to compare-and-exchange for the POWER architecture. > As described in atomic_linux_ppc.inline.hpp, the current implementation of > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for > general purposes because the sync instructions issued before and after cmpxchg > preserve consistency. However, they sometimes cause overhead because > sync instructions are very expensive in the current POWER chip design. > With this change, callers can explicitly specify whether to run the fence and the acquire via > two additional bool parameters. Because their default values are "true", > it is not necessary to modify existing cmpxchg calls. > > In addition, with the new parameters of cmpxchg, this change improves > the performance of copy_to_survivor in the parallel GC. > copy_to_survivor updates forwarding pointers by using cmpxchg. This > operation doesn't require any sync instructions, in my understanding: > a pointer is changed at most once in a GC, and when cmpxchg fails, > the latest pointer is available to the caller. > > When I evaluated SPECjbb2013 (slightly customized, because the obsolete grizzly > doesn't support the new Java 9 version-string format), the pause time of young GC was > reduced by 10% to 20%.
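The at-most-once argument (a forwarding pointer is installed once per GC, and a thread whose cmpxchg fails can simply use the pointer it observed) can be illustrated with C++11 `std::atomic`. This is a sketch under stated assumptions, not HotSpot code. `HeaderSketch` and `forward_once` are hypothetical names standing in for the mark word and `cas_forward_to`. The sketch also keeps the default sequentially consistent ordering, because whether the fence and acquire can safely be dropped is exactly the question under review.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an object's header word; not HotSpot's markOop.
// Assumption: the value 0 means "not forwarded yet".
struct HeaderSketch {
  std::atomic<std::uintptr_t> forwardee{0};
};

// Installs `copy` as the forwarding address unless another thread got there
// first, and returns whichever forwarding address ended up installed.
std::uintptr_t forward_once(HeaderSketch& h, std::uintptr_t copy) {
  std::uintptr_t expected = 0;
  // On failure, compare_exchange_strong stores the value it actually found
  // into `expected`, so the losing thread learns the winner's copy without
  // re-reading the header. Default ordering is memory_order_seq_cst.
  if (h.forwardee.compare_exchange_strong(expected, copy)) {
    return copy;      // this thread's copy won
  }
  return expected;    // another thread's copy won; use it instead
}
```

In the GC, the losing thread would then discard its own copy and continue with the winner's. Passing a weaker `std::memory_order` to `compare_exchange_strong` is roughly analogous to calling cmpxchg without fence and acquire. The sketch does not settle whether that is safe here.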
> > Summary of source code changes: > > * src/share/vm/runtime/atomic.hpp > * src/share/vm/runtime/atomic.cpp > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > - Add two arguments, fence and acquire, to cmpxchg, for PPC64 only. > Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, > they are folded away when cmpxchg is inlined into callers. > > * src/share/vm/oops/oop.inline.hpp > - Changed cas_set_mark to call cmpxchg without fence and acquire. > cas_set_mark is called only by cas_forward_to, which is called only by > copy_to_survivor_space and oop_promotion_failed in > psPromotionManager. > > Code change: > > Please see the attached diff file, which was generated with "hg diff -g" > under the latest hotspot directory. > > Passed test: > SPECjbb2013 (customized) > > * I believe some other cmpxchg calls could be optimized by dropping the fence > or acquire, because the two sync instructions are more conservative than needed > to implement the Java memory model. > > > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64_cmpxchg_opt.diff Type: application/octet-stream Size: 8837 bytes Desc: not available URL:
From christian.thalinger at oracle.com Thu Apr 14 18:15:53 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 08:15:53 -1000 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: References: Message-ID: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com> > On Apr 13, 2016, at 8:46 PM, Roland Westrelin wrote: > > > When running scimark on aarch64: > > ;; B16: # B17 <- B21 top-of-loop Freq: 2305.21 > > 0x000003ffa126f710: add w17, w11, w12 ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 243 (line 129) > > 0x000003ffa126f714: nop > 0x000003ffa126f718: nop > 0x000003ffa126f71c: nop ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 238 (line 129) > > ;; B17: # B32 B18 <- B25 B16 Loop: B17-B16 inner Freq: 3056.06 > > 0x000003ffa126f720: lsl w16, w17, #1 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 244 (line 129) > > The 3 nops are added by the code that aligns loop entries: the top-of-loop > block is first encountered and its alignment is set; the loop head > is later encountered through the back branch of an outer loop, and its > alignment is set too. > > I propose that the code that aligns loop entries verifies that a loop > top doesn't exist before it sets the alignment: > > http://cr.openjdk.java.net/~roland/8154135/webrev.00/ I wonder if this has any performance implications (good or bad). This alignment is not AArch64-specific, so we were doing it all the time. > > Roland. From christian.thalinger at oracle.com Thu Apr 14 18:19:46 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 08:19:46 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: > On Apr 13, 2016, at 11:35 AM, Berg, Michael C wrote: > > See below for context.
> > Regards, > Michael > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Wednesday, April 13, 2016 2:08 PM > To: Berg, Michael C > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > > On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: > > Hi Folks, > > I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. > This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops. > Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. > > This code was tested as follows (see the JBS entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ > > +//------------------------------MachMskNode----------------------------------- > +// Machine function Msk Node > +class MachMskNode : public MachIdealNode { > Does "Msk" mean mask? Then we should call it MachMaskNode. > > OK, that's easy enough. > > Also, I don't quite understand why we have: > +instruct set_mask(rRegI dst, rRegI src) %{ > + predicate(VM_Version::supports_avx512vl()); > + match(Set dst (MaskCreateI src)); > + effect(TEMP dst); > + format %{ "createmsk $dst, $src" %} > + ins_encode %{ > + __ createmsk($dst$$Register, $src$$Register); > + %} > but: > + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { > + MacroAssembler _masm(&cbuf); > + __ restoremsk(); > + } > > The reason: currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects.
> In the case of set_mask, we take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop. > The subsequent restore puts the default value back into k1. The set_mask rule is posed in such a way that it ensures the side-effect value will survive optimization. > The restore is purely a side effect with no definition produced in rule space, as mask registers are not formal definitions. Hmm. So, there is no way we can have a RestoreMaskINode? > > Thanks, > Michael From michael.c.berg at intel.com Thu Apr 14 18:44:06 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 14 Apr 2016 18:44:06 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: Christian, There is, but I would have to anchor it via a GPR; instead, I am treating it like nop emits (yet another side effect), with some guidance for placement. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: See below for context. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Wednesday, April 13, 2016 2:08 PM To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask-programmed post loops, which execute in a single iteration in place of the scalar fix-up loops that used to take many iterations to complete the work of user loops.
Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. This code was tested as follows (see the JBS entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. OK, that's easy enough. Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } The reason: currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects. In the case of set_mask, we take our remaining iterations as an index and provide a mask in k1 that is applicable to its post loop. The subsequent restore puts the default value back into k1. The set_mask rule is posed in such a way that it ensures the side-effect value will survive optimization. The restore is purely a side effect with no definition produced in rule space, as mask registers are not formal definitions. Hmm. So, there is no way we can have a RestoreMaskINode? Thanks, Michael
From christian.thalinger at oracle.com Thu Apr 14 18:45:59 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 08:45:59 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570F905A.4050202@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: > On Apr 14, 2016, at 2:43 AM, Nils Eliasson wrote: > > I moved the reasons to CompileTask.hpp and put them together with the names list. Also changed the type from int to CompileReason, as Igor suggested. > > It gets verbose in the method declarations in compileBroker Don't worry about this. > and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason, so it makes sense too. Yes, that's the right place. > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ + bool can_become_stale() const { + return !_is_blocking && (_compile_reason < Reason_Whitebox); + } I'm not a fan of implicit contracts defined only by comments. This method doesn't seem to be performance-critical, so I would suggest using a switch-case. An attribute on the enum would be much better, but we all know this isn't Java. > > Thanks! > Nils > > On 2016-04-13 23:34, Vladimir Kozlov wrote: >> Very nice, I like it. >> >> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
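Christian's suggestion to replace the ordering comparison `_compile_reason < Reason_Whitebox` with an explicit switch could look roughly like the sketch below. It is a hedged illustration, not the code in webrev.03. The enumerator list is an assumed subset, and `CompileTaskSketch` is a hypothetical stand-in for `CompileTask`. Which reasons belong to the stale-able group is a policy choice, and the grouping here is assumed.

```cpp
// Illustrative subset of compile reasons (assumed for this sketch; the real
// list in compileTask.hpp may differ).
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_Whitebox,
  Reason_MustBeCompiled,
  Reason_Count
};

// Hypothetical stand-in for CompileTask, holding only the two fields that
// can_become_stale() consults.
class CompileTaskSketch {
 public:
  CompileTaskSketch(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {}

  // Every reason that may be evicted is listed explicitly, so the answer no
  // longer depends on where an enumerator happens to sit in the enum.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_InvocationCount:
      case Reason_BackedgeCount:
      case Reason_Tiered:
        return !_is_blocking;  // counter-driven requests may time out
      default:
        return false;          // e.g. WhiteBox and must-be-compiled requests
    }
  }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```

With a `default:` branch, a newly added reason is non-stale until someone lists it explicitly. This is the opposite failure mode of the ordering comparison, where a reason inserted before `Reason_Whitebox` silently becomes stale-able.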
>> >> Thanks, >> Vladimir >> >> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>> >>> Summary >>> Introduced an enum CompileReason with members matching all the old >>> variants, and a table containing all the unchanged strings. I see the >>> possibility of removing/changing/simplifying some CompileReasons but >>> have chosen not to do so in this change. >>> >>> The only new logic is the CompileTask::can_become_stale() method. >>> >>> Testing: >>> Running Testset hotspot on all platforms and hotspot_all on one platform >>> >>> Regards, >>> Nils Eliasson >>> >>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>> hasn't increased during TieredCompileTaskTimeout. >>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>> >>>>> I'll do a proper fix; it is the right thing to do and should be pretty >>>>> quick. I'll change the comment to an enum that represents who submitted >>>>> the compile, and add a table for the comments. This could be useful in >>>>> other settings too. >>>> >>>> Sounds good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>> What do you mean by "stale"? >>>>>> I would prefer to see the real fix, as you suggested, to avoid removing >>>>>> WB comp tasks from the queue. Adding a timeout is not reliable. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>> >>>>>>> Summary: >>>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>>> the compile queue as stale. >>>>>>> >>>>>>> Solution: >>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>> stale while the test is running.
(Also added some extra >>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>> >>>>>>> This is a workaround, but we should consider fixing something >>>>>>> permanent for WB API compiles, like tagging the compile >>>>>>> task with info about the origin of the compile. The comment field has >>>>>>> this information, but then it needs to be >>>>>>> converted to an enum. >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Nils Eliasson >>>>> >>> > From vladimir.kozlov at oracle.com Thu Apr 14 18:57:02 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 11:57:02 -0700 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8C8F.4080003@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> Message-ID: <570FE7FE.2000001@oracle.com> Good. Thanks, Vladimir On 4/14/16 5:26 AM, Zoltán Majó wrote: > Hi Tobias, > > > thank you for the feedback! > > On 04/14/2016 02:15 PM, Tobias Hartmann wrote: >> [...] >> I would simply replace the 'br' with 'brx', which tests either xcc or icc depending on the architecture. > > Yes, that simplifies the code a bit. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ > > Tests are running. > > Thank you! > > Best regards, > > > Zoltan > >> >> Best regards, >> Tobias >> >>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>> >>> Testing: >>> - JPRT >>> - reproducer on solaris_sparc. >>> >>> Thank you!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> > From vladimir.kozlov at oracle.com Thu Apr 14 19:02:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 12:02:12 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570F905A.4050202@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: <570FE934.5000800@oracle.com> Looks good. Thanks, Vladimir On 4/14/16 5:43 AM, Nils Eliasson wrote: > I moved the reasons to CompileTask.hpp and put them together with the names list. Also changed the type from int to CompileReason, as Igor suggested. > > It gets verbose in the method declarations in compileBroker, and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason, so it makes sense too. > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ > > Thanks! > Nils > > On 2016-04-13 23:34, Vladimir Kozlov wrote: >> Very nice, I like it. >> >> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms. >> >> Thanks, >> Vladimir >> >> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>> >>> Summary >>> Introduced an enum CompileReason with members matching all the old >>> variants, and a table containing all the unchanged strings. I see the >>> possibility of removing/changing/simplifying some CompileReasons but >>> have chosen not to do so in this change. >>> >>> The only new logic is the CompileTask::can_become_stale() method.
>>> >>> Testing: >>> Running Testset hotspot on all platforms and hotspot_all on one platform >>> >>> Regards, >>> Nils Eliasson >>> >>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>> hasn't increased during TieredCompileTaskTimeout. >>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>> >>>>> I'll do a proper fix; it is the right thing to do and should be pretty >>>>> quick. I'll change the comment to an enum that represents who submitted >>>>> the compile, and add a table for the comments. This could be useful in >>>>> other settings too. >>>> >>>> Sounds good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>> What do you mean by "stale"? >>>>>> I would prefer to see the real fix, as you suggested, to avoid removing >>>>>> WB comp tasks from the queue. Adding a timeout is not reliable. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>> >>>>>>> Summary: >>>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>>> the compile queue as stale. >>>>>>> >>>>>>> Solution: >>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>> stale while the test is running. (Also added some extra >>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>> >>>>>>> This is a workaround, but we should consider fixing something >>>>>>> permanent for WB API compiles, like tagging the compile >>>>>>> task with info about the origin of the compile. The comment field has >>>>>>> this information, but then it needs to be >>>>>>> converted to an enum.
>>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Nils Eliasson >>>>> >>> > From igor.veresov at oracle.com Thu Apr 14 22:15:20 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 14 Apr 2016 15:15:20 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570F905A.4050202@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: <790C0A6F-8E06-4891-A771-9112606C6812@oracle.com> Looks good. Thanks! igor > On Apr 14, 2016, at 5:43 AM, Nils Eliasson wrote: > > I moved the reasons to CompileTask.hpp and put them together with the names list. Also changed the type from int to CompileReason, as Igor suggested. > > It gets verbose in the method declarations in compileBroker, and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason, so it makes sense too. > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ > > Thanks! > Nils > > On 2016-04-13 23:34, Vladimir Kozlov wrote: >> Very nice, I like it. >> >> One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms. >> >> Thanks, >> Vladimir >> >> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>> >>> Summary >>> Introduced an enum CompileReason with members matching all the old >>> variants, and a table containing all the unchanged strings.
I see the >>> possibility of removing/changing/simplifying some CompileReasons but >>> have chosen not to do so in this change. >>> >>> The only new logic is the CompileTask::can_become_stale() method. >>> >>> Testing: >>> Running Testset hotspot on all platforms and hotspot_all on one platform >>> >>> Regards, >>> Nils Eliasson >>> >>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>> hasn't increased during TieredCompileTaskTimeout. >>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>> >>>>> I'll do a proper fix; it is the right thing to do and should be pretty >>>>> quick. I'll change the comment to an enum that represents who submitted >>>>> the compile, and add a table for the comments. This could be useful in >>>>> other settings too. >>>> >>>> Sounds good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>> What do you mean by "stale"? >>>>>> I would prefer to see the real fix, as you suggested, to avoid removing >>>>>> WB comp tasks from the queue. Adding a timeout is not reliable. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>> >>>>>>> Summary: >>>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>>> the compile queue as stale. >>>>>>> >>>>>>> Solution: >>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>> stale while the test is running. (Also added some extra >>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>> >>>>>>> This is a workaround, but we should consider fixing something >>>>>>> permanent for WB API compiles, like tagging the compile >>>>>>> task with info about the origin of the compile.
The comment field has >>>>>>> this information - but then it needs to be >>>>>>> converted to an enum. >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Nils Eliasson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>> > From christian.thalinger at oracle.com Thu Apr 14 22:35:03 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 12:35:03 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> > On Apr 14, 2016, at 8:44 AM, Berg, Michael C wrote: > > Christian, > > There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. That's unfortunate but I understand. I'm fine with it then. > > Regards, > Michael > > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Thursday, April 14, 2016 11:20 AM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > > On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: > > See below for context. > > Regards, > Michael > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Wednesday, April 13, 2016 2:08 PM > To: Berg, Michael C > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > > On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: > > Hi Folks, > > I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. > This component delivers mask-programmed post loops which execute in a single iteration in place of fix-up scalar loops which used to take many iterations to complete work for user loops.
> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. > > This code was tested as follows (see JBS entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ > > +//------------------------------MachMskNode----------------------------------- > +// Machine function Msk Node > +class MachMskNode : public MachIdealNode { > Does "Msk" mean mask? Then we should call it MachMaskNode. > > Ok, that's easy enough. > > Also, I don't quite understand why we have: > +instruct set_mask(rRegI dst, rRegI src) %{ > + predicate(VM_Version::supports_avx512vl()); > + match(Set dst (MaskCreateI src)); > + effect(TEMP dst); > + format %{ "createmsk $dst, $src" %} > + ins_encode %{ > + __ createmsk($dst$$Register, $src$$Register); > + %} > but: > + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { > + MacroAssembler _masm(&cbuf); > + __ restoremsk(); > + } > > The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. > The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. > The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. > The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. > > Hmm. So, there is no way we can have a RestoreMaskINode? > > Thanks, > Michael
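To make the mask-programmed post loop concrete, here is a scalar C++ emulation, assuming 16 int32 lanes per vector and a single implicit mask register standing in for k1. make_tail_mask plays the role the thread attributes to createmsk (remaining iterations in, lane mask out), and resetting the mask to all lanes afterwards mirrors restoremsk. None of these functions are HotSpot code; the sketch only illustrates why k1 must be put back once the post loop is done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar emulation only: assumes 16 int32 lanes (one 512-bit vector).
constexpr int kLanes = 16;
constexpr uint32_t kAllLanes = (1u << kLanes) - 1u;

// Stand-in for k1: every masked operation implicitly reads it, and the
// rest of the generated code expects it to hold kAllLanes.
static uint32_t g_k1 = kAllLanes;

// Analogue of createmsk: remaining iterations in, low-bit lane mask out.
uint32_t make_tail_mask(int remaining) {
  assert(remaining >= 0 && remaining <= kLanes);
  return (1u << remaining) - 1u;
}

// One "vector" add; lanes whose bit in g_k1 is clear are not touched,
// so no element past the end of the arrays is read or written.
void masked_add(int32_t* dst, const int32_t* a, const int32_t* b) {
  for (int lane = 0; lane < kLanes; ++lane) {
    if (g_k1 & (1u << lane)) dst[lane] = a[lane] + b[lane];
  }
}

void add_arrays(int32_t* dst, const int32_t* a, const int32_t* b, std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    masked_add(dst + i, a + i, b + i);  // main loop: full mask
  }
  if (i < n) {
    // Post loop: one masked iteration instead of a scalar fix-up loop.
    g_k1 = make_tail_mask(static_cast<int>(n - i));
    masked_add(dst + i, a + i, b + i);
    g_k1 = kAllLanes;  // analogue of restoremsk
  }
}
```

For n = 37 the main loop runs twice and the post loop handles the last 5 elements in one masked step; forgetting the final reset would leave later "full-width" operations silently processing only 5 lanes, which is the side effect being discussed.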
From vladimir.kozlov at oracle.com Thu Apr 14 23:12:56 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 16:12:56 -0700 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: References: Message-ID: <571023F8.5090903@oracle.com> I agree with the optimization but I am not sure about the changes. Can we check only one previous block to be more conservative? Block* b = prev(targ_block); bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() && !b->head()->is_Loop(); Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? Maybe for RISC CPUs (with fixed instruction size) we should change them. Thanks, Vladimir On 4/13/16 11:46 PM, Roland Westrelin wrote: > > When running scimark on aarch64: > > ;; B16: # B17 <- B21 top-of-loop Freq: 2305.21 > > 0x000003ffa126f710: add w17, w11, w12 ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 243 (line 129) > > 0x000003ffa126f714: nop > 0x000003ffa126f718: nop > 0x000003ffa126f71c: nop ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 238 (line 129) > > ;; B17: # B32 B18 <- B25 B16 Loop: B17-B16 inner Freq: 3056.06 > > 0x000003ffa126f720: lsl w16, w17, #1 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 244 (line 129) > > The 3 nops are added by the code that aligns loop entries: the top-of-loop > block is first encountered and its alignment is set; the loop head > is later encountered through the backbranch of an outer loop and its > alignment is set as well. > > I propose that the code that aligns loop entries verifies that a loop > top doesn't exist before it sets the alignment: > > http://cr.openjdk.java.net/~roland/8154135/webrev.00/ > > Roland.
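Vladimir's more conservative check can be read as the following C++ sketch over a simplified block layout. Block, its two fields and has_aligned_top are hypothetical stand-ins for C2's Block, head()->is_Loop() and has_loop_alignment(); the assumption is that a caller would skip aligning the target block whenever the function returns true.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a block in final layout order.
struct Block {
  bool head_is_loop;        // stands in for head()->is_Loop()
  bool has_loop_alignment;  // stands in for has_loop_alignment()
};

// Returns true when the block laid out just before `targ` already carries
// loop alignment but is not itself a loop head, i.e. it is a top-of-loop
// block. Aligning `targ` as well would put padding inside the loop body.
bool has_aligned_top(const std::vector<Block>& layout, std::size_t targ) {
  if (targ == 0 || !layout[targ].head_is_loop) return false;
  const Block& prev = layout[targ - 1];
  return prev.has_loop_alignment && !prev.head_is_loop;
}
```

In Roland's listing, B16 (top-of-loop, already aligned) is laid out directly before the loop head B17, so under this check B17 would not receive its own alignment padding. Looking only at the immediately preceding block keeps the change narrow, which is the conservatism Vladimir asks for.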
> From vladimir.kozlov at oracle.com Thu Apr 14 23:26:59 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 16:26:59 -0700 Subject: CR for RFR 8153998 In-Reply-To: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> Message-ID: <57102743.8080508@oracle.com> On 4/14/16 3:35 PM, Christian Thalinger wrote: > >> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >> >> Christian, >> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. > > That's unfortunate but I understand. I'm fine with it then. You can try to generate the restore mask on loop exit in the mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration - there should not be any other branches there. Vladimir > >> Regards, >> Michael >> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >> Sent: Thursday, April 14, 2016 11:20 AM >> To: Berg, Michael C > >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >> See below for context. >> Regards, >> Michael >> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >> Sent: Wednesday, April 13, 2016 2:08 PM >> To: Berg, Michael C > >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >> Hi Folks, >> >> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>> This component delivers mask-programmed post loops which execute in a single iteration in place of fix-up scalar loops which used to take many iterations to complete work for user loops. >> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. >> This code was tested as follows (see JBS entry below): >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >> >> webrev: >> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >> >> +//------------------------------MachMskNode----------------------------------- >> >> +// Machine function Msk Node >> >> +class MachMskNode : public MachIdealNode { >> >> Does "Msk" mean mask? Then we should call it MachMaskNode. >> Ok, that's easy enough. >> Also, I don't quite understand why we have: >> >> +instruct set_mask(rRegI dst, rRegI src) %{ >> >> + predicate(VM_Version::supports_avx512vl()); >> >> + match(Set dst (MaskCreateI src)); >> >> + effect(TEMP dst); >> >> + format %{ "createmsk $dst, $src" %} >> >> + ins_encode %{ >> >> + __ createmsk($dst$$Register, $src$$Register); >> >> + %} >> >> but: >> >> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >> >> + MacroAssembler _masm(&cbuf); >> >> + __ restoremsk(); >> >> + } >> >> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >> >> Hmm. So, there is no way we can have a RestoreMaskINode? >> >> Thanks, >> Michael > From michael.c.berg at intel.com Thu Apr 14 23:38:48 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 14 Apr 2016 23:38:48 +0000 Subject: CR for RFR 8153998 In-Reply-To: <57102743.8080508@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> Message-ID: Vladimir, Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. I tried something like that early on with CountedLoopEnd. The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to a node when we don't need one. What would you like to do then, process via a flag or how I do it now? We would basically be doing it in the same place. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 4:27 PM To: Christian Thalinger ; Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 3:35 PM, Christian Thalinger wrote: > >> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >> >> Christian, >> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. > > That's unfortunate but I understand. I'm fine with it then. You can try to generate the restore mask on loop exit in the mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()).
The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration - there should not be any other branches there. Vladimir > >> Regards, >> Michael >> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >> > >> Cc: hotspot-compiler-dev at openjdk.java.net >> >> Subject: Re: CR for RFR 8153998 >> >> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >> See below for context. >> Regards, >> Michael >> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >> Sent: Wednesday, April 13, 2016 2:08 PM >> To: Berg, Michael C > >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >> Hi Folks, >> >> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >> This component delivers mask-programmed post loops which execute in a single iteration in place of fix-up scalar loops which used to take many iterations to complete work for user loops. >> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. >> This code was tested as follows (see JBS entry below): >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >> >> webrev: >> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >> >> >> +//------------------------------MachMskNode----------------------------------- >> >> +// Machine function Msk Node >> >> +class MachMskNode : public MachIdealNode { >> >> Does "Msk" mean mask?
Then we should call it MachMaskNode. >> Ok, that's easy enough. >> Also, I don't quite understand why we have: >> >> +instruct set_mask(rRegI dst, rRegI src) %{ >> >> + predicate(VM_Version::supports_avx512vl()); >> >> + match(Set dst (MaskCreateI src)); >> >> + effect(TEMP dst); >> >> + format %{ "createmsk $dst, $src" %} >> >> + ins_encode %{ >> >> + __ createmsk($dst$$Register, $src$$Register); >> >> + %} >> >> but: >> >> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >> >> + MacroAssembler _masm(&cbuf); >> >> + __ restoremsk(); >> >> + } >> >> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >> >> Hmm. So, there is no way we can have a RestoreMaskINode? >> >> Thanks, >> Michael > From vladimir.kozlov at oracle.com Thu Apr 14 23:41:39 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 16:41:39 -0700 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <570F987B.2070202@oracle.com> References: <570F987B.2070202@oracle.com> Message-ID: <57102AB3.30709@oracle.com> I agree with this simple change as the fix. Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as an indication that Xcomp was used. I don't see a PIT link in the bug report. Thanks, Vladimir On 4/14/16 6:17 AM, Nils Eliasson wrote: > Hi, > > Please review this fix.
> > Summary: > In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before that, there was a return there that just ignored the compile. > > Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available), and then some essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. > > Solution: > We could discuss whether it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I chose to extend the _initialized check in compile_method. I didn't add any logging or warning because this is really a corner case. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 > Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ > (Ignore the extra tags in the webrev) > > Best regards, > Nils Eliasson From vladimir.kozlov at oracle.com Fri Apr 15 00:02:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 17:02:05 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> Message-ID: <57102F7D.2090303@oracle.com> On 4/14/16 4:38 PM, Berg, Michael C wrote: > Vladimir, > > Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. > > I tried something like that early on with CountedLoopEnd. In the mach version with the restore mark you should specify the correct size(X), or not specify it at all so that it is calculated dynamically (done automatically). I don't see any side effects for restoremask in your code. What are you talking about?
I am suggesting something like the following:

instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
  predicate(n->has_vect_mask_set());
  match(CountedLoopEnd cop cr);
  effect(USE labl);

  ins_cost(400);
  format %{ "j$cop $labl\t# loop end\n\t"
            "restoremask \t# vector mask restore for loops"
  %}
  ins_encode %{
    Label* L = $labl$$label;
    __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
    __ restoremask();
  %}
  ins_pipe(pipe_jcc);
%}

Vladimir > The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to a node when we don't need one. What would you like to do then, process via a flag or how I do it now? We would basically be doing it in the same place. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 4:27 PM > To: Christian Thalinger ; Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 3:35 PM, Christian Thalinger wrote: >> >>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>> >>> Christian, >>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >> >> That's unfortunate but I understand. I'm fine with it then. > > You can try to generate the restore mask on loop exit in the mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()).
> > Vladimir > >> >>> Regards, >>> Michael >>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >>> > >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> >>> Subject: Re: CR for RFR 8153998 >>> >>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>> See below for context. >>> Regards, >>> Michael >>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> Sent: Wednesday, April 13, 2016 2:08 PM >>> To: Berg, Michael C > >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>> Hi Folks, >>> >>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>> This component delivers mask-programmed post loops which execute in a single iteration in place of fix-up scalar loops which used to take many iterations to complete work for user loops. >>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. >>> This code was tested as follows (see JBS entry below): >>> >>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>> >>> webrev: >>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>> >>> >>> +//------------------------------MachMskNode----------------------------------- >>> >>> +// Machine function Msk Node >>> >>> +class MachMskNode : public MachIdealNode { >>> >>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>> Ok, that's easy enough.
>>> Also, I don't quite understand why we have: >>> >>> +instruct set_mask(rRegI dst, rRegI src) %{ >>> >>> + predicate(VM_Version::supports_avx512vl()); >>> >>> + match(Set dst (MaskCreateI src)); >>> >>> + effect(TEMP dst); >>> >>> + format %{ "createmsk $dst, $src" %} >>> >>> + ins_encode %{ >>> >>> + __ createmsk($dst$$Register, $src$$Register); >>> >>> + %} >>> >>> but: >>> >>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>> >>> + MacroAssembler _masm(&cbuf); >>> >>> + __ restoremsk(); >>> >>> + } >>> >>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>> >>> Hmm. So, there is no way we can have a RestoreMaskINode? >>> >>> Thanks, >>> Michael >> From michael.c.berg at intel.com Fri Apr 15 00:12:30 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 15 Apr 2016 00:12:30 +0000 Subject: CR for RFR 8153998 In-Reply-To: <57102F7D.2090303@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> Message-ID: The restore mark is sizeless; it restores k1 to the known global configuration value, which is used automatically in all of code gen. Ok, I will try the pattern match method.
Thanks -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:02 PM To: Berg, Michael C ; Christian Thalinger Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 4:38 PM, Berg, Michael C wrote: > Vladimir, > > Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. > > I tried something like that early on with CountedLoopEnd. In the mach version with the restore mark you should specify the correct size(X), or not specify it at all so that it is calculated dynamically (done automatically). I don't see any side effects for restoremask in your code. What are you talking about? I am suggesting something like the following:

instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
  predicate(n->has_vect_mask_set());
  match(CountedLoopEnd cop cr);
  effect(USE labl);

  ins_cost(400);
  format %{ "j$cop $labl\t# loop end\n\t"
            "restoremask \t# vector mask restore for loops"
  %}
  ins_encode %{
    Label* L = $labl$$label;
    __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
    __ restoremask();
  %}
  ins_pipe(pipe_jcc);
%}

Vladimir > The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to a node when we don't need one. What would you like to do then, process via a flag or how I do it now? We would basically be doing it in the same place.
> > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 4:27 PM > To: Christian Thalinger ; Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 3:35 PM, Christian Thalinger wrote: >> >>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>> >>> Christian, >>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >> >> That's unfortunate but I understand. I'm fine with it then. > > You can try to generate the restore mask on loop exit in the mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()). The has_vect_mask_set flag should be set in the CountedLoopEnd node after creation of the CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration - there should not be any other branches there. > > Vladimir > >> >>> Regards, >>> Michael >>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >>> > >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> >>> Subject: Re: CR for RFR 8153998 >>> >>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>> See below for context. >>> Regards, >>> Michael >>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> Sent: Wednesday, April 13, 2016 2:08 PM >>> To: Berg, Michael C > >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>> Hi Folks, >>> >>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>> This component delivers mask-programmed post loops which execute in a single iteration in place of fix-up scalar loops which used to take many iterations to complete work for user loops. >>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. >>> This code was tested as follows (see JBS entry below): >>> >>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>> >>> webrev: >>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>> >>> >>> +//------------------------------MachMskNode----------------------------------- >>> >>> +// Machine function Msk Node >>> >>> +class MachMskNode : public MachIdealNode { >>> >>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>> Ok, that's easy enough. >>> Also, I don't quite understand why we have: >>> >>> +instruct set_mask(rRegI dst, rRegI src) %{ >>> >>> + predicate(VM_Version::supports_avx512vl()); >>> >>> + match(Set dst (MaskCreateI src)); >>> >>> + effect(TEMP dst); >>> >>> + format %{ "createmsk $dst, $src" %} >>> >>> + ins_encode %{ >>> >>> + __ createmsk($dst$$Register, $src$$Register); >>> >>> + %} >>> >>> but: >>> >>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>> >>> + MacroAssembler _masm(&cbuf); >>> >>> + __ restoremsk(); >>> >>> + } >>> >>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>> >>> Hmm. So, there is no way we can have a RestoreMaskINode? >>> >>> Thanks, >>> Michael >> From vladimir.kozlov at oracle.com Fri Apr 15 00:46:52 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 17:46:52 -0700 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <570F993C.3040509@oracle.com> References: <570F993C.3040509@oracle.com> Message-ID: <571039FC.1000603@oracle.com> I think the check should use !isa_oopptr(), since one of the nodes could be a ConP NULL ptr, which is not a klassptr. Thanks, Vladimir On 4/14/16 6:21 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8153357. > > https://bugs.openjdk.java.net/browse/JDK-8153357 > > Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its unique input. > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 > > Then (if the phi indeed has a unique input), the C2 compiler attempts to replace the phi with a cast node. The new cast node feeds from the unique input. > > To be able to remove the phi node, the C2 compiler must determine the type of cast to add in place of the phi node (CastII, CastPP, or CheckCastPP). > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 > > The failure in the bug report appears because the C2 compiler adds a cast node of an unexpected type to the graph (a CheckCastPP instead of a CastPP when casting between two klass pointers).
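The cast-selection question in 8153357, together with Vladimir's suggestion to test !isa_oopptr() rather than requiring klass pointers on both sides, can be sketched as below. TypeKind is a deliberately tiny stand-in for C2's type lattice (a NULL constant is modeled as a pointer that is neither a klass pointer nor an oop pointer), and the real code may emit a CastPP and a CheckCastPP together in some cases; treat this as an illustration of the decision, not the webrev's logic.

```cpp
#include <cassert>

// Deliberately tiny stand-in for C2's type lattice (illustration only).
enum class TypeKind { Int, KlassPtr, OopPtr, NullPtr };
enum class CastKind { CastII, CastPP, CheckCastPP };

// A NULL constant (ConP NULL) is a pointer but neither a klass pointer
// nor an oop pointer in this model.
bool is_oopptr(TypeKind t) { return t == TypeKind::OopPtr; }

// Pick the cast that replaces a phi with a unique input.
CastKind choose_cast(TypeKind phi_type, TypeKind input_type) {
  if (phi_type == TypeKind::Int) return CastKind::CastII;
  // Testing "neither side is an oop pointer" also covers a NULL input,
  // which a "both sides are klass pointers" test would miss.
  if (!is_oopptr(phi_type) && !is_oopptr(input_type)) return CastKind::CastPP;
  // In this simplified model, oop pointers get a CheckCastPP.
  return CastKind::CheckCastPP;
}
```

With this rule, a phi of klass-pointer type whose unique input is a NULL constant still gets a CastPP, which is the case Vladimir points out.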
> > Please find more details about the cause of the failure in the bug description: > https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 > > > Solution: Refine C2's logic for determining the type of cast node to add. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ > > Testing: > - JPRT; > - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); > - 500 non-failing runs with the reproducer (the problem reproduces in fewer than 100 runs). > > Thank you and best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Fri Apr 15 00:51:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 17:51:44 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> Message-ID: <57103B20.1040207@oracle.com> On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless; it restores k1 to the known global configuration value, which is used automatically in all of code gen. How is it sizeless when it generates a kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method. > > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization.
>> >> I tried something like that early on with CountedLoopEnd. > > In the mach version with the restore mark you should specify the correct size(X), or not specify it at all so that it is calculated dynamically (done automatically). > I don't see any side effects for restoremask in your code. What are you talking about? > > I am suggesting something like the following:
>
> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>   predicate(n->has_vect_mask_set());
>   match(CountedLoopEnd cop cr);
>   effect(USE labl);
>
>   ins_cost(400);
>   format %{ "j$cop $labl\t# loop end\n\t"
>             "restoremask \t# vector mask restore for loops"
>   %}
>   ins_encode %{
>     Label* L = $labl$$label;
>     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>     __ restoremask();
>   %}
>   ins_pipe(pipe_jcc);
> %}
>
> Vladimir > >> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to a node when we don't need one. What would you like to do then, process via a flag or how I do it now? We would basically be doing it in the same place. >> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 4:27 PM >> To: Christian Thalinger ; Berg, Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>> >>>> Christian, >>>> There is, but I would have to anchor it via a GPR; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>> >>> That's unfortunate but I understand. I'm fine with it then. >> >> You can try to generate the restore mask on loop exit in the mach instruction for the CountedLoopEnd node with predicate(n->has_vect_mask_set()).
has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there. >> >> Vladimir >> >>> >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>> > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>> See below for context. >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>> *To:*Berg, Michael C > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>> Hi Folks, >>>> >>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>> performance and has been modeled over a large number of loop lengths and forms of loops.
>>>> This code was tested as follows (see jbs entry below): >>>> >>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>> >>>> >>>> +//------------------------------MachMskNode----------------------------------- >>>> >>>> +// Machine function Msk Node >>>> >>>> +class MachMskNode : public MachIdealNode { >>>> >>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>> Ok, that's easy enough. >>>> Also, I don't quite understand why we have: >>>> >>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>> >>>> + predicate(VM_Version::supports_avx512vl()); >>>> >>>> + match(Set dst (MaskCreateI src)); >>>> >>>> + effect(TEMP dst); >>>> >>>> + format %{ "createmsk $dst, $src" %} >>>> >>>> + ins_encode %{ >>>> >>>> + __ createmsk($dst$$Register, $src$$Register); >>>> >>>> + %} >>>> >>>> but: >>>> >>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>> >>>> + MacroAssembler _masm(&cbuf); >>>> >>>> + __ restoremsk(); >>>> >>>> + } >>>> >>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode?
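A minimal scalar sketch of the mask arithmetic Michael describes for set_mask: the remaining trip count is used as an index to build the k1 mask, so the post loop completes in one masked vector iteration, after which k1 is restored to its default. Assumptions, not taken from the webrev: 16 lanes of 32-bit ints (one 512-bit vector), a `uint16_t` standing in for k1, an all-lanes default value, and the hypothetical helpers `tail_mask` and `masked_add_post_loop`. This is not the `createmsk`/`restoremsk` macro-assembler code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 16 lanes of 32-bit ints per vector, so a 16-bit value models k1.
constexpr int kLanes = 16;
// Assumption: the default k1 value restored after the post loop enables all lanes.
constexpr uint16_t kAllLanes = 0xFFFF;

// Hypothetical helper: enable lanes [0, remaining). Valid for 0 <= remaining <= kLanes,
// i.e. a post loop that needs at most one vector iteration.
inline uint16_t tail_mask(int remaining) {
  assert(remaining >= 0 && remaining <= kLanes);
  return static_cast<uint16_t>((1u << remaining) - 1u);
}

// Hypothetical scalar emulation of one masked vector iteration computing
// a[i] += b[i] for i in [start, limit), with limit - start <= kLanes.
// Inactive lanes are neither read nor written, which is what lets a single
// masked iteration replace the scalar fix-up loop.
inline void masked_add_post_loop(int* a, const int* b, std::size_t start,
                                 std::size_t limit, uint16_t& k1) {
  k1 = tail_mask(static_cast<int>(limit - start));  // models set_mask
  for (int lane = 0; lane < kLanes; ++lane) {
    if (k1 & (1u << lane)) {
      a[start + lane] += b[start + lane];
    }
  }
  k1 = kAllLanes;  // models the restore of the default k1 value
}
```

For example, with 19 elements and a main loop that covered the first 16, the post loop runs once with mask 0b111 and touches only elements 16 to 18.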
>>>> >>>> Thanks, >>>> Michael >>> From michael.c.berg at intel.com Fri Apr 15 00:54:01 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 15 Apr 2016 00:54:01 +0000 Subject: CR for RFR 8153998 In-Reply-To: <57103B20.1040207@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit ad files for CountedLoopEnd. It will be clean when next you see the code. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:52 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. How it is sizeless when it generates kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method. > > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >> >> I tried something like that early on with CountedLoopEnd.
>>>> >>>> Thanks, >>>> Michael >>> From vladimir.kozlov at oracle.com Fri Apr 15 01:12:18 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 18:12:18 -0700 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses In-Reply-To: <570FCB02.6000507@oracle.com> References: <570FCB02.6000507@oracle.com> Message-ID: <57103FF2.5060907@oracle.com> Next assert should be at the beginning of method: + assert(type != T_OBJECT || !unaligned, "unaligned access not supported with object type"); Fix Copyright year in the test. There is no PIT link in the bug report. Thanks, Vladimir On 4/14/16 9:53 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8134918 > > Type speculation can produce mismatched unsafe accesses. > > It injects a guard based on profile data and then propagate type info down to the users. If there's an unsafe access, it can become mismatched w.r.t. profile data being used. > > It happens even for valid usages. If an unsafe access always matches memory location at runtime, the code produced by type speculation in that case is effectively dead. > > What cause problems are unsafe OOP accesses (U.putObject()/getObject() on non-OOP locations). > > The fix is to avoid intrinsification of problematic accesses. Type speculation injects precise type information, which is available during intrinsification. > > We could try to support mismatched unsafe object accesses instead, but I don't see any value in that. > > Testing: JPRT, pit-hs-comp (in progress). > > Thanks! 
> > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Apr 15 01:44:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 18:44:44 -0700 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> Message-ID: <5710478C.8050200@oracle.com> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results. Thanks, Vladimir On 4/12/16 2:45 AM, Doerr, Martin wrote: > Hi, > > I think we have come to a common understanding and there was no complaint about my latest webrev: > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ > > Can I consider it reviewed? > Can somebody sponsor, please? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Thursday, April 7, 2016 12:52 > To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe > > Hi Andrew, Jamsheed and all, > > thank you very much for your input. > > As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). > Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct). > > My change still contains a releasing store for newly created ExceptionCache instances.
> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms. > I think having the release doesn't hurt too much and makes the design a little cleaner. > > I also added comments based on your input. > > The new webrev is here: > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ > > Please review. I will also need a sponsor from Oracle, please. > > Thanks again and best regards, > Martin > > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, April 7, 2016 12:14 > To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe > > On 07/04/16 10:08, Doerr, Martin wrote: > >> atomic update for the _count would only be required if there were >> multiple threads which attempt to increment it >> concurrently. However, updates are under lock, so we only have >> concurrent readers which is ok. >> >> I still think "volatile" does what we need here. Especially the xlC >> compiler on AIX tends to reload variables from memory. Exactly this >> can be prevented by making the field volatile. > > I think your latest patch is OK. Whether volatile is really good > enough, I don't know. The new(ish) C++ memory model treats this as a > race, and therefore undefined behaviour. Old C++ didn't have a memory > model, so the best we can do with racy code is guess about what our > compilers might do. > > I certainly much prefer a release_store to the storestore fence used > in the fix for 8143897. > > Andrew.
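The publication protocol discussed above (writers update the entry list under a lock and publish the count with a releasing store; readers scan without the lock) can be sketched with standard C++11 atomics, which also sidesteps Andrew's point that a plain volatile field is formally a data race. The class below is a hypothetical, simplified stand-in for nmethod's ExceptionCache (fixed capacity, append-only, `unsigned long` pcs and handlers) and uses `std::atomic` rather than HotSpot's OrderAccess primitives. A reader that observes a stale count only misses a recently added entry, which is the harmless false negative Martin mentions.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified exception cache: an append-only array of
// (pc, handler) pairs plus a published count.
class ExceptionCacheSketch {
 public:
  static constexpr int kCapacity = 16;

  // Writer side: writers are serialized by the mutex.
  bool add(unsigned long pc, unsigned long handler) {
    std::lock_guard<std::mutex> guard(lock_);
    int n = count_.load(std::memory_order_relaxed);  // only writers modify count_
    if (n == kCapacity) return false;
    pcs_[n] = pc;
    handlers_[n] = handler;
    // Releasing store: the entry written above is visible before the new count.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: no lock. The acquiring load pairs with the release above, so
  // every index below n refers to a fully written entry that is never rewritten.
  unsigned long lookup(unsigned long pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; ++i) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;  // 0 means "not cached" in this sketch
  }

 private:
  std::mutex lock_;
  std::atomic<int> count_{0};
  unsigned long pcs_[kCapacity] = {};
  unsigned long handlers_[kCapacity] = {};
};
```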
> From zoltan.majo at oracle.com Fri Apr 15 06:46:20 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 15 Apr 2016 08:46:20 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570FE7FE.2000001@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> <570FE7FE.2000001@oracle.com> Message-ID: <57108E3C.4030209@oracle.com> Thank you, Vladimir and Tobias, for the review! Best regards, Zoltan On 04/14/2016 08:57 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 4/14/16 5:26 AM, Zoltán Majó wrote: >> Hi Tobias, >> >> >> thank you for the feedback! >> >> On 04/14/2016 02:15 PM, Tobias Hartmann wrote: >>> [...] >>> I would simply replace the 'br' by 'brx' which tests either xcc or >>> icc depending on the architecture. >> >> Yes, that simplifies the code a bit. Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ >> >> Tests are running. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Best regards, >>> Tobias >>> >>>> Solution: As the VM is handling addresses at the above-mentioned >>>> locations, the appropriate condition codes are supposed to be >>>> checked. Use 'BPcc' instead of 'Bicc' at these locations. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>>> >>>> Testing: >>>> - JPRT >>>> - reproducer on solaris_sparc. >>>> >>>> Thank you!
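Why the choice of condition codes matters for 8151708 can be shown with a small model: on 64-bit SPARC a compare sets both the 32-bit (icc) and the 64-bit (xcc) condition codes, and an unsigned branch on icc effectively compares only the low 32 bits of the operands. The C++ below uses hypothetical helper names and is not the actual C1 stub code. It shows a bump-pointer check whose new top crosses a 4 GB boundary while the region end sits just below it: the 64-bit comparison correctly rejects the allocation, while the 32-bit one accepts it, which is the kind of error that can place a TLAB past the end of the heap.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned "a <= b" on the full 64-bit values: what a branch on xcc observes.
inline bool below_or_equal_64(uint64_t a, uint64_t b) { return a <= b; }

// Unsigned "a <= b" on the low 32 bits only: what an unsigned branch on icc
// observes after the same compare.
inline bool below_or_equal_32(uint64_t a, uint64_t b) {
  return static_cast<uint32_t>(a) <= static_cast<uint32_t>(b);
}

// Hypothetical bump-pointer check: can `size` bytes be carved out at `top`
// without passing `end`? `use_xcc` selects which condition codes the branch reads.
inline bool fits(uint64_t top, uint64_t size, uint64_t end, bool use_xcc) {
  assert(top <= end);  // precondition: the current top is inside the region
  const uint64_t new_top = top + size;
  return use_xcc ? below_or_equal_64(new_top, end)
                 : below_or_equal_32(new_top, end);
}
```

With top = 0xFFFFFF00, size = 0x200 and end = 0xFFFFFFF0, the new top is 0x100000100: the 64-bit check says it does not fit, but its low 32 bits (0x100) compare below 0xFFFFFFF0.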
>>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >> From jamsheed.c.m at oracle.com Fri Apr 15 07:44:34 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 15 Apr 2016 13:14:34 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <5710478C.8050200@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> <5710478C.8050200@oracle.com> Message-ID: <57109BE2.1090602@oracle.com> Hi Vladimir, PIT testing is in progress, link is available in bug report. Best Regards, Jamsheed On 4/15/2016 7:14 AM, Vladimir Kozlov wrote: > Looks fine to me. Jamsheed, please, run our PIT testing with these > changes and analyze results. > > Thanks, > Vladimir > > On 4/12/16 2:45 AM, Doerr, Martin wrote: >> Hi, >> >> I think we have come to a common understanding and there was no >> complaint about my latest webrev: >> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >> >> Can I consider it reviewed? >> Can somebody sponsor, please? >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-compiler-dev >> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >> Doerr, Martin >> Sent: Thursday, April 7, 2016 12:52 >> To: Andrew Haley ; Jamsheed C m >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR(S): 8153267: nmethod's exception cache not >> multi-thread safe >> >> Hi Andrew, Jamsheed and all, >> >> thank you very much for your input. >> >> As Andrew, Jamsheed and I think, it's better to have a releasing >> store in increment_count().
>> From michael.c.berg at intel.com Fri Apr 15 09:04:25 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 15 Apr 2016 09:04:25 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Vladimir, the code has been updated and is available at: webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ Thanks, Michael -----Original Message----- From: Berg, Michael C Sent: Thursday, April 14, 2016 5:54 PM To: Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8153998 Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit add files for CountedLoopEnd. It will be clean when next you see the code. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:52 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. How it is sizeless when it generates kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method. 
The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>> >>>> Thanks, >>>> Michael >>> From nils.eliasson at oracle.com Fri Apr 15 09:39:41 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 11:39:41 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <57102AB3.30709@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> Message-ID: <5710B6DD.9090009@oracle.com> Thanks Vladimir! On 2016-04-15 01:41, Vladimir Kozlov wrote: > I agree with this simple change as the fix. > Note, -Xcomp does not switch off Interpreter (we can run without > Interpreter). We use !UseInterpreter as indication if Xcomp was used. > I don't see a PIT link in the bug report. There was none, Tobias found this regression testing something else. Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ Regards, Nils > > Thanks, > Vladimir > > On 4/14/16 6:17 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this fix. >> >> Summary: >> In JDK-8150646 I added an assert in compile_method that the compiler >> must not be NULL. Before there was a return there that just ignored >> the compile. >> >> Running the VM with the flag combination -Xcomp and >> -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter >> is set to false (but the interpreter it is still available) and then >> some >> essential methods are forced to be compiled, but the initial >> complevel becomes 0 and hits the assert in compileBroker. 
>> >> Solution: >> We could discuss if it should be allowed to submit compiles on level >> 0, a change that would become a bit larger. This time I choose to >> extend the _initalized check in compile_method. I didn't add any >> logging or warning because this is really a corner case. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >> (Ignore the extra tags in the webrev) >> >> Best regards, >> Nils Eliasson From rwestrel at redhat.com Fri Apr 15 11:14:41 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 13:14:41 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com> References: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com> Message-ID: <5710CD21.4060902@redhat.com> Hi Christian, Thanks for looking at this. > I wonder if this has any performance implications (good or bad). > This alignment is not aarch64 specific so we were doing it all the > time. Unless I'm missing something nops in the body of a loop can't really help performance. This looks like a bug to me. Roland. From tobias.hartmann at oracle.com Fri Apr 15 11:15:42 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Apr 2016 13:15:42 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710B6DD.9090009@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> Message-ID: <5710CD5E.5070103@oracle.com> Hi Nils, On 15.04.2016 11:39, Nils Eliasson wrote: > Thanks Vladimir! > On 2016-04-15 01:41, Vladimir Kozlov wrote: >> I agree with this simple change as the fix. >> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. >> I don't see a PIT link in the bug report. 
> > There was none, Tobias found this regression testing something else. > > Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java > > Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. Otherwise looks good to me. Best regards, Tobias > > Regards, > Nils > >> >> Thanks, >> Vladimir >> >> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this fix. >>> >>> Summary: >>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile. >>> >>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter it is still available) and then some >>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>> >>> Solution: >>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I choose to extend the _initalized check in compile_method. I didn't add any >>> logging or warning because this is really a corner case. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>> (Ignore the extra tags in the webrev) >>> >>> Best regards, >>> Nils Eliasson > From nils.eliasson at oracle.com Fri Apr 15 11:22:59 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 13:22:59 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710CD5E.5070103@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> Message-ID: <5710CF13.5090404@oracle.com> Hi Tobias, Thanks for your feedback! New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 Regards, Nils On 2016-04-15 13:15, Tobias Hartmann wrote: > Hi Nils, > > On 15.04.2016 11:39, Nils Eliasson wrote: >> Thanks Vladimir! >> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>> I agree with this simple change as the fix. >>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. >>> I don't see a PIT link in the bug report. >> There was none, Tobias found this regression testing something else. >> >> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ > Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. > > Otherwise looks good to me. > > Best regards, > Tobias > >> Regards, >> Nils >> >>> Thanks, >>> Vladimir >>> >>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this fix. >>>> >>>> Summary: >>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. 
Before there was a return there that just ignored the compile. >>>> >>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter it is still available) and then some >>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>>> >>>> Solution: >>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I choose to extend the _initalized check in compile_method. I didn't add any >>>> logging or warning because this is really a corner case. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>> (Ignore the extra tags in the webrev) >>>> >>>> Best regards, >>>> Nils Eliasson From tobias.hartmann at oracle.com Fri Apr 15 11:24:10 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Apr 2016 13:24:10 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710CF13.5090404@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> <5710CF13.5090404@oracle.com> Message-ID: <5710CF5A.3090409@oracle.com> Hi Nils, On 15.04.2016 13:22, Nils Eliasson wrote: > Hi Tobias, > > Thanks for your feedback! > > New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 Looks good! Best regards, Tobias > > Regards, > Nils > > On 2016-04-15 13:15, Tobias Hartmann wrote: >> Hi Nils, >> >> On 15.04.2016 11:39, Nils Eliasson wrote: >>> Thanks Vladimir! >>> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>>> I agree with this simple change as the fix. >>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. 
>>>> I don't see a PIT link in the bug report. >>> There was none, Tobias found this regression testing something else. >>> >>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ >> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. >> >> Otherwise looks good to me. >> >> Best regards, >> Tobias >> >>> Regards, >>> Nils >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this fix. >>>>> >>>>> Summary: >>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile. >>>>> >>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter it is still available) and then some >>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>>>> >>>>> Solution: >>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I choose to extend the _initalized check in compile_method. I didn't add any >>>>> logging or warning because this is really a corner case. 
>>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>>> (Ignore the extra tags in the webrev) >>>>> >>>>> Best regards, >>>>> Nils Eliasson > From vladimir.x.ivanov at oracle.com Fri Apr 15 11:42:40 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Apr 2016 14:42:40 +0300 Subject: RFR(m): 8145468 deprecations for java.lang In-Reply-To: <57107B6B.5070100@javaspecialists.eu> References: <570EF756.2090602@oracle.com> <570FAB0C.7070505@javaspecialists.eu> <57101D11.6060109@oracle.com> <5710234A.2060001@oracle.com> <57107B6B.5070100@javaspecialists.eu> Message-ID: <5710D3B0.6010903@oracle.com> >>> I had a sidebar with Shipilev on this, and this is indeed still >>> potentially an issue. Alexey's example was: >>> >>> set.contains(new Integer(i)) // 1 >>> >>> vs >>> >>> set.contains(Integer.valueOf(i)) // 2 >>> >>> EA is able to optimize away the allocation in line 1, but the additional >>> complexity of dealing with the Integer cache in valueOf() defeats EA for >>> line 2. (Autoboxing pretty much desugars to line 2.) >> >> I'd say it's a motivating example to improve EA implementation in C2, >> but not to avoid deprecation of public constructors in primitive type >> boxes. It shouldn't matter for C2 whether Integer.valueOf() or >> Integer:: is used. If it does, it's a bug. >> > To do that would probably require a change to the Java Language > Specification to allow us to get rid of the IntegerCache. Unfortunately > it is defined to have a range of -128 to 127 at least in the cache, so > this probably makes it really hard or impossible to optimize this with > EA. I always found it amusing that the killer application for EA, > getting rid of autoboxed Integer objects, didn't really work :-) Still, I'd separate optimization and specification aspects. This case is neither "really hard" nor impossible to optimize. 
What is hard is to ensure the optimization covers all interesting cases :-) AFAIK C2 should already do a pretty decent job of eliminating box/unbox pairs (e.g., Integer.valueOf().intValue()) and the cache is not a problem here. What can cause problems is when box identity intervenes. For example, even for non-escaping objects the runtime has to be able to materialize them at safepoints. In order to preserve identity invariants, the runtime has to take into account how the box is created (constructor vs factory method). Probably, that's the missing case right now. But there's nothing insurmountable to fix it - the runtime should just consult the cache in the rare cases when rematerialization is necessary. Best regards, Vladimir Ivanov From rwestrel at redhat.com Fri Apr 15 12:24:30 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 14:24:30 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <571023F8.5090903@oracle.com> References: <571023F8.5090903@oracle.com> Message-ID: <5710DD7E.1060105@redhat.com> Hi Vladimir, Thanks for looking at this. > I agree with optimization but I am not sure about changes. Is this an optimization? It looks more like a bug to me. > Can we check only one previous block to be more conservative?: > > Block* b = prev(targ_block) > bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() > && !b->head()->is_Loop() That would be good enough as far as I can tell. Here is a new webrev: http://cr.openjdk.java.net/~roland/8154135/webrev.01/ > Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? May be > for RISC cpus (with fixed instruction size) we should change them. Thanks for the pointer. This said, I don't see what could prevent the problem I see from happening on x86 so to me it looks like a bug, rather than a tuning problem. Roland. 
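Vladimir Ivanov's point in the 8145468 thread, that the runtime only needs to consult the Integer cache in the rare case where an eliminated box must be rematerialized, can be illustrated with a small model. This is a hedged C++ sketch, not HotSpot code: `IntBox`, `BoxOrigin`, `value_of` and `rematerialize` are hypothetical names, and the cache range mirrors the minimum -128..127 range that Java requires for `Integer` boxing. It shows why a box created by `valueOf()` must be looked up again to keep its shared identity, while a box created by the constructor must always come back as a distinct object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a java.lang.Integer object.
struct IntBox {
  int value;
};

// How the (now scalar-replaced) box was originally created.
enum class BoxOrigin { Constructor, ValueOf };

// Minimum cached range for Integer boxing; the real upper bound may be larger.
constexpr int kCacheLow = -128;
constexpr int kCacheHigh = 127;

// Lazily built cache holding one shared box per value in the range.
const std::vector<std::shared_ptr<IntBox>>& box_cache() {
  static const std::vector<std::shared_ptr<IntBox>> cache = [] {
    std::vector<std::shared_ptr<IntBox>> boxes;
    for (int v = kCacheLow; v <= kCacheHigh; ++v) {
      boxes.push_back(std::make_shared<IntBox>(IntBox{v}));
    }
    return boxes;
  }();
  return cache;
}

// Models Integer.valueOf(int): cached box inside the range, fresh box outside.
std::shared_ptr<IntBox> value_of(int v) {
  if (v >= kCacheLow && v <= kCacheHigh) {
    return box_cache()[static_cast<std::size_t>(v - kCacheLow)];
  }
  return std::make_shared<IntBox>(IntBox{v});
}

// Models rematerializing an eliminated box at a safepoint: a valueOf() box
// is looked up in the cache again to preserve identity, while a box from
// `new Integer(v)` always gets a fresh, distinct object.
std::shared_ptr<IntBox> rematerialize(int v, BoxOrigin origin) {
  if (origin == BoxOrigin::ValueOf) {
    return value_of(v);
  }
  return std::make_shared<IntBox>(IntBox{v});
}
```

The key design point is that the box's origin has to be remembered alongside the scalar-replaced value; without it, the runtime cannot tell whether identity with the cached instance is required.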
From nils.eliasson at oracle.com Fri Apr 15 13:06:58 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 15:06:58 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: <5710E772.5050801@oracle.com> Hi, On 2016-04-14 20:45, Christian Thalinger wrote: > >> On Apr 14, 2016, at 2:43 AM, Nils Eliasson > > wrote: >> >> I moved the reasons to CompileTask.hpp and put it together with the >> names list. Also changed the type from int to CompileReason as Igor >> suggested. >> >> It gets verbose in the method declarations in compileBroker > > Don't worry about this. > >> and sometimes I think CompileReason should be declared in >> CompileBroker because it is mostly used there. On the other hand, >> CompileTask is the keeper of the CompileReason so it makes sense too. > > Yes, that's the right place. > >> >> New webrev: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >> > > *+ bool can_become_stale() const {* > *+ return !_is_blocking && (_compile_reason < Reason_Whitebox);* > *+ }* > I'm not a fan of implicit contracts just defined by comments. This > method doesn't seem to be performance critical so I would suggest to > use a switch-case. An attribute on the enum would be much better but > we all know this isn't Java. As you suggested: http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 Also made reasons CTW and Replay not stale-able. Thanks! Nils > >> >> Thanks! >> Nils >> >> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>> Very nice, I like it. >>> >>> One note. CompileReason (and its names) should be CompileTask class >>> where it is recorded. Then CompileTask::can_become_stale() can be in >>> header file so it is inlinined on all platforms.
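Christian's suggestion to replace the ordering-based test `_compile_reason < Reason_Whitebox` with an explicit `switch` might look roughly like this. This is a hedged sketch, not the code in webrev.04: the enumerator names and `CompileTaskSketch` are hypothetical stand-ins loosely modeled on the thread. The classification follows the discussion: only counter- and policy-driven, non-blocking compiles may go stale, and CTW, Replay, WhiteBox and must-be-compiled requests may not. Listing every enumerator without a `default` lets compilers that warn on unhandled enum values (for example GCC's `-Wswitch`) flag a newly added reason.

```cpp
#include <cassert>

// Hypothetical compile reasons, loosely modeled on the thread.
enum CompileReason {
  Reason_InvocationCount,  // invocation counter overflow
  Reason_BackedgeCount,    // backedge counter overflow
  Reason_Tiered,           // tiered policy decision
  Reason_CTW,              // compile-the-world
  Reason_Replay,           // compilation replay
  Reason_Whitebox,         // WhiteBox API request from a test
  Reason_MustBeCompiled    // method that must be compiled regardless of profile
};

struct CompileTaskSketch {
  CompileReason compile_reason;
  bool is_blocking;

  // Only profile-driven, non-blocking tasks may be evicted from the queue as stale.
  bool can_become_stale() const {
    if (is_blocking) {
      return false;
    }
    switch (compile_reason) {
      case Reason_InvocationCount:
      case Reason_BackedgeCount:
      case Reason_Tiered:
        return true;
      case Reason_CTW:
      case Reason_Replay:
      case Reason_Whitebox:
      case Reason_MustBeCompiled:
        return false;
    }
    return false;  // not reached for valid enumerators
  }
};
```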
>>> >>> Thanks, >>> Vladimir >>> >>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>> >>>> >>>> Summary >>>> Introduced an enum CompileReason with members matching all the old >>>> variants, and a table containing all the unchanged strings. I see the >>>> possibility of removing/changing/simplifying some CompileReasons but >>>> have choosen not to do so in this change. >>>> >>>> Only new logic is the CompileTask::can_become_stale() method. >>>> >>>> Testing: >>>> Running Testset hotspot on all platforms and hotspot_all on one >>>> platform >>>> >>>> Regards, >>>> Nils Eliawsson >>>> >>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>> >>>>>> I'll do a proper fix, it is the right thing to do and should be >>>>>> pretty >>>>>> quick. I'll change the comment to an enum that represent who >>>>>> submitted >>>>>> the compile, and add a table for the comments. This could be >>>>>> useful in >>>>>> other settings to. >>>>> >>>>> Sounds good. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Regards, >>>>>> Nils >>>>>> >>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>> What do you mean "stale"? >>>>>>> I would prefer to see the real fix as you suggested to avoid >>>>>>> removing >>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>> >>>>>>>> Summary: >>>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>>> the compile queue as stale. 
>>>>>>>> >>>>>>>> Solution: >>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>> stale while the test is running. (Also added some extra >>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>> >>>>>>>> This is an workaround but we should consider fixing something >>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>> task with info about the origin of the compile. The comment >>>>>>>> field has >>>>>>>> this information - but then it needs to be >>>>>>>> converted to an enum. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Nils Eliasson >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>> >> > From nils.eliasson at oracle.com Fri Apr 15 13:30:29 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 15:30:29 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <57102AB3.30709@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> Message-ID: <5710ECF5.80201@oracle.com> I forgot the link to the test job: This is both for this and JDK-8153013 BlockingCompilation test times out https://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1309-10848 Regards, //Nils On 2016-04-15 01:41, Vladimir Kozlov wrote: > I agree with this simple change as the fix. > Note, -Xcomp does not switch off Interpreter (we can run without > Interpreter). We use !UseInterpreter as indication if Xcomp was used. > I don't see a PIT link in the bug report. > > Thanks, > Vladimir > > On 4/14/16 6:17 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this fix. >> >> Summary: >> In JDK-8150646 I added an assert in compile_method that the compiler >> must not be NULL.
Before there was a return there that just ignored >> the compile. >> >> Running the VM with the flag combination -Xcomp and >> -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter >> is set to false (but the interpreter it is still available) and then >> some >> essential methods are forced to be compiled, but the initial >> complevel becomes 0 and hits the assert in compileBroker. >> >> Solution: >> We could discuss if it should be allowed to submit compiles on level >> 0, a change that would become a bit larger. This time I choose to >> extend the _initalized check in compile_method. I didn't add any >> logging or warning because this is really a corner case. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >> (Ignore the extra tags in the webrev) >> >> Best regards, >> Nils Eliasson From nils.eliasson at oracle.com Fri Apr 15 15:10:35 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 17:10:35 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log Message-ID: <5711046B.9080808@oracle.com> Hi, Please review this fix of print opto_assembly. Summary: The compilelog can get corrupted and the VM may assert on "failed: bad tag in log". When printing assembly in output.cpp we first take the ttylock, print the head and then the method metadata. However the metadata printing makes a vm entry and may block for a safepoint and will then release the lock (break_tty_lock_for_safepoint). After that some of the other compiler thread that haven't safepointed will take the lock and the broken log will be a fact when the safepoint is over and the first thread starts logging again. Solution: Print the method metadata to a temporary buffer, then take the tty lock. Testing: Repro from bug stops failing. 
Running :hotspot_all (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ Regards, Nils Eliasson From zoltan.majo at oracle.com Fri Apr 15 15:11:37 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 15 Apr 2016 17:11:37 +0200 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <571039FC.1000603@oracle.com> References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com> Message-ID: <571104A9.7060208@oracle.com> Hi Vladimir, thank you for the feedback! On 04/15/2016 02:46 AM, Vladimir Kozlov wrote: > I think check should use !isa_oopptr() since one of nodes could be > ConP NULL ptr which is not klassptr. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/ RBT testing passes. I did ~70 runs with the reproducer, no problems have shown up so far. I'll do ~900 more runs, though. Thank you! Best regards, Zoltan > > Thanks, > Vladimir > > On 4/14/16 6:21 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8153357. >> >> https://bugs.openjdk.java.net/browse/JDK-8153357 >> >> Problem: When determining the unique input of a phi, the C2 compiler >> removes cast nodes connecting the phi to its unique input. >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 >> >> >> Then (if the phi has indeed a unique input), the C2 compiler attempts >> replace the phi with a cast node. The new cast node feeds from the >> unique input. >> >> To be able to remove the phi node, the C2 compiler must to determine >> the type of cast to add in place of the phi node (CastII, CastPP, or >> CheckCastPP).
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 >> >> >> The failure in the bug report appears because the C2 compiler adds a >> cast node of unexpected type to the graph (a CheckCastPP instead of a >> CastPP when casting between two klass pointers). >> >> Please find more details about the cause of the failure in the bug >> description: >> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 >> >> >> >> Solution: Refine C2's logic to determine the type of cast node added. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ >> >> Testing: >> - JPRT; >> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); >> - 500 non-failing runs with the reproducer (the problem reproduces >> with < 100 runs). >> >> Thank you and best regards, >> >> >> Zoltan >> From zoltan.majo at oracle.com Fri Apr 15 15:25:01 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 15 Apr 2016 17:25:01 +0200 Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if on-stack-replacement is enabled Message-ID: <571107CD.8070205@oracle.com> Hi, please review the patch for 8072428. https://bugs.openjdk.java.net/browse/JDK-8072428 Problem: On-stack-replacement requires loop counters; disabling loop counters with on-stack-replacement enabled triggers an assert. Solution: Set UseLoopCounter ergonomically if on-stack-replacement is enabled. Print warning. Webrev: http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/ Tested with locally-built VM (linux_x64). Thank you! Best regards, Zoltan From long.chen at linaro.org Fri Apr 15 12:45:44 2016 From: long.chen at linaro.org (Long Chen) Date: Fri, 15 Apr 2016 20:45:44 +0800 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' Message-ID: Hi Please review this patch making use of DC ZVA to do block zeroing.
http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch I'm sorry that I can't produce a test case matching the "clear_array" pattern showing obvious improvement. However, generating "DC ZVA" should be the right thing to do as it usually has better cache behaviors. Besides, gcc and linux's memset have been using "DC ZVA". The ArrayFillByte case benefits from "DC ZVA" when the array length is large. Test, http://people.linaro.org/~long.chen/block_zeroing/ArrayFillByte.java Performance result, http://people.linaro.org/~long.chen/block_zeroing/BlockZeroing.html Tested with jtreg hotspot and langtools. Thanks! From igor.ignatyev at oracle.com Fri Apr 15 16:50:57 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Apr 2016 09:50:57 -0700 Subject: RFR(XS): 8154174: improve JitTester performance In-Reply-To: <570F9B88.2040102@oracle.com> References: <570F9B88.2040102@oracle.com> Message-ID: Hi Anton, looks good to me, thanks for doing that. Igor
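The general shape of block zeroing with `DC ZVA`, as described in Long Chen's patch mail, is to clear an unaligned head and tail with ordinary stores and to clear the aligned middle one cache block at a time. The portable C++ sketch below illustrates this under stated assumptions and is not the patch itself. `zero_block` is a hypothetical stand-in that uses `memset` where AArch64 code would issue `DC ZVA`. The 64-byte block size is also an assumption: real code has to read the block size, and whether `DC ZVA` is permitted at all, from the `DCZID_EL0` register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed DC ZVA block size; real code reads it from DCZID_EL0.
constexpr std::size_t kZvaBlockBytes = 64;

// Hypothetical stand-in for one "DC ZVA" instruction: clears exactly one
// aligned block. memset keeps the sketch portable.
void zero_block(unsigned char* p) {
  assert(reinterpret_cast<std::uintptr_t>(p) % kZvaBlockBytes == 0);
  std::memset(p, 0, kZvaBlockBytes);
}

// Clears [p, p + n). Returns the number of bytes cleared by whole-block operations.
std::size_t block_zero(unsigned char* p, std::size_t n) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  const std::size_t head = (kZvaBlockBytes - addr % kZvaBlockBytes) % kZvaBlockBytes;
  if (head >= n) {
    // Too short to contain an aligned block: plain stores only.
    std::memset(p, 0, n);
    return 0;
  }
  std::memset(p, 0, head);  // unaligned head
  p += head;
  n -= head;
  const std::size_t blocks = n / kZvaBlockBytes;
  for (std::size_t i = 0; i < blocks; ++i) {
    zero_block(p + i * kZvaBlockBytes);  // aligned middle, one block at a time
  }
  const std::size_t done = blocks * kZvaBlockBytes;
  std::memset(p + done, 0, n - done);  // unaligned tail
  return done;
}
```

Short ranges never reach the block loop, which is consistent with the observation in the mail that ArrayFillByte only benefits when the array length is large.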
Igor > On Apr 14, 2016, at 6:30 AM, Anton Ivanov wrote: > > Hi, > Please review small patch that improves JitTester performance > > In current implementation JitTester has exception based logic, which is not good by itself, but changing this is quite expensive and there is simple way to decrease exception overhead - turn off stack trace in ProductionFailedException constructor ( this exception is created very often and stack trace is never need, as it only used to control program flow ) > Also small improvement was done in code that does deep copy of SymbolTable element ( Map iteration was rewritten to get rid of multiple redundant Map.get() which cost 0(1) only in average case and could be worse potentially ) > > Testing: local > > webrev: http://cr.openjdk.java.net/~aaivanov/8154174/webrev > bug: https://bugs.openjdk.java.net/browse/JDK-8154174 > > -- > Best regards, > Anton Ivanov > From vladimir.kozlov at oracle.com Fri Apr 15 17:14:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:14:05 -0700 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710CF13.5090404@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> <5710CF13.5090404@oracle.com> Message-ID: <5711215D.4060202@oracle.com> Looks good. Make sure the test is executed in JPRT. Thanks, Vladimir On 4/15/16 4:22 AM, Nils Eliasson wrote: > Hi Tobias, > > Thanks for your feedback! > > New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 > > Regards, > Nils > > On 2016-04-15 13:15, Tobias Hartmann wrote: >> Hi Nils, >> >> On 15.04.2016 11:39, Nils Eliasson wrote: >>> Thanks Vladimir! >>> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>>> I agree with this simple change as the fix. >>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). 
We use !UseInterpreter as indication >>>> if Xcomp was used. >>>> I don't see a PIT link in the bug report. >>> There was none, Tobias found this regression testing something else. >>> >>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ >> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 >> ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. >> >> Otherwise looks good to me. >> >> Best regards, >> Tobias >> >>> Regards, >>> Nils >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this fix. >>>>> >>>>> Summary: >>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return >>>>> there that just ignored the compile. >>>>> >>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: >>>>> UseInterpreter is set to false (but the interpreter it is still available) and then some >>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>>>> >>>>> Solution: >>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. >>>>> This time I choose to extend the _initalized check in compile_method. I didn't add any >>>>> logging or warning because this is really a corner case. 
>>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>>> (Ignore the extra tags in the webrev) >>>>> >>>>> Best regards, >>>>> Nils Eliasson > From vladimir.kozlov at oracle.com Fri Apr 15 17:27:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:27:40 -0700 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <571104A9.7060208@oracle.com> References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com> <571104A9.7060208@oracle.com> Message-ID: <5711248C.7000503@oracle.com> Looks good to me. thanks, Vladimir On 4/15/16 8:11 AM, Zoltán Majó wrote: > Hi Vladimir, > > > thank you for the feedback! > > On 04/15/2016 02:46 AM, Vladimir Kozlov wrote: >> I think check should use !isa_oopptr() since one of nodes could be ConP NULL ptr which is not klassptr. > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/ > > RBT testing passes. I did ~70 runs with the reproducer, no problems have shown up so far. I'll do ~900 more runs, though. > > Thank you! > > Best regards, > > > Zoltan > >> >> Thanks, >> Vladimir >> >> On 4/14/16 6:21 AM, Zoltán Majó wrote: >>> Hi, >>> >>> >>> please review the patch for 8153357. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153357 >>> >>> Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its >>> unique input. >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 >>> >>> Then (if the phi has indeed a unique input), the C2 compiler attempts replace the phi with a cast node. The new cast >>> node feeds from the unique input. >>> >>> To be able to remove the phi node, the C2 compiler must to determine the type of cast to add in place of the phi >>> node (CastII, CastPP, or CheckCastPP).
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 >>> >>> The failure in the bug report appears because the C2 compiler adds a cast node of unexpected type to the graph (a >>> CheckCastPP instead of a CastPP when casting between two klass pointers). >>> >>> Please find more details about the cause of the failure in the bug description: >>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 >>> >>> >>> >>> Solution: Refine C2's logic to determine the type of cast node added. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ >>> >>> Testing: >>> - JPRT; >>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); >>> - 500 non-failing runs with the reproducer (the problem reproduces with < 100 runs). >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From vladimir.kozlov at oracle.com Fri Apr 15 17:34:51 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:34:51 -0700 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5710DD7E.1060105@redhat.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> Message-ID: <5711263B.9070505@oracle.com> On 4/15/16 5:24 AM, Roland Westrelin wrote: > Hi Vladimir, > > Thanks for looking at this. > >> I agree with optimization but I am not sure about changes. > > Is this an optimization? It looks more like a bug to me. Code is correct but not optimal. I don't think it is bug. > >> Can we check only one previous block to be more conservative?: >> >> Block* b = prev(targ_block) >> bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() >> && !b->head()->is_Loop() > > That would be good enough as far as I can tell. Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8154135/webrev.01/ Looks good. 
> >> Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? May be >> for RISC cpus (with fixed instruction size) we should change them. > > Thanks for the pointer. This said, I don't see what could prevent the > problem I see from happening on x86 so to me it looks like a bug, rather > than a tuning problem. NumberOfLoopInstrToAlign code is used only on x86 and may hide the problem you see. And I suggested to look on that code to see if we can get additional performance benefits on RISC (on arm64 in your case). Thanks, Vladimir > > Roland. > From vladimir.kozlov at oracle.com Fri Apr 15 17:44:32 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:44:32 -0700 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <5711046B.9080808@oracle.com> References: <5711046B.9080808@oracle.com> Message-ID: <57112880.1010204@oracle.com> Use resizable stream: stringStream(size_t initial_bufsize = 256); 1024 may not be enough. Thanks, Vladimir On 4/15/16 8:10 AM, Nils Eliasson wrote: > Hi, > > Please review this fix of print opto_assembly. > > Summary: > The compilelog can get corrupted and the VM may assert on "failed: bad tag in log". > > When printing assembly in output.cpp we first take the ttylock, print the head and then the method metadata. However the > metadata printing makes a vm entry and may block for a safepoint and will then release the lock > (break_tty_lock_for_safepoint). After that some of the other compiler thread that haven't safepointed will take the lock > and the broken log will be a fact when the safepoint is over and the first thread starts logging again. > > Solution: > Print the method metadata to a temporary buffer, then take the tty lock. > > Testing: > Repro from bug stops failing. 
> Running :hotspot_all (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 > Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ > > Regards, > Nils Eliasson From rwestrel at redhat.com Fri Apr 15 18:17:47 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 20:17:47 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5711263B.9070505@oracle.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> <5711263B.9070505@oracle.com> Message-ID: <5711304B.3040907@redhat.com> >> http://cr.openjdk.java.net/~roland/8154135/webrev.01/ > > Looks good. Thanks for the review. I need a sponsor now... > NumberOfLoopInstrToAlign code is used only on x86 and may hide the > problem you see. > And I suggested to look on that code to see if we can get additional > performance benefits on RISC (on arm64 in your case). Ok. Again, thanks for the pointer to that piece of code. Roland. From vladimir.x.ivanov at oracle.com Fri Apr 15 18:25:26 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Apr 2016 21:25:26 +0300 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses In-Reply-To: <57103FF2.5060907@oracle.com> References: <570FCB02.6000507@oracle.com> <57103FF2.5060907@oracle.com> Message-ID: <57113216.8030906@oracle.com> Thanks for the feedback, Vladimir. Updated version: http://cr.openjdk.java.net/~vlivanov/8134918/webrev.01/ Additional changes: * alias type doesn't differentiate between byte[] & boolean[]; use address type to narrow the basic type; > Next assert should be at the beginning of method: > + assert(type != T_OBJECT || !unaligned, "unaligned access not > supported with object type"); Fixed. > Fix Copyright year in the test. Fixed. > There is no PIT link in the bug report. Added. 
Best regards, Vladimir Ivanov > > Thanks, > Vladimir > > On 4/14/16 9:53 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8134918 >> >> Type speculation can produce mismatched unsafe accesses. >> >> It injects a guard based on profile data and then propagate type info >> down to the users. If there's an unsafe access, it can become >> mismatched w.r.t. profile data being used. >> >> It happens even for valid usages. If an unsafe access always matches >> memory location at runtime, the code produced by type speculation in >> that case is effectively dead. >> >> What cause problems are unsafe OOP accesses (U.putObject()/getObject() >> on non-OOP locations). >> >> The fix is to avoid intrinsification of problematic accesses. Type >> speculation injects precise type information, which is available >> during intrinsification. >> >> We could try to support mismatched unsafe object accesses instead, but >> I don't see any value in that. >> >> Testing: JPRT, pit-hs-comp (in progress). >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Apr 15 18:26:21 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Apr 2016 21:26:21 +0300 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5710DD7E.1060105@redhat.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> Message-ID: <5711324D.6090000@oracle.com> Looks good. I'll sponsor the fix. Best regards, Vladimir Ivanov On 4/15/16 3:24 PM, Roland Westrelin wrote: > That would be good enough as far as I can tell. 
Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8154135/webrev.01/ From vladimir.kozlov at oracle.com Fri Apr 15 18:52:47 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 11:52:47 -0700 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses In-Reply-To: <57113216.8030906@oracle.com> References: <570FCB02.6000507@oracle.com> <57103FF2.5060907@oracle.com> <57113216.8030906@oracle.com> Message-ID: <5711387F.1040600@oracle.com> Looks good. Thanks, Vladimir On 4/15/16 11:25 AM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. > > Updated version: > http://cr.openjdk.java.net/~vlivanov/8134918/webrev.01/ > > Additional changes: > > * alias type doesn't differentiate between byte[] & boolean[]; use address type to narrow the basic type; > >> Next assert should be at the beginning of method: >> + assert(type != T_OBJECT || !unaligned, "unaligned access not >> supported with object type"); > Fixed. > >> Fix Copyright year in the test. > Fixed. > >> There is no PIT link in the bug report. > Added. > > Best regards, > Vladimir Ivanov > >> >> Thanks, >> Vladimir >> >> On 4/14/16 9:53 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8134918 >>> >>> Type speculation can produce mismatched unsafe accesses. >>> >>> It injects a guard based on profile data and then propagate type info >>> down to the users. If there's an unsafe access, it can become >>> mismatched w.r.t. profile data being used. >>> >>> It happens even for valid usages. If an unsafe access always matches >>> memory location at runtime, the code produced by type speculation in >>> that case is effectively dead. >>> >>> What cause problems are unsafe OOP accesses (U.putObject()/getObject() >>> on non-OOP locations). >>> >>> The fix is to avoid intrinsification of problematic accesses. 
Type >>> speculation injects precise type information, which is available >>> during intrinsification. >>> >>> We could try to support mismatched unsafe object accesses instead, but >>> I don't see any value in that. >>> >>> Testing: JPRT, pit-hs-comp (in progress). >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Apr 15 18:56:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 11:56:16 -0700 Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if on-stack-replacement is enabled In-Reply-To: <571107CD.8070205@oracle.com> References: <571107CD.8070205@oracle.com> Message-ID: <57113950.9070700@oracle.com> Looks good. Thanks, Vladimir On 4/15/16 8:25 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8072428. > > https://bugs.openjdk.java.net/browse/JDK-8072428 > > Problem: On-stack-replacement requires loop counters; disabling loop counters with on-stack-replacement enabled triggers > an assert. > > Solution: Set UseLoopCounter ergonomically if on-stack-replacement is enabled. Print warning. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/ > > Tested with locally-built VM (linux_x64). > > Thank you! > > Best regards, > > > Zoltan > From rwestrel at redhat.com Fri Apr 15 18:58:39 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 20:58:39 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5711324D.6090000@oracle.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> <5711324D.6090000@oracle.com> Message-ID: <571139DF.4070607@redhat.com> > Looks good. > > I'll sponsor the fix. Thanks for the review and for pushing it! Roland.
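The 8072428 change replaces an assert with an ergonomic adjustment. If on-stack replacement is enabled but loop counters were disabled, the VM re-enables the counters and prints a warning. The sketch below shows that check. It assumes a plain struct in place of the VM's `-XX` flag globals. `CompilerFlags` and `adjust_loop_counter_ergonomics` are hypothetical names and are not the webrev's code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical holder for the two flags named in the review; the real VM
// keeps these as global -XX options.
struct CompilerFlags {
  bool UseOnStackReplacement = true;
  bool UseLoopCounter = true;
};

// OSR relies on loop (backedge) counters. If OSR is on but counters were
// switched off, turn them back on and warn instead of failing an assert.
// Returns true if a flag was changed.
bool adjust_loop_counter_ergonomics(CompilerFlags& f) {
  bool changed = false;
  if (f.UseOnStackReplacement && !f.UseLoopCounter) {
    std::fprintf(stderr,
                 "warning: on-stack-replacement requires loop counters; "
                 "enabling UseLoopCounter\n");
    f.UseLoopCounter = true;
    changed = true;
  }
  // Invariant after ergonomics: OSR implies loop counters.
  assert(!f.UseOnStackReplacement || f.UseLoopCounter);
  return changed;
}
```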
From vladimir.kozlov at oracle.com Fri Apr 15 19:30:15 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 12:30:15 -0700 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57109BE2.1090602@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> <5710478C.8050200@oracle.com> <57109BE2.1090602@oracle.com> Message-ID: <57114147.5060206@oracle.com> Thank you, Jamsheed. Testing results looks fine so far. I am pushing it. Thanks, Vladimir On 4/15/16 12:44 AM, Jamsheed C m wrote: > Hi Vladimir, > > PIT testing is in progress, link is available in bug report. > > Best Regards, > Jamsheed > > On 4/15/2016 7:14 AM, Vladimir Kozlov wrote: >> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results. >> >> Thanks, >> Vladimir >> >> On 4/12/16 2:45 AM, Doerr, Martin wrote: >>> Hi, >>> >>> I think we have come to a common understanding and there was no complaint about my latest webrev: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Can I consider it reviewed? >>> Can somebody sponsor, please? >>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin >>> Sent: Donnerstag, 7. April 2016 12:52 >>> To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> Hi Andrew, Jamsheed and all, >>> >>> thank you very much for your input. 
>>> >>> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). >>> Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also >>> correct). >>> >>> My change still contains a releasing store for newly created ExceptionCache instances. >>> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce >>> additional false negatives on weak memory model platforms. >>> I think having the release doesn't hurt too much and makes the design a little cleaner. >>> >>> I also added comments based on your input. >>> >>> The new webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Please review. I will also need a sponsor from Oracle, please. >>> >>> Thanks again and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Andrew Haley [mailto:aph at redhat.com] >>> Sent: Donnerstag, 7. April 2016 12:14 >>> To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> On 07/04/16 10:08, Doerr, Martin wrote: >>> >>>> atomic update for the _count would only be required if there were >>>> multiply threads which attempt to increment it >>>> concurrently. However, updates are under lock, so we only have >>>> concurrent readers which is ok. >>>> >>>> I still think "volatile" does what we need here. Especially the xlC >>>> compiler on AIX tends to reload variables from memory. Exactly this >>>> can be prevented by making the field volatile. >>> >>> I think your latest patch is OK. Whether volatile is really good >>> enough, I don't know. The new(ish) C++ memory model treats this as a >>> race, and therefore undefined behaviour. Old C++ didn't have a memory >>> model, so the best we can do with racy code is guess about what our >>> compilers might do. 
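The pattern under discussion works as follows. Writers hold a lock and readers do not. Each new entry is filled in first, and only then is the count that makes it visible bumped with a releasing store. The sketch below models this with C++11 atomics. It is a simplified, hypothetical model and not nmethod's `ExceptionCache`. HotSpot uses its own `OrderAccess` primitives instead of `std::atomic`, and the class and member names here are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical model of a small pc -> handler cache: writers are serialized
// by a lock, readers are lock-free.
class HandlerCache {
 public:
  static constexpr int kCapacity = 4;

  // Reader side, no lock. The acquire load of count_ pairs with the
  // writer's release store, so entries [0, n) are fully visible.
  long lookup(long pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return -1;  // miss
  }

  // Writer side. Fill the slot first, then publish it by storing the new
  // count with release semantics (instead of a separate StoreStore fence).
  bool add(long pc, long handler) {
    assert(handler >= 0 && "negative values are reserved for a miss");
    std::lock_guard<std::mutex> guard(lock_);
    // Only writers modify count_, and they do so under lock_.
    int n = count_.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    pcs_[n] = pc;
    handlers_[n] = handler;
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

 private:
  long pcs_[kCapacity] = {};
  long handlers_[kCapacity] = {};
  std::atomic<int> count_{0};
  std::mutex lock_;
};
```

Under the C++11 memory model this version has no data race. A reader only touches slots below the count it loaded with acquire ordering, and those slots are never rewritten. A plain `volatile` count carries no such guarantee, which is the concern raised about relying on `volatile` alone.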
>>> >>> I certainly much prefer a release_store to the storestore fence used >>> in the fix for 8143897. >>> >>> Andrew. >>> > From christian.thalinger at oracle.com Fri Apr 15 20:36:33 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Apr 2016 10:36:33 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: <532B4EF9-CE01-4BCF-8B9A-396D6B004BED@oracle.com> > On Apr 14, 2016, at 11:04 PM, Berg, Michael C wrote: > > Vladimir, the code has been updated and is available at: > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ Much better! There are a few smaller things: +class MaskCreateINode : public Node { Didn't we agree on CreateMaskINode? names should be consistent: MaskCreateINode -> CreateMaskINode, set_mask -> createMask. + Flag_has_vect_mask_set = Flag_is_scheduled << 1, + bool has_vect_mask_set() const { return (_flags & Flag_has_vect_mask_set) != 0; } Please rename to *has_vector_mask_set. +const bool Matcher::has_predicated_vectors(void) { + bool ret_value = false; + switch(UseAVX) { + case 0: + case 1: + case 2: + break; + + case 3: + ret_value = VM_Version::supports_avx512vl(); + break; + } + + return ret_value; +} Change this to: +const bool Matcher::has_predicated_vectors(void) { + switch (UseAVX) { + case 3: + return VM_Version::supports_avx512vl(); + default: return false; + } +} src/share/vm/opto/matcher.hpp + // Some uarchs have predicated registers on vectors Is "uarchs" a typo?
> > Thanks, > Michael > > -----Original Message----- > From: Berg, Michael C > Sent: Thursday, April 14, 2016 5:54 PM > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: CR for RFR 8153998 > > Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit add files for CountedLoopEnd. > It will be clean when next you see the code. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:52 PM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 5:12 PM, Berg, Michael C wrote: >> The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. > > How it is sizeless when it generates kmovwl() instruction? > Do you mean it does not have side effects (no flags modified)? > > Vladimir > >> >> Ok, I will try the pattern match method. >> >> Thanks >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 5:02 PM >> To: Berg, Michael C ; Christian Thalinger >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 4:38 PM, Berg, Michael C wrote: >>> Vladimir, >>> >>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >>> >>> I tried something like that early on with CountedLoopEnd. >> >> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). 
>> I don't see any side effects for restoremask in your code. What are you talking about? >> >> I am suggesting something like next: >> >> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ >> predicate(n->has_vect_mask_set()); >> match(CountedLoopEnd cop cr); >> effect(USE labl); >> >> ins_cost(400); >> format %{ "j$cop $labl\t# loop end\n\t" >> "restoremask \t# vector mask restore for loops" >> %} >> ins_encode %{ >> Label* L = $labl$$label; >> __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump >> __ restoremask(); >> %} >> ins_pipe(pipe_jcc); >> %} >> >> Vladimir >> >>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. >>> >>> -Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, April 14, 2016 4:27 PM >>> To: Christian Thalinger ; Berg, >>> Michael C >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>>> >>>>> Christian, >>>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>>> >>>> That's unfortunate but I understand. I'm fine with it then. >>> >>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node.
Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. >>> >>> Vladimir >>> >>>> >>>>> Regards, >>>>> Michael >>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>>> > >>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>>> >>>>> *Subject:*Re: CR for RFR 8153998 >>>>> >>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>>> See below for context. >>>>> Regards, >>>>> Michael >>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>>> *To:*Berg, Michael C > >>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>>> *Subject:*Re: CR for RFR 8153998 >>>>> >>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>>> performance and has been modeled over a large number of loop lengths and forms of loops.
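In a masked post loop, the remaining trip count is turned into a lane mask. One predicated vector iteration can then replace the scalar fix-up loop. The sketch below is a scalar C++ model of that idea. It assumes 32-bit lanes and at most 32 of them. `tail_mask` and `masked_add` are hypothetical helpers. In the actual change, the `set_mask` rule emits `createmsk` to produce the mask used in `k1` on AVX-512 targets.

```cpp
#include <cassert>
#include <cstdint>

// Build a lane mask with the low `remaining` bits set, e.g. for 16 int
// lanes of a 512-bit vector. The remaining == 32 case is handled
// separately because shifting a 32-bit value by 32 is undefined.
std::uint32_t tail_mask(int remaining, int lanes) {
  assert(lanes > 0 && lanes <= 32);
  assert(remaining >= 0 && remaining <= lanes);
  if (remaining == 32) return 0xFFFFFFFFu;
  return (1u << remaining) - 1u;
}

// Scalar emulation of one masked vector add. Lanes whose mask bit is
// clear are left untouched, so the tail needs no separate scalar loop.
void masked_add(int* dst, const int* a, const int* b,
                std::uint32_t mask, int lanes) {
  assert(lanes > 0 && lanes <= 32);
  for (int lane = 0; lane < lanes; lane++) {
    if (mask & (1u << lane)) dst[lane] = a[lane] + b[lane];
  }
}
```

For a 16-lane vector with 3 iterations left, `tail_mask(3, 16)` is `0x7`. Only lanes 0, 1 and 2 are written, and the rest of the destination keeps its previous contents.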
>>>>> This code was tested as follows(see jbs entry below): >>>>> >>>>> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >>>>> >>>>> webrev: >>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>>> >>>>> >>>>> +//------------------------------MachMskNode----------------------- >>>>> +- >>>>> +- >>>>> ---------- >>>>> >>>>> +// Machine function Msk Node >>>>> >>>>> +class MachMskNode : public MachIdealNode { >>>>> >>>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>>> Ok, that's easy enough. >>>>> Also, I don't quite understand why we have: >>>>> >>>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>>> >>>>> + predicate(VM_Version::supports_avx512vl()); >>>>> >>>>> + match(Set dst (MaskCreateI src)); >>>>> >>>>> + effect(TEMP dst); >>>>> >>>>> + format %{ "createmsk $dst, $src" %} >>>>> >>>>> + ins_encode %{ >>>>> >>>>> + __ createmsk($dst$$Register, $src$$Register); >>>>> >>>>> + %} >>>>> >>>>> but: >>>>> >>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) >>>>> const { >>>>> >>>>> + MacroAssembler _masm(&cbuf); >>>>> >>>>> + __ restoremsk(); >>>>> >>>>> + } >>>>> >>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>>> >>>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>>> >>>>> Thanks, >>>>> Michael >>>>
From christian.thalinger at oracle.com Fri Apr 15 20:43:10 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Apr 2016 10:43:10 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5710E772.5050801@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> Message-ID: > On Apr 15, 2016, at 3:06 AM, Nils Eliasson wrote: > > Hi, > > On 2016-04-14 20:45, Christian Thalinger wrote: >> >>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson < nils.eliasson at oracle.com > wrote: >>> >>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. >>> >>> It gets verbose in the method declarations in compileBroker >> >> Don't worry about this. >> >>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. >> >> Yes, that's the right place. >> >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >> >> + bool can_become_stale() const { >> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >> + } >> I'm not a fan of implicit contracts just defined by comments. This method doesn't seem to be performance critical so I would suggest to use a switch-case. An attribute on the enum would be much better but we all know this isn't Java. > > As you suggested: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 Thanks.
A space is missing and the closing } indent is wrong: + bool can_become_stale() const { + switch(_compile_reason) { + case Reason_BackedgeCount: + case Reason_InvocationCount: + case Reason_Tiered: + return !_is_blocking; + } + return false; + } Also, what about: + Reason_None, + Reason_CTW, // Compile the world + Reason_Replay, // ciReplay These were covered before. > > Also made reasons CTW and Replay not stale-able. > > Thanks! > Nils > >> >>> >>> Thanks! >>> Nils >>> >>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>> Very nice, I like it. >>>> >>>> One note. CompileReason (and its names) should be CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in header file so it is inlinined on all platforms. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> New webrev: >>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>> >>>>> Summary >>>>> Introduced an enum CompileReason with members matching all the old >>>>> variants, and a table containing all the unchanged strings. I see the >>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>> have choosen not to do so in this change. >>>>> >>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>> >>>>> Testing: >>>>> Running Testset hotspot on all platforms and hotspot_all on one platform >>>>> >>>>> Regards, >>>>> Nils Eliawsson >>>>> >>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>> >>>>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>>>> quick. I'll change the comment to an enum that represent who submitted >>>>>>> the compile, and add a table for the comments. This could be useful in >>>>>>> other settings to. 
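The design discussed here has three parts: an enum recording why a compile was requested, a table of printable names, and an explicit switch in `can_become_stale()`. It might look roughly like the sketch below. The enumerator names follow the ones quoted in the thread. `Reason_Count`, the name strings and the free-function form are placeholders and are not the webrev's code.

```cpp
#include <cassert>

// Hypothetical, trimmed-down compile-reason enum (names follow the thread).
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,       // compile the world
  Reason_Replay,    // ciReplay
  Reason_Whitebox,
  Reason_Count      // number of reasons (placeholder)
};

// Placeholder printable names, indexed by CompileReason.
static const char* const reason_names[Reason_Count] = {
  "no_reason", "count", "backedge_count", "tiered", "CTW", "replay", "whitebox"
};

const char* reason_name(CompileReason r) {
  assert(r < Reason_Count);
  return reason_names[r];
}

// Only counter-driven compiles may be evicted from the queue as stale.
// Blocking compiles and explicitly requested ones (CTW, replay,
// WhiteBox) are kept until they run.
bool can_become_stale(CompileReason r, bool is_blocking) {
  switch (r) {
    case Reason_InvocationCount:
    case Reason_BackedgeCount:
    case Reason_Tiered:
      return !is_blocking;
    default:
      return false;
  }
}
```

Because the stale-able reasons are listed explicitly, a newly added enumerator defaults to "never stale". It does not silently depend on where it sits relative to `Reason_Whitebox`.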
>>>>>> >>>>>> Sounds good. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Nils >>>>>>> >>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>> What do you mean "stale"? >>>>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>> >>>>>>>>> Summary: >>>>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>>>> the compile queue as stale. >>>>>>>>> >>>>>>>>> Solution: >>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>>> >>>>>>>>> This is an workaround but we should consider fixing something >>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>> task with info about the origin of the compile. The comment field has >>>>>>>>> this information - but then it needs to be >>>>>>>>> converted to an enum. >>>>>>>>> >>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Nils Eliasson >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >> >
From michael.c.berg at intel.com Sat Apr 16 04:20:33 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sat, 16 Apr 2016 04:20:33 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Vladimir/Christian: I believe I have addressed all concerns in this update: Webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ Regards, Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Friday, April 15, 2016 2:04 AM To: 'Vladimir Kozlov' Cc: 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: CR for RFR 8153998 Vladimir, the code has been updated and is available at: webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ Thanks, Michael -----Original Message----- From: Berg, Michael C Sent: Thursday, April 14, 2016 5:54 PM To: Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8153998 Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit add files for CountedLoopEnd. It will be clean when next you see the code. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:52 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. How it is sizeless when it generates kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method.
> > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >> >> I tried something like that early on with CountedLoopEnd. > > In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). > I don't see any side effects for restoremask in your code. What are you talking about? > > I am suggesting something like next: > > instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ > predicate(n->has_vect_mask_set()); > match(CountedLoopEnd cop cr); > effect(USE labl); > > ins_cost(400); > format %{ "j$cop $labl\t# loop end\n\t" > "restoremask \t# vector mask restore for loops" > %} > ins_encode %{ > Label* L = $labl$$label; > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump > __ restoremask(); > %} > ins_pipe(pipe_jcc); > %} > > Vladimir > >> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. 
>> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 4:27 PM >> To: Christian Thalinger ; Berg, >> Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>> >>>> Christian, >>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>> >>> That's unfortunate but I understand. I'm fine with it then. >> >> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. >> >> Vladimir >> >>> >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>> > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>> See below for context. >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>> *To:*Berg, Michael C > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>> Hi Folks, >>>> >>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>> performance and has been modeled over a large number of loop lengths and forms of loops. >>>> This code was tested as follows(see jbs entry below): >>>> >>>> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>> >>>> >>>> +//------------------------------MachMskNode----------------------- >>>> +- >>>> +- >>>> ---------- >>>> >>>> +// Machine function Msk Node >>>> >>>> +class MachMskNode : public MachIdealNode { >>>> >>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>> Ok, that's easy enough. >>>> Also, I don't quite understand why we have: >>>> >>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>> >>>> + predicate(VM_Version::supports_avx512vl()); >>>> >>>> + match(Set dst (MaskCreateI src)); >>>> >>>> + effect(TEMP dst); >>>> >>>> + format %{ "createmsk $dst, $src" %} >>>> >>>> + ins_encode %{ >>>> >>>> + __ createmsk($dst$$Register, $src$$Register); >>>> >>>> + %} >>>> >>>> but: >>>> >>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) >>>> const { >>>> >>>> + MacroAssembler _masm(&cbuf); >>>> >>>> + __ restoremsk(); >>>> >>>> + } >>>> >>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>> The subsequent restore, preplaces the default value back into k1.
The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>> >>>> Thanks, >>>> Michael >>> From aph at redhat.com Sat Apr 16 07:59:38 2016 From: aph at redhat.com (Andrew Haley) Date: Sat, 16 Apr 2016 08:59:38 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: Message-ID: <5711F0EA.8020106@redhat.com> + void dc(cache_maintenance cm, Register Rt) { + sys(0b011, 0b0111, cm, 0b001, Rt); + } + + void ic(cache_maintenance cm, Register Rt) { + sys(0b011, 0b0111, cm, 0b001, Rt); } Are DC and IC really synonyms? +typedef void (*_zero_Fn)(HeapWord* to, size_t count); + static void pd_fill_to_aligned_words(HeapWord* tohw, size_t count, juint value) { - pd_fill_to_words(tohw, count, value); + if (UseBlockZeroing + && value == 0 + && count >= (size_t)(BlockZeroingLowLimit >> LogHeapWordSize)) { + ((_zero_Fn)StubRoutines::zero_aligned_words())(tohw, count); + } + else { + pd_fill_to_words(tohw, count, value); + } } I'm not convinced of the value of this. We already know that a simple while (count-- > 0) { *to++ = v; } turns into a call to memset() which does DC ZVA. diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp @@ -4670,11 +4670,54 @@ BLOCK_COMMENT(is_string ? "} string_equals" : "} array_equals"); } + +// base: Address of a buffer to be zeroed, 8 bytes aligned. +// cnt: Count in 8-byte unit. +// is_large: True when 'cnt' is known to be >= BlockZeroingLowLimit. 
+void MacroAssembler::zero_words(Register base, Register cnt, bool is_large) +{ + if (UseBlockZeroing) { + Label non_block_zeroing; + block_zeroing(base, cnt, non_block_zeroing, is_large); Always use the imperative form of a verb for methods: "block_zero", not, "block_zeroing". // base: Address of a buffer to be zeroed, 8 bytes aligned. -// cnt: Count in 8-byte unit. -void MacroAssembler::zero_words(Register base, Register cnt) +// cnt: Immediate count in 8-byte unit. Please make this // cnt: count in HeapWords Thanks, Andrew. From vladimir.kozlov at oracle.com Mon Apr 18 07:00:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Apr 2016 00:00:28 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: <5714860C.5030702@oracle.com> This looks good. I will start our testing for it. Thanks, Vladimir On 4/15/16 9:20 PM, Berg, Michael C wrote: > Vladimir/Christian: > > I believe I have addressed all concerns in this update: > > Webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ > > Regards, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C > Sent: Friday, April 15, 2016 2:04 AM > To: 'Vladimir Kozlov' > Cc: 'hotspot-compiler-dev at openjdk.java.net' > Subject: RE: CR for RFR 8153998 > > Vladimir, the code has been updated and is available at: > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ > > Thanks, > Michael > > -----Original Message----- > From: Berg, Michael C > Sent: Thursday, April 14, 2016 5:54 PM > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: CR for RFR 8153998 > > Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 
64bit ad files for CountedLoopEnd. > It will be clean when next you see the code. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:52 PM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 5:12 PM, Berg, Michael C wrote: >> The restore mark is sizeless; it restores the known global configuration value for k1, which is used automatically in all of code gen. > > How is it sizeless when it generates a kmovwl() instruction? > Do you mean it does not have side effects (no flags modified)? > > Vladimir > >> >> Ok, I will try the pattern match method. >> >> Thanks >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 5:02 PM >> To: Berg, Michael C ; Christian Thalinger >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 4:38 PM, Berg, Michael C wrote: >>> Vladimir, >>> >>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >>> >>> I tried something like that early on with CountedLoopEnd. >> >> In the mach version with restore mark you should specify the correct size(X), or not specify it at all so that it is calculated dynamically (done automatically). >> I don't see any side effects for restoremask in your code. What are you talking about?
>> >> I am suggesting something like next:
>>
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>   predicate(n->has_vect_mask_set());
>>   match(CountedLoopEnd cop cr);
>>   effect(USE labl);
>>
>>   ins_cost(400);
>>   format %{ "j$cop $labl\t# loop end\n\t"
>>             "restoremask \t# vector mask restore for loops"
>>   %}
>>   ins_encode %{
>>     Label* L = $labl$$label;
>>     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>     __ restoremask();
>>   %}
>>   ins_pipe(pipe_jcc);
>> %}
>>
>> Vladimir
>> >>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to the node when we don't need one. What would you like to do then, process via flag or how I do it now? We would basically be doing it in the same place. >>> >>> -Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, April 14, 2016 4:27 PM >>> To: Christian Thalinger ; Berg, >>> Michael C >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>>> >>>>> Christian, >>>>> There is, but I would have to anchor it via a gpr register; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>>> >>>> That's unfortunate but I understand. I'm fine with it then. >>> >>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there.
>>> >>> Vladimir >>> >>>> >>>>> Regards, >>>>> Michael >>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >>>>> > >>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>> >>>>> Subject: Re: CR for RFR 8153998 >>>>> >>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>>> See below for context. >>>>> Regards, >>>>> Michael >>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> Sent: Wednesday, April 13, 2016 2:08 PM >>>>> To: Berg, Michael C > >>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: CR for RFR 8153998 >>>>> >>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>>> performance and has been modeled over a large number of loop lengths and forms of loops. >>>>> This code was tested as follows (see jbs entry below): >>>>> >>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>>> >>>>> webrev: >>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>>> >>>>> >>>>> +//------------------------------MachMskNode----------------------------------- >>>>> >>>>> +// Machine function Msk Node >>>>> >>>>> +class MachMskNode : public MachIdealNode { >>>>> >>>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>>> Ok, that's easy enough.
>>>>> Also, I don't quite understand why we have: >>>>> >>>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>>> >>>>> + predicate(VM_Version::supports_avx512vl()); >>>>> >>>>> + match(Set dst (MaskCreateI src)); >>>>> >>>>> + effect(TEMP dst); >>>>> >>>>> + format %{ "createmsk $dst, $src" %} >>>>> >>>>> + ins_encode %{ >>>>> >>>>> + __ createmsk($dst$$Register, $src$$Register); >>>>> >>>>> + %} >>>>> >>>>> but: >>>>> >>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>>> >>>>> + MacroAssembler _masm(&cbuf); >>>>> >>>>> + __ restoremsk(); >>>>> >>>>> + } >>>>> >>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>>> >>>>> Hmm. So, there is no way we can have a RestoreMaskINode?
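The set_mask/restore scheme described above can be modelled in scalar C++. set_mask builds a k1 mask from the remaining trip count, one masked vector iteration replaces the scalar post loop, and the restore puts the default mask back. This is a minimal sketch, not the patch. The 16-lane width, `tail_mask`, `masked_add`, `vector_add`, and `kDefaultMask` are hypothetical names. The per-lane loop in `masked_add` stands in for a single AVX-512 instruction predicated on k1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vector width: 16 int lanes (one 512-bit register).
constexpr int kLanes = 16;
// Default "all lanes enabled" value that the restore puts back into k1.
constexpr uint32_t kDefaultMask = (1u << kLanes) - 1;

// set_mask: a mask with the low `remaining` bits set (remaining < kLanes).
uint32_t tail_mask(int remaining) {
  assert(remaining >= 0 && remaining < kLanes);
  return (1u << remaining) - 1;
}

// One masked "vector" iteration: lanes whose mask bit is clear are not
// read or written, like an AVX-512 operation predicated on k1.
void masked_add(int* dst, const int* a, const int* b, uint32_t mask) {
  for (int lane = 0; lane < kLanes; lane++) {
    if (mask & (1u << lane)) {
      dst[lane] = a[lane] + b[lane];
    }
  }
}

// dst[i] = a[i] + b[i] for i in [0, n): full-width iterations for the main
// loop, then a single masked iteration instead of a scalar post loop.
// Returns the mask left "in k1" afterwards.
uint32_t vector_add(int* dst, const int* a, const int* b, int n) {
  uint32_t k1 = kDefaultMask;
  int i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    masked_add(dst + i, a + i, b + i, k1);
  }
  int remaining = n - i;
  if (remaining > 0) {
    k1 = tail_mask(remaining);  // set_mask
    masked_add(dst + i, a + i, b + i, k1);
    k1 = kDefaultMask;          // restore
  }
  return k1;
}
```

Lanes with a clear mask bit are never touched, so the single tail iteration needs no per-element bounds check. That is why a clean counted post loop can collapse into one iteration.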
>>>>> >>>>> Thanks, >>>>> Michael >>>> From martin.doerr at sap.com Mon Apr 18 07:31:36 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 18 Apr 2016 07:31:36 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57114147.5060206@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> <5710478C.8050200@oracle.com> <57109BE2.1090602@oracle.com> <57114147.5060206@oracle.com> Message-ID: <4e54bacd93284faf9843b60f98314de5@DEWDFE13DE14.global.corp.sap> Thanks everybody for the discussion, for reviewing and for sponsoring. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 15, 2016 21:30 To: Jamsheed C m ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thank you, Jamsheed. Testing results look fine so far. I am pushing it. Thanks, Vladimir On 4/15/16 12:44 AM, Jamsheed C m wrote: > Hi Vladimir, > > PIT testing is in progress, link is available in bug report. > > Best Regards, > Jamsheed > > On 4/15/2016 7:14 AM, Vladimir Kozlov wrote: >> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results. >> >> Thanks, >> Vladimir >> >> On 4/12/16 2:45 AM, Doerr, Martin wrote: >>> Hi, >>> >>> I think we have come to a common understanding and there was no complaint about my latest webrev: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Can I consider it reviewed? >>> Can somebody sponsor, please?
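The 8153267 thread is about letting lock-free readers see a fully initialized exception-cache entry once they observe the updated count. A common way to express that guarantee is a release store on the writer side and an acquire load on the reader side. Below is a minimal sketch of the pattern with C++11 atomics. It assumes a fixed-capacity cache and a single writer serialized by a mutex. `ExceptionCacheModel` and its members are hypothetical, and HotSpot itself uses its own `OrderAccess` primitives rather than `std::atomic`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical model: maps exception "pc" values to handler addresses.
class ExceptionCacheModel {
 public:
  static constexpr int kCapacity = 16;

  // Writer side: updates are serialized by a lock.
  bool add(long pc, long handler) {
    std::lock_guard<std::mutex> guard(lock_);
    int n = count_.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;
    pcs_[n] = pc;            // plain stores fill the slot...
    handlers_[n] = handler;
    // ...and the release store publishes them: a reader that observes the
    // new count is guaranteed to also observe the slot contents.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: no lock. Returns 0 on a miss. Missing an entry that is
  // being added concurrently is tolerable (a false negative); reading a
  // half-written entry is not.
  long lookup(long pc) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;
  }

 private:
  std::mutex lock_;
  std::atomic<int> count_{0};
  long pcs_[kCapacity] = {};
  long handlers_[kCapacity] = {};
};
```

Readers only touch slots below the count they acquired, and the writer only fills the slot at the current count. The two therefore never access the same slot concurrently.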
>>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin >>> Sent: Thursday, April 7, 2016 12:52 >>> To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> Hi Andrew, Jamsheed and all, >>> >>> thank you very much for your input. >>> >>> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). >>> Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also >>> correct). >>> >>> My change still contains a releasing store for newly created ExceptionCache instances. >>> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce >>> additional false negatives on weak memory model platforms. >>> I think having the release doesn't hurt too much and makes the design a little cleaner. >>> >>> I also added comments based on your input. >>> >>> The new webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Please review. I will also need a sponsor from Oracle, please. >>> >>> Thanks again and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Andrew Haley [mailto:aph at redhat.com] >>> Sent: Thursday, April 7, 2016 12:14 >>> To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> On 07/04/16 10:08, Doerr, Martin wrote: >>> >>>> atomic update for the _count would only be required if there were >>>> multiple threads which attempt to increment it >>>> concurrently. However, updates are under lock, so we only have >>>> concurrent readers which is ok. >>>> >>>> I still think "volatile" does what we need here.
Especially the xlC >>>> compiler on AIX tends to reload variables from memory. Exactly this >>>> can be prevented by making the field volatile. >>> >>> I think your latest patch is OK. Whether volatile is really good >>> enough, I don't know. The new(ish) C++ memory model treats this as a >>> race, and therefore undefined behaviour. Old C++ didn't have a memory >>> model, so the best we can do with racy code is guess about what our >>> compilers might do. >>> >>> I certainly much prefer a release_store to the storestore fence used >>> in the fix for 8143897. >>> >>> Andrew. >>> > From zoltan.majo at oracle.com Mon Apr 18 07:36:40 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Mon, 18 Apr 2016 09:36:40 +0200 Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if on-stack-replacement is enabled In-Reply-To: <57113950.9070700@oracle.com> References: <571107CD.8070205@oracle.com> <57113950.9070700@oracle.com> Message-ID: <57148E88.9010905@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 04/15/2016 08:56 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 4/15/16 8:25 AM, Zolt?n Maj? wrote: >> Hi, >> >> >> please review the patch for 8072428. >> >> https://bugs.openjdk.java.net/browse/JDK-8072428 >> >> Problem: On-stack-replacement requires loop counters; disabling loop >> counters with on-stack-replacement enabled triggers >> an assert. >> >> Solution: Set UseLoopCounter ergonomically if on-stack-replacement is >> enabled. Print warning. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/ >> >> Tested with locally-built VM (linux_x64). >> >> Thank you! 
>> >> Best regards, >> >> >> Zoltan >> From edward.nevill at gmail.com Mon Apr 18 08:10:51 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 18 Apr 2016 09:10:51 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: Message-ID: <1460967051.10749.31.camel@mint> On Fri, 2016-04-15 at 20:45 +0800, Long Chen wrote: > Hi > > Please review this patch making use of DC ZVA to do block zeroing. > > http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch > > I'm sorry that I can't produce a test case matching the 'clear_array' pattern showing obvious improvement. However, generating 'DC ZVA' should be the right thing to do as it usually has better cache behaviors. Besides, gcc and linux's memset have been using 'DC ZVA'. > Hi Long, Thanks for this. I have benchmarked this on 3 different partners HW using the following JMH test case http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java On two partners HW I see a significant improvement. On one partners HW I see almost identical performance. Here are the results I get with the original normalised to 100 sec to avoid disclosing any absolute performance figures.
Partner A: Original = 100 sec, revised = 100.7 sec
Partner B: Original = 100 sec, revised = 97.6 sec
Partner C: Original = 100 sec, revised = 91.2 sec

One small improvement might be to avoid using a tmp register, which has to be allocated here:

-instruct clearArray_imm_reg(immL cnt, iRegP base, Universe dummy, rFlagsReg cr)
+instruct clearArray_imm_reg(immL cnt, iRegP base, iRegLNoSp tmp, Universe dummy, rFlagsReg cr)

- __ zero_words($base$$Register, (u_int64_t)$cnt$$constant);
+ __ zero_words($base$$Register, (u_int64_t)$cnt$$constant, $tmp$$Register);

by using 'lr' as the tmp register here:

+ } else if (UseBlockZeroing && cnt >= (u_int64_t)(BlockZeroingLowLimit >> LogBytesPerWord)) {
+ mov(tmp, cnt);
+ zero_words(base, tmp, true);

AFAIK, 'lr' is always available as a tmp register in C2 generated code. All the best, Ed.

From zoltan.majo at oracle.com Mon Apr 18 09:22:38 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 18 Apr 2016 11:22:38 +0200 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <5711248C.7000503@oracle.com> References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com> <571104A9.7060208@oracle.com> <5711248C.7000503@oracle.com> Message-ID: <5714A75E.8050300@oracle.com> Hi Vladimir, On 04/15/2016 07:27 PM, Vladimir Kozlov wrote: > Looks good to me. Thank you for the review! Best regards, Zoltan > > thanks, > Vladimir > > On 4/15/16 8:11 AM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 04/15/2016 02:46 AM, Vladimir Kozlov wrote: >>> I think the check should use !isa_oopptr() since one of nodes could be >>> ConP NULL ptr which is not klassptr. >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/ >> >> RBT testing passes. I did ~70 runs with the reproducer, no problems >> have shown up so far. I'll do ~900 more runs, though. >> >> Thank you!
>> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/14/16 6:21 AM, Zoltán Majó wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8153357. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153357 >>>> >>>> Problem: When determining the unique input of a phi, the C2 >>>> compiler removes cast nodes connecting the phi to its >>>> unique input. >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 >>>> >>>> >>>> Then (if the phi indeed has a unique input), the C2 compiler >>>> attempts to replace the phi with a cast node. The new cast >>>> node feeds from the unique input. >>>> >>>> To be able to remove the phi node, the C2 compiler must >>>> determine the type of cast to add in place of the phi >>>> node (CastII, CastPP, or CheckCastPP). >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 >>>> >>>> >>>> The failure in the bug report appears because the C2 compiler adds >>>> a cast node of unexpected type to the graph (a >>>> CheckCastPP instead of a CastPP when casting between two klass >>>> pointers). >>>> >>>> Please find more details about the cause of the failure in the bug >>>> description: >>>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 >>>> >>>> >>>> >>>> >>>> Solution: Refine C2's logic to determine the type of cast node added. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ >>>> >>>> Testing: >>>> - JPRT; >>>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); >>>> - 500 non-failing runs with the reproducer (the problem reproduces >>>> with < 100 runs).
>>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >> From nils.eliasson at oracle.com Mon Apr 18 09:39:23 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 18 Apr 2016 11:39:23 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5711215D.4060202@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> <5710CF13.5090404@oracle.com> <5711215D.4060202@oracle.com> Message-ID: <5714AB4B.9070200@oracle.com> Thank you Vladimir, I have verified the test executes in JPRT. Regards, Nils On 2016-04-15 19:14, Vladimir Kozlov wrote: > Looks good. Make sure the test is executed in JPRT. > > Thanks, > Vladimir > > On 4/15/16 4:22 AM, Nils Eliasson wrote: >> Hi Tobias, >> >> Thanks for your feedback! >> >> New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 >> >> Regards, >> Nils >> >> On 2016-04-15 13:15, Tobias Hartmann wrote: >>> Hi Nils, >>> >>> On 15.04.2016 11:39, Nils Eliasson wrote: >>>> Thanks Vladimir! >>>> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>>>> I agree with this simple change as the fix. >>>>> Note, -Xcomp does not switch off Interpreter (we can run without >>>>> Interpreter). We use !UseInterpreter as indication >>>>> if Xcomp was used. >>>>> I don't see a PIT link in the bug report. >>>> There was none, Tobias found this regression while testing something else. >>>> >>>> Now I have added a regression test: >>>> hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >>>> >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ >>> Please set the test copyright date to 2016. I would maybe also >>> change the test summary to what you wrote in line 30 >>> ("Sanity test flag combo..") because this has nothing to do with >>> support for blocking compiles. >>> >>> Otherwise looks good to me.
>>> >>> Best regards, >>> Tobias >>> >>>> Regards, >>>> Nils >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this fix. >>>>>> >>>>>> Summary: >>>>>> In JDK-8150646 I added an assert in compile_method that the >>>>>> compiler must not be NULL. Before, there was a return >>>>>> there that just ignored the compile. >>>>>> >>>>>> Running the VM with the flag combination -Xcomp and >>>>>> -XX:TieredStopAtLevel=0 creates a special situation: >>>>>> UseInterpreter is set to false (but the interpreter is still >>>>>> available) and then some >>>>>> essential methods are forced to be compiled, but the initial >>>>>> complevel becomes 0 and hits the assert in compileBroker. >>>>>> >>>>>> Solution: >>>>>> We could discuss if it should be allowed to submit compiles on >>>>>> level 0, a change that would become a bit larger. >>>>>> This time I chose to extend the _initialized check in >>>>>> compile_method. I didn't add any >>>>>> logging or warning because this is really a corner case.
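The 8154151 fix replaces a hard assert with an early return: compile requests that arrive before the broker is initialized, or that target a level with no compiler (level 0 under `-XX:TieredStopAtLevel=0`), are ignored. A minimal sketch of that guard follows. It assumes a hypothetical `BrokerModel`/`CompilerModel` pair. The level names mirror HotSpot's `CompLevel` constants, but nothing here is the actual CompileBroker code.

```cpp
#include <cassert>

// Hypothetical subset of compilation levels; level 0 means "no compiler".
enum CompLevel {
  CompLevel_none = 0,
  CompLevel_simple = 1,
  CompLevel_full_optimization = 4
};

struct CompilerModel { const char* name; };

// Hypothetical broker state.
struct BrokerModel {
  bool initialized = false;
  CompilerModel c1{"c1"};
  CompilerModel c2{"c2"};

  // Returns the compiler for a level, or nullptr if none exists.
  CompilerModel* compiler_for(CompLevel level) {
    if (level == CompLevel_none) return nullptr;
    return level == CompLevel_full_optimization ? &c2 : &c1;
  }

  // Returns true if the request was accepted. Instead of asserting that a
  // compiler exists, requests made before initialization or for a level
  // without a compiler are silently ignored.
  bool compile_method(CompLevel level) {
    CompilerModel* comp = compiler_for(level);
    if (!initialized || comp == nullptr) {
      return false;
    }
    // ... enqueue a compile task for `comp` here ...
    return true;
  }
};
```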
>>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>>>> (Ignore the extra tags in the webrev) >>>>>> >>>>>> Best regards, >>>>>> Nils Eliasson >> From nils.eliasson at oracle.com Mon Apr 18 10:24:11 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 18 Apr 2016 12:24:11 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> Message-ID: <5714B5CB.70705@oracle.com> Hi, On 2016-04-15 22:43, Christian Thalinger wrote: > >> On Apr 15, 2016, at 3:06 AM, Nils Eliasson > > wrote: >> >> Hi, >> >> On 2016-04-14 20:45, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson >>>> wrote: >>>> >>>> I moved the reasons to CompileTask.hpp and put it together with the >>>> names list. Also changed the type from int to CompileReason as Igor >>>> suggested. >>>> >>>> It gets verbose in the method declarations in compileBroker >>> >>> Don't worry about this. >>> >>>> and sometimes I think CompileReason should be declared in >>>> CompileBroker because it is mostly used there. On the other hand, >>>> CompileTask is the keeper of the CompileReason so it makes sense too. >>> >>> Yes, that's the right place. >>> >>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>> >>>
>>> + bool can_become_stale() const {
>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox);
>>> + }
>>> I'm not a fan of implicit contracts just defined by comments. This >>> method doesn't seem to be performance critical so I would suggest >>> using a switch-case. An attribute on the enum would be much better >>> but we all know this isn't Java.
>> >> As you suggested: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 > > Thanks. A space is missing and the closing } indent is wrong:
> + bool can_become_stale() const {
> + switch(_compile_reason) {
> + case Reason_BackedgeCount:
> + case Reason_InvocationCount:
> + case Reason_Tiered:
> + return !_is_blocking;
> + }
> + return false;
> + }
> Also, what about:
> + Reason_None,
> + Reason_CTW, // Compile the world
> + Reason_Replay, // ciReplay
These were covered before:
Reason_None - is only used for bounds checking together with Reason_Count.
Reason_Replay - if these compilations can get stale we can get indeterminism in replay.
Reason_CTW - CTW could silently drop compiles -> more indeterminism.
Regards, Nils > >> >> Also made reasons CTW and Replay not stale-able. >> >> Thanks! >> Nils >> >>> >>>> >>>> Thanks! >>>> Nils >>>> >>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>> Very nice, I like it. >>>>> >>>>> One note. CompileReason (and its names) should be in the CompileTask >>>>> class where it is recorded. Then CompileTask::can_become_stale() >>>>> can be in the header file so it is inlined on all platforms. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>> >>>>>> >>>>>> Summary >>>>>> Introduced an enum CompileReason with members matching all the old >>>>>> variants, and a table containing all the unchanged strings. I see the >>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>> have chosen not to do so in this change. >>>>>> >>>>>> Only new logic is the CompileTask::can_become_stale() method.
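The switch-based `can_become_stale()` under review can be tried out as a self-contained model. This sketch makes several assumptions. The enum lists only the reasons named in this thread (the real `CompileReason` may have more members). `CompileTaskModel` is a hypothetical stand-in. `is_stale` is a simplified form of the eviction rule described later in the thread: a task goes stale if its invocation counter has not moved for `TieredCompileTaskTimeout` milliseconds. A `default:` case is added so that every unlisted reason explicitly maps to "never stale".

```cpp
#include <cassert>
#include <cstdint>

enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,       // Compile the world
  Reason_Replay,    // ciReplay
  Reason_Whitebox,
  Reason_Count
};

struct CompileTaskModel {
  CompileReason compile_reason;
  bool is_blocking;
  int64_t last_counter_change_ms;  // when the invocation counter last moved

  // Only counter-driven, non-blocking compiles may be evicted; WhiteBox,
  // CTW and replay requests are never dropped.
  bool can_become_stale() const {
    switch (compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !is_blocking;
      default:
        return false;
    }
  }

  bool is_stale(int64_t now_ms, int64_t timeout_ms) const {
    return can_become_stale() &&
           (now_ms - last_counter_change_ms) > timeout_ms;
  }
};
```

The explicit switch makes the contract visible at the use site, instead of relying on the enum ordering that the earlier `< Reason_Whitebox` comparison depended on.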
>>>>>> >>>>>> Testing: >>>>>> Running Testset hotspot on all platforms and hotspot_all on one >>>>>> platform >>>>>> >>>>>> Regards, >>>>>> Nils Eliasson >>>>>> >>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>> Tasks get evicted from the compile_queue if their invocation >>>>>>>> counter >>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>> >>>>>>>> I'll do a proper fix, it is the right thing to do and should be >>>>>>>> pretty >>>>>>>> quick. I'll change the comment to an enum that represents who >>>>>>>> submitted >>>>>>>> the compile, and add a table for the comments. This could be >>>>>>>> useful in >>>>>>>> other settings too. >>>>>>> >>>>>>> Sounds good. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils >>>>>>>> >>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>> What do you mean "stale"? >>>>>>>>> I would prefer to see the real fix as you suggested to avoid >>>>>>>>> removing >>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>> >>>>>>>>>> Summary: >>>>>>>>>> A method enqueued for compilation with WB API may be >>>>>>>>>> removed from >>>>>>>>>> the compile queue as stale. >>>>>>>>>> >>>>>>>>>> Solution: >>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>>>> >>>>>>>>>> This is a workaround but we should consider fixing something >>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>> task with info about the origin of the compile. 
The comment >>>>>>>>>> field has
The comment >>>>>>>>>> field has >>>>>>>>>> this information - but then it needs to be >>>>>>>>>> converted to an enum. >>>>>>>>>> >>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Nils Eliasson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ENOMIKI at jp.ibm.com Mon Apr 18 10:36:39 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Mon, 18 Apr 2016 10:36:39 +0000 Subject: PPC64 VMX/VSX array copy stubs Message-ID: <201604181036.u3IAarDQ018409@d19av08.sagamino.japan.ibm.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ArrayCopyTest1.java Type: application/octet-stream Size: 3177 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vmx.diff Type: application/octet-stream Size: 7923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vsx.diff Type: application/octet-stream Size: 7001 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: result.jpg Type: image/jpeg Size: 32237 bytes Desc: not available URL: From nils.eliasson at oracle.com Mon Apr 18 11:24:00 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 18 Apr 2016 13:24:00 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <57112880.1010204@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> Message-ID: <5714C3D0.2070804@oracle.com> Resizeable is better, but then we assert on expanding the stringbuffer while being under a different ResourceMark. 
Regards, Nils On 2016-04-15 19:44, Vladimir Kozlov wrote: > Use resizable stream: > > stringStream(size_t initial_bufsize = 256); > > 1024 may not be enough. > > Thanks, > Vladimir > > On 4/15/16 8:10 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this fix of print opto_assembly. >> >> Summary: >> The compilelog can get corrupted and the VM may assert on "failed: >> bad tag in log". >> >> When printing assembly in output.cpp we first take the ttylock, print >> the head and then the method metadata. However the >> metadata printing makes a vm entry and may block for a safepoint and >> will then release the lock >> (break_tty_lock_for_safepoint). After that some of the other compiler >> thread that haven't safepointed will take the lock >> and the broken log will be a fact when the safepoint is over and the >> first thread starts logging again. >> >> Solution: >> Print the method metadata to a temporary buffer, then take the tty lock. >> >> Testing: >> Repro from bug stops failing. >> Running :hotspot_all >> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >> >> Regards, >> Nils Eliasson From aph at redhat.com Mon Apr 18 11:45:10 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Apr 2016 12:45:10 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <1460967051.10749.31.camel@mint> References: <1460967051.10749.31.camel@mint> Message-ID: <5714C8C6.3030302@redhat.com> On 04/18/2016 09:10 AM, Edward Nevill wrote: > I have benchmarked this on 3 different partners HW using the following JMH test case > > http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java This isn't a great test for block zeroing. Nevertheless, I have approved this patch, with a few alterations. 
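The block-zeroing strategy discussed in this thread can be modelled in portable C++. For large counts it zeroes single words until the address is block-aligned, zeroes whole blocks (with `DC ZVA` on AArch64), and finishes the tail with word stores. The generated code Andrew posts has the same shape. This is a sketch under stated assumptions. The 128-byte block size matches the `#0x7f` mask in that listing, but real code must read it from `DCZID_EL0`. `zero_block` is a hypothetical stand-in for the `dc zva` instruction. The 32-word threshold plays the role of `BlockZeroingLowLimit` (the listing compares the count against `#0x20`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kBlockBytes = 128;                              // assumed ZVA block size
constexpr size_t kBlockWords = kBlockBytes / sizeof(uint64_t);   // 16 words
constexpr size_t kLowLimitWords = 32;                            // assumed threshold

// Stand-in for `dc zva`: zero one whole, block-aligned block.
static void zero_block(uint64_t* p) {
  std::memset(p, 0, kBlockBytes);
}

// Zero `cnt` 8-byte words starting at the 8-byte-aligned address `base`.
void zero_words(uint64_t* base, size_t cnt) {
  if (cnt >= kLowLimitWords) {
    // Head: single-word stores until base is block-aligned (at most 15).
    while (reinterpret_cast<uintptr_t>(base) % kBlockBytes != 0) {
      *base++ = 0;
      cnt--;
    }
    // Body: whole blocks.
    while (cnt >= kBlockWords) {
      zero_block(base);
      base += kBlockWords;
      cnt -= kBlockWords;
    }
  }
  // Tail, or the whole buffer when it is below the threshold.
  while (cnt-- > 0) {
    *base++ = 0;
  }
}
```

The generated code in the thread finishes the tail by jumping into an unrolled run of eight `stur` instructions; the sketch uses a plain loop there. The threshold keeps the alignment prologue off small arrays, where it would cost more than it saves.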
A note about using jmh, not just for you but for everyone working on this project. Don't do something like

    for (int i = 0; i < 10000; i++)
        theTest();

as an attempt to pad out the execution time. jmh is much better at this sort of thing than you are. Just put the code you're trying to test in the test case. Andrew. From aph at redhat.com Mon Apr 18 12:55:12 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Apr 2016 13:55:12 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: Message-ID: <5714D930.4090804@redhat.com> One other thing. This is rather a lot of code to emit every time an array is created:

    ;; zero_words {
      0x0000007fa880f5f0: cmp x11, #0x20
      0x0000007fa880f5f4: b.lt 0x0000007fa880f62c
      0x0000007fa880f5f8: neg x8, x10
      0x0000007fa880f5fc: and x8, x8, #0x7f
      0x0000007fa880f600: cbz x8, 0x0000007fa880f614
      0x0000007fa880f604: sub x11, x11, x8, asr #3
      0x0000007fa880f608: sub x8, x8, #0x8
      0x0000007fa880f60c: str xzr, [x10],#8
      0x0000007fa880f610: cbnz x8, 0x0000007fa880f608
      0x0000007fa880f614: sub x11, x11, #0x10
      0x0000007fa880f618: dc zva, x10
      0x0000007fa880f61c: subs x11, x11, #0x10
      0x0000007fa880f620: add x10, x10, #0x80
      0x0000007fa880f624: b.ge 0x0000007fa880f618
      0x0000007fa880f628: add x11, x11, #0x10
      0x0000007fa880f62c: and x8, x11, #0x7

I don't think this CBZ does anything useful:

      0x0000007fa880f630: cbz x8, 0x0000007fa880f670

(I'm assuming that the 0-7 cases are uniformly distributed.)

      0x0000007fa880f634: sub x11, x11, x8
      0x0000007fa880f638: add x10, x10, x8, lsl #3
      0x0000007fa880f63c: adr x9, 0x0000007fa880f670
      0x0000007fa880f640: sub x9, x9, x8, lsl #2
      0x0000007fa880f644: br x9
      0x0000007fa880f648: add x10, x10, #0x40
      0x0000007fa880f64c: sub x11, x11, #0x8
      0x0000007fa880f650: stur xzr, [x10,#-64]
      0x0000007fa880f654: stur xzr, [x10,#-56]
      0x0000007fa880f658: stur xzr, [x10,#-48]
      0x0000007fa880f65c: stur xzr, [x10,#-40]
      0x0000007fa880f660: stur xzr, [x10,#-32]
      0x0000007fa880f664: stur xzr, [x10,#-24]
      0x0000007fa880f668: stur xzr, [x10,#-16]
      0x0000007fa880f66c: stur xzr, [x10,#-8]
      0x0000007fa880f670: cbnz x11, 0x0000007fa880f648
    ;; } zero_words

We could think about moving the large block case into a stub which is emitted after the main body of the method, or even into a shared stub. A shared stub would require the args to be in fixed registers, though. Andrew. From ENOMIKI at jp.ibm.com Sun Apr 17 18:28:01 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Sun, 17 Apr 2016 18:28:01 +0000 Subject: PPC64 VMX/VSX array copy stubs Message-ID: <201604171828.u3HIS9u0012295@d19av06.sagamino.japan.ibm.com> Dear all, Could you please review the following change? I created two patches for generate_disjoint_long_copy, using VMX (Vector Multimedia Extension) and VSX (Vector-Scalar Extension). Let me share our performance results. I varied the array copy size for both the aligned case (src and dst alignments match) and the unaligned case, measuring performance with the following four patterns at a time. Long array elements are 8-byte aligned, so these patterns cover both the aligned and the unaligned cases.

    System.arraycopy(src, 0, dst, 0, size);
    System.arraycopy(src, 0, dst, 1, size);
    System.arraycopy(src, 1, dst, 0, size);
    System.arraycopy(src, 1, dst, 1, size);

VMX(max) and VSX(max) are the aligned scores, while VMX(min) and VSX(min) are the unaligned scores. Scalar is the original OpenJDK.
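The four System.arraycopy patterns in Miki's message differ only in whether the source and destination start at the same offset relative to a 16-byte vector boundary. The sketch below shows why (0,0) and (1,1) count as aligned and (0,1) and (1,0) as unaligned. It assumes that the first element of each long[] sits at a 16-byte-aligned address, which depends on the VM's object layout and is an assumption, and that VMX/VSX registers are 16 bytes wide. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kElemBytes = 8;     // sizeof(long) in Java
constexpr std::size_t kVectorBytes = 16;  // width of a VMX/VSX register

// Byte offset of element `index` past the previous vector boundary,
// assuming element 0 is vector-aligned.
std::size_t misalignment(std::size_t index) {
  return (index * kElemBytes) % kVectorBytes;
}

// Source and destination are "mutually aligned" when the same number of
// leading scalar copies brings both to a vector boundary, so the main loop
// can use aligned vector loads and stores on both sides.
bool mutually_aligned(std::size_t src_pos, std::size_t dst_pos) {
  return misalignment(src_pos) == misalignment(dst_pos);
}
```

When only one side can be aligned, the copy loop has to realign data on the fly. This is consistent with the message's note that the unaligned (min) scores suffer from alignment overhead.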
VSX performed better when the array size was less than about 2048 bytes, but VSX(min) was slower than VMX for large arrays. This is probably due to the alignment overhead in VSX. Server: 8247-22L (POWER8 (3.3GHz 12 cores) x2, 512GB memory), Ubuntu Linux 15.04 ppc64LE (kernel: 3.19.0-18-generic), OpenJDK (build based on 1.9), JVMARGS: "-Xmx40g -Xms40g -Xmn20g" Attached are the benchmark code and patch files. The VMX version is currently implemented for ppc LE only. (The patches were generated with "hg diff -g" under the latest hotspot directory.) Related links: "8154156: PPC64: improve array copy stubs by using vector instructions" https://bugs.openjdk.java.net/browse/JDK-8154156 "PPC64 VSX load/store instructions in stubs" http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-April/002419.html Regards, Miki Miki ENOKI, Ph.D. IBM Research - Tokyo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 47807 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vmx.diff Type: application/octet-stream Size: 7923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vsx.diff Type: application/octet-stream Size: 7001 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ArrayCopyTest1.java Type: application/octet-stream Size: 3177 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: result.jpg Type: image/jpeg Size: 62057 bytes Desc: not available URL: From ENOMIKI at jp.ibm.com Mon Apr 18 07:47:45 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Mon, 18 Apr 2016 07:47:45 +0000 Subject: PPC64 VMX/VSX array copy stubs Message-ID: <201604180747.u3I7lwYA004525@d19av08.sagamino.japan.ibm.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1460965465853.jpg Type: image/jpeg Size: 62057 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: result.jpg Type: image/jpeg Size: 62057 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vmx.diff Type: application/octet-stream Size: 7923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vsx.diff Type: application/octet-stream Size: 7001 bytes Desc: not available URL: From ENOMIKI at jp.ibm.com Mon Apr 18 10:19:27 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Mon, 18 Apr 2016 10:19:27 +0000 Subject: PPC64 VMX/VSX array copy stubs Message-ID: <201604181019.u3IAJh3E008402@d19av08.sagamino.japan.ibm.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: ArrayCopyTest1.java Type: application/octet-stream Size: 3177 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vmx.diff Type: application/octet-stream Size: 7923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64le_vsx.diff Type: application/octet-stream Size: 7001 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: result.jpg Type: image/jpeg Size: 62057 bytes Desc: not available URL: From vladimir.kozlov at oracle.com Mon Apr 18 17:30:38 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Apr 2016 10:30:38 -0700 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <5714C3D0.2070804@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> Message-ID: <571519BE.605@oracle.com> tty would have the same problem, but it uses C_HEAP to allocate: defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream(); Please check whether you can do something similar. Thanks, Vladimir On 4/18/16 4:24 AM, Nils Eliasson wrote: > Resizable is better, but then we assert on expanding the stringbuffer > while being under a different ResourceMark. > > Regards, > Nils > > On 2016-04-15 19:44, Vladimir Kozlov wrote: >> Use resizable stream: >> >> stringStream(size_t initial_bufsize = 256); >> >> 1024 may not be enough. >> >> Thanks, >> Vladimir >> >> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this fix of print opto_assembly. >>> >>> Summary: >>> The compilelog can get corrupted and the VM may assert on "failed: >>> bad tag in log". >>> >>> When printing assembly in output.cpp we first take the ttylock, print >>> the head and then the method metadata. However the >>> metadata printing makes a vm entry and may block for a safepoint and >>> will then release the lock >>> (break_tty_lock_for_safepoint). After that some of the other compiler >>> thread that haven't safepointed will take the lock >>> and the broken log will be a fact when the safepoint is over and the >>> first thread starts logging again. >>> >>> Solution: >>> Print the method metadata to a temporary buffer, then take the tty lock. >>> >>> Testing: >>> Repro from bug stops failing.
>>> Running :hotspot_all >>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>> >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>> >>> Regards, >>> Nils Eliasson > From vivek.r.deshpande at intel.com Mon Apr 18 17:38:14 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Mon, 18 Apr 2016 17:38:14 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek From christian.thalinger at oracle.com Mon Apr 18 18:25:52 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 18 Apr 2016 08:25:52 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Looks good.
> On Apr 15, 2016, at 6:20 PM, Berg, Michael C wrote: > > Vladimir/Christian: > > I believe I have addressed all concerns in this update: > > Webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ > > Regards, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C > Sent: Friday, April 15, 2016 2:04 AM > To: 'Vladimir Kozlov' > Cc: 'hotspot-compiler-dev at openjdk.java.net' > Subject: RE: CR for RFR 8153998 > > Vladimir, the code has been updated and is available at: > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ > > Thanks, > Michael > > -----Original Message----- > From: Berg, Michael C > Sent: Thursday, April 14, 2016 5:54 PM > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: CR for RFR 8153998 > > Ok, now we understand one another. I have added a size calc for the new pattern match which will accurately map the emit size for the 32-bit and 64-bit .ad files for CountedLoopEnd. > It will be clean when next you see the code. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:52 PM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 5:12 PM, Berg, Michael C wrote: >> The restore mark is sizeless; it restores k1 to the known global configuration value which is used automatically in all of code gen. > > How is it sizeless when it generates a kmovwl() instruction? > Do you mean it does not have side effects (no flags modified)? > > Vladimir > >> >> Ok, I will try the pattern match method.
>> >> Thanks >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 5:02 PM >> To: Berg, Michael C ; Christian Thalinger >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 4:38 PM, Berg, Michael C wrote: >>> Vladimir, >>> >>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >>> >>> I tried something like that early on with CountedLoopEnd. >> >> In the mach version with restore mark you should either specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically). >> I don't see any side effects for restoremask in your code. What are you talking about? >> >> I am suggesting something like the following:
>>
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>   predicate(n->has_vect_mask_set());
>>   match(CountedLoopEnd cop cr);
>>   effect(USE labl);
>>
>>   ins_cost(400);
>>   format %{ "j$cop $labl\t# loop end\n\t"
>>             "restoremask \t# vector mask restore for loops"
>>   %}
>>   ins_encode %{
>>     Label* L = $labl$$label;
>>     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>     __ restoremask();
>>   %}
>>   ins_pipe(pipe_jcc);
>> %}
>>
>> Vladimir >> >>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to the node when we don't need one. What would you like to do then, process via a flag or keep it the way I do it now? We would basically be doing it in the same place.
>>> >>> -Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, April 14, 2016 4:27 PM >>> To: Christian Thalinger ; Berg, >>> Michael C >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>>> >>>>> Christian, >>>>> There is, but I would have to anchor it via a gpr register; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>>> >>>> That's unfortunate but I understand. I'm fine with it then. >>> >>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration - there should not be any other branches there. >>> >>> Vladimir >>> >>>> >>>>> Regards, >>>>> Michael >>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >>>>> > >>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>> >>>>> Subject: Re: CR for RFR 8153998 >>>>> >>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>>> See below for context. >>>>> Regards, >>>>> Michael >>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> Sent: Wednesday, April 13, 2016 2:08 PM >>>>> To: Berg, Michael C > >>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: CR for RFR 8153998 >>>>> >>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
See:https://bugs.openjdk.java.net/browse/JDK-8151573for the first half of the implementation. >>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>>> performance and has been modeled over a large number of loop lengths and forms of loops. >>>>> This code was tested as follows(see jbs entry below): >>>>> >>>>> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >>>>> >>>>> webrev: >>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>>> >>>>> >>>>> +//------------------------------MachMskNode----------------------- >>>>> +- >>>>> +- >>>>> ---------- >>>>> >>>>> +// Machine function Msk Node >>>>> >>>>> +class MachMskNode : public MachIdealNode { >>>>> >>>>> Does ?Msk? mean mask? Then we should call it MachMaskNode. >>>>> Ok, that?s easy enough. >>>>> Also, I don?t quite understand why we have: >>>>> >>>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>>> >>>>> + predicate(VM_Version::supports_avx512vl()); >>>>> >>>>> + match(Set dst (MaskCreateI src)); >>>>> >>>>> + effect(TEMP dst); >>>>> >>>>> + format %{ "createmsk $dst, $src" %} >>>>> >>>>> + ins_encode %{ >>>>> >>>>> + __ createmsk($dst$$Register, $src$$Register); >>>>> >>>>> + %} >>>>> >>>>> but: >>>>> >>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) >>>>> const { >>>>> >>>>> + MacroAssembler _masm(&cbuf); >>>>> >>>>> + __ restoremsk(); >>>>> >>>>> + } >>>>> >>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. 
>>>>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>>> >>>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>>> >>>>> Thanks, >>>>> Michael >>>> From vladimir.kozlov at oracle.com Mon Apr 18 18:32:53 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Apr 2016 11:32:53 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: <57152855.2090106@oracle.com> Testing looks good. I will push it after a few other pushes currently in the queue. Thanks, Vladimir On 4/18/16 11:25 AM, Christian Thalinger wrote: > Looks good. > >> On Apr 15, 2016, at 6:20 PM, Berg, Michael C wrote: >> >> Vladimir/Christian: >> >> I believe I have addressed all concerns in this update: >> >> Webrev: >> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ >> >> Regards, >> Michael >> >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C >> Sent: Friday, April 15, 2016 2:04 AM >> To: 'Vladimir Kozlov' >> Cc: 'hotspot-compiler-dev at openjdk.java.net' >> Subject: RE: CR for RFR 8153998 >> >> Vladimir, the code has been updated and is available at: >> >> webrev: >> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Berg, Michael C >> Sent: Thursday, April 14, 2016 5:54 PM >> To: Vladimir Kozlov >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: RE: CR for RFR 8153998 >> >> Ok, now we understand one another. I have added a size calc for the new pattern match which will accurately map the emit size for the 32-bit and 64-bit .ad files for CountedLoopEnd. >> It will be clean when next you see the code. >> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 5:52 PM >> To: Berg, Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 5:12 PM, Berg, Michael C wrote: >>> The restore mark is sizeless; it restores k1 to the known global configuration value which is used automatically in all of code gen. >> >> How is it sizeless when it generates a kmovwl() instruction? >> Do you mean it does not have side effects (no flags modified)? >> >> Vladimir >> >>> >>> Ok, I will try the pattern match method. >>> >>> Thanks >>> -Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, April 14, 2016 5:02 PM >>> To: Berg, Michael C ; Christian Thalinger >>> >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On 4/14/16 4:38 PM, Berg, Michael C wrote: >>>> Vladimir, >>>> >>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >>>> >>>> I tried something like that early on with CountedLoopEnd. >>> >>> In the mach version with restore mark you should either specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically). >>> I don't see any side effects for restoremask in your code. What are you talking about?
>>> >>> I am suggesting something like the following:
>>>
>>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>>   predicate(n->has_vect_mask_set());
>>>   match(CountedLoopEnd cop cr);
>>>   effect(USE labl);
>>>
>>>   ins_cost(400);
>>>   format %{ "j$cop $labl\t# loop end\n\t"
>>>             "restoremask \t# vector mask restore for loops"
>>>   %}
>>>   ins_encode %{
>>>     Label* L = $labl$$label;
>>>     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>>     __ restoremask();
>>>   %}
>>>   ins_pipe(pipe_jcc);
>>> %}
>>>
>>> Vladimir >>> >>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to the node when we don't need one. What would you like to do then, process via a flag or keep it the way I do it now? We would basically be doing it in the same place. >>>> >>>> -Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Thursday, April 14, 2016 4:27 PM >>>> To: Christian Thalinger ; Berg, >>>> Michael C >>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: CR for RFR 8153998 >>>> >>>> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>>>> >>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>>>> >>>>>> Christian, >>>>>> There is, but I would have to anchor it via a gpr register; instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>>>> >>>>> That's unfortunate but I understand. I'm fine with it then. >>>> >>>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will only be generated for a clean counted post loop with one vector iteration - there should not be any other branches there.
>>>> >>>> Vladimir >>>> >>>>> >>>>>> Regards, >>>>>> Michael >>>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>>> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >>>>>> > >>>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>>> >>>>>> Subject: Re: CR for RFR 8153998 >>>>>> >>>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>>>> See below for context. >>>>>> Regards, >>>>>> Michael >>>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>>> Sent: Wednesday, April 13, 2016 2:08 PM >>>>>> To: Berg, Michael C > >>>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>>> Subject: Re: CR for RFR 8153998 >>>>>> >>>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>>>> performance and has been modeled over a large number of loop lengths and forms of loops. >>>>>> This code was tested as follows (see jbs entry below): >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>>>> >>>>>> webrev: >>>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>>>> >>>>>> >>>>>> +//------------------------------MachMskNode----------------------- >>>>>> +- >>>>>> +- >>>>>> ---------- >>>>>> >>>>>> +// Machine function Msk Node >>>>>> >>>>>> +class MachMskNode : public MachIdealNode { >>>>>> >>>>>> Does "Msk" mean mask? Then we should call it MachMaskNode.
>>>>>> Ok, that's easy enough. >>>>>> Also, I don't quite understand why we have: >>>>>> >>>>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>>>> >>>>>> + predicate(VM_Version::supports_avx512vl()); >>>>>> >>>>>> + match(Set dst (MaskCreateI src)); >>>>>> >>>>>> + effect(TEMP dst); >>>>>> >>>>>> + format %{ "createmsk $dst, $src" %} >>>>>> >>>>>> + ins_encode %{ >>>>>> >>>>>> + __ createmsk($dst$$Register, $src$$Register); >>>>>> >>>>>> + %} >>>>>> >>>>>> but: >>>>>> >>>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) >>>>>> const { >>>>>> >>>>>> + MacroAssembler _masm(&cbuf); >>>>>> >>>>>> + __ restoremsk(); >>>>>> >>>>>> + } >>>>>> >>>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>>>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>>>> >>>>>> Hmm. So, there is no way we can have a RestoreMaskINode?
>>>>>> >>>>>> Thanks, >>>>>> Michael >>>>> > From christian.thalinger at oracle.com Mon Apr 18 18:34:40 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 18 Apr 2016 08:34:40 -1000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> > On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R wrote: > > Hi all > > I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. > This uses -XX:DisableIntrinsic option to achieve the same. > Could you please review and sponsor this patch. > > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

src/cpu/x86/vm/stubGenerator_x86_64.cpp

+ if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
  StubRoutines::_dpow = generate_libmPow();
- StubRoutines::_dtan = generate_libmTan();
+ }

Was removing libmTan on purpose? > > Thanks and regards, > Vivek From vivek.r.deshpande at intel.com Mon Apr 18 18:38:52 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Mon, 18 Apr 2016 18:38:52 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> Hi Christian I have added this. Just moved generate_libmTan() after sin and cos generation.
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
  StubRoutines::_dtan = generate_libmTan();
}

Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, April 18, 2016 11:35 AM To: Deshpande, Vivek R Cc: hotspot compiler ; Vladimir Kozlov ; Viswanathan, Sandhya Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R > wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

src/cpu/x86/vm/stubGenerator_x86_64.cpp

+ if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
  StubRoutines::_dpow = generate_libmPow();
- StubRoutines::_dtan = generate_libmTan();
+ }

Was removing libmTan on purpose? Thanks and regards, Vivek From christian.thalinger at oracle.com Mon Apr 18 19:15:06 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 18 Apr 2016 09:15:06 -1000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> Message-ID: > On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R wrote: > > Hi Christian > > I have added this. Just moved generate_libmTan() after sin and cos generation.
> if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
>   StubRoutines::_dtan = generate_libmTan();
> }
>
Sorry, I missed this. I should have used the browser's search instead of eyeballing it.

src/cpu/x86/vm/macroAssembler_x86.cpp

fp_runtime_fallback is unused now:

cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/
hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp
5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) {
5828: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use);
5833: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use);
5838: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use);

hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp
995: void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use);

src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp

- __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));
+ mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp));

I understand what it's doing, but we are calling the same methods as before. What has changed? > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Monday, April 18, 2016 11:35 AM > To: Deshpande, Vivek R > Cc: hotspot compiler ; Vladimir Kozlov ; Viswanathan, Sandhya > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > > On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R > wrote: > > Hi all > > I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. > This uses -XX:DisableIntrinsic option to achieve the same. > Could you please review and sponsor this patch.
> > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > src/cpu/x86/vm/stubGenerator_x86_64.cpp > > + if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) { > StubRoutines::_dpow = generate_libmPow(); > - StubRoutines::_dtan = generate_libmTan(); > + } > Was removing libmTan on purpose? > > > > Thanks and regards, > Vivek From vivek.r.deshpande at intel.com Mon Apr 18 20:28:08 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Mon, 18 Apr 2016 20:28:08 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com> Hi Christian Just calling SharedRuntime function kills the address in memory where there is jump (shown below) after the routine finishes and also need to make sure stack pointer is 16 byte aligned. So calling mathfunc() to take care of that instead of fp_runtime_fallback() which has extra overhead of storing/ restoring all the registers and xmm registers. 444 __ pop(rax); 445 __ mov(rsp, r13); 446 __ jmp(rax); Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, April 18, 2016 12:15 PM To: Deshpande, Vivek R Cc: Vladimir Kozlov; hotspot compiler Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R > wrote: Hi Christian I have added this. Just moved generate_libmTan() after sin and cos generation.
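Vivek's point that the stack pointer must be 16-byte aligned before calling into a SharedRuntime function reflects the System V x86-64 ABI, which requires rsp to be a multiple of 16 immediately before a call instruction. The masking arithmetic behind such an adjustment can be sketched in plain C++ as follows. `align_stack_down` is a hypothetical helper for illustration, not the code that mathfunc() emits.

```cpp
#include <cassert>
#include <cstdint>

// Alignment required at a call site by the System V x86-64 ABI.
constexpr std::uintptr_t kStackAlignment = 16;

// Hypothetical helper: round a stack pointer down to a 16-byte boundary.
// The stack grows downward, so rounding down only reserves extra space
// and never overlaps live data above the old stack pointer.
std::uintptr_t align_stack_down(std::uintptr_t sp) {
  return sp & ~(kStackAlignment - 1);
}
```

Because the masking discards the original low bits, the old stack pointer has to be saved somewhere before the call and restored afterwards; the quoted epilogue does the restore with `mov(rsp, r13)`.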
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) { StubRoutines::_dtan = generate_libmTan(); } Sorry, I missed this. I should have used the browser's search instead of eyeballing it. src/cpu/x86/vm/macroAssembler_x86.cpp fp_runtime_fallback is unused now: cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/ hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp 5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) { 5828: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use); 5833: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use); 5838: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use); hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp 995: void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use); src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp - __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); + mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)); I understand what it's doing but we are calling the same methods as before. What has changed? Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, April 18, 2016 11:35 AM To: Deshpande, Vivek R > Cc: hotspot compiler >; Vladimir Kozlov >; Viswanathan, Sandhya > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R > wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch.
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ src/cpu/x86/vm/stubGenerator_x86_64.cpp + if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) { StubRoutines::_dpow = generate_libmPow(); - StubRoutines::_dtan = generate_libmTan(); + } Was removing libmTan on purpose? Thanks and regards, Vivek From jan.civlin at intel.com Mon Apr 18 21:41:17 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Mon, 18 Apr 2016 21:41:17 +0000 Subject: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Message-ID: <39F83597C33E5F408096702907E6C4500F16BF0A@ORSMSX104.amr.corp.intel.com> We would like to contribute the SHA256 AVX2 intrinsic. This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. Contributor: Jan Civlin. bug: https://bugs.openjdk.java.net/browse/JDK-8154495 webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ From jan.civlin at intel.com Mon Apr 18 21:44:26 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Mon, 18 Apr 2016 21:44:26 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Message-ID: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> == Correction in the subject line === We would like to contribute the SHA256 AVX2 intrinsic. This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. Contributor: Jan Civlin.
bug: https://bugs.openjdk.java.net/browse/JDK-8154495 webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ From vladimir.kozlov at oracle.com Tue Apr 19 00:09:10 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Apr 2016 17:09:10 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> Message-ID: <57157726.4030701@oracle.com> Hi Jan, The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources. I don't see usage of the new [v]movdqa(), vpsrldq(), vpslldq() instructions. Please move the new code in macroAssembler_x86_sha.cpp to the end of the file. _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256: StubRoutines::x86::_k256_W_adr = generate_k256_W(); What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough. Thanks, Vladimir On 4/18/16 2:44 PM, Civlin, Jan wrote: > == Correction in the subject line === > > We would like to contribute the SHA256 AVX2 intrinsic. > > This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. > > The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. > > Contributor: Jan Civlin.
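Vladimir's suggestion to derive `_k256_W[]` from `_k256[]` at startup can be sketched as a table transformation. This assumes "repeated 4 values" means that each consecutive group of four 32-bit round constants appears twice, once for each 128-bit lane of a 256-bit AVX2 register; that interpretation, and the helper name `expand_k256_w`, are assumptions. This is not the proposed `generate_k256_W()` stub.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not HotSpot code): write each group of four 32-bit
// round constants twice, so every 8 x 32-bit (256-bit) row carries the same
// four values in both 128-bit lanes.
std::vector<std::uint32_t> expand_k256_w(const std::vector<std::uint32_t>& k256) {
  std::vector<std::uint32_t> out;
  out.reserve(k256.size() * 2);
  for (std::size_t i = 0; i + 4 <= k256.size(); i += 4) {
    for (int lane = 0; lane < 2; ++lane) {
      for (std::size_t j = 0; j < 4; ++j) {
        out.push_back(k256[i + j]);
      }
    }
  }
  return out;
}
```

Applied to the first eight SHA-256 round constants, the result has 16 entries, and entries 4 to 7 repeat entries 0 to 3. Generating the table at startup avoids keeping a second hand-written copy of the constants in sync with `_k256`.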
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8154495 > webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ > From christian.thalinger at oracle.com Tue Apr 19 04:33:13 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 18 Apr 2016 18:33:13 -1000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com> Message-ID: > On Apr 18, 2016, at 10:28 AM, Deshpande, Vivek R wrote: > > Hi Christian > > Just calling SharedRuntime function kills the address in memory where there is jump (shown below) after the routine finishes and also need to make sure stack pointer is 16 byte aligned. So calling mathfunc() to take care of that instead of fp_runtime_fallback() which has extra overhead of storing/ restoring all the registers and xmm registers. > > 444 __ pop(rax); > 445 __ mov(rsp, r13); > 446 __ jmp(rax); > Ok, that makes sense. > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Monday, April 18, 2016 12:15 PM > To: Deshpande, Vivek R > Cc: Vladimir Kozlov; hotspot compiler > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > > On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R > wrote: > > Hi Christian > > I have added this. Just moved generate_libmTan() after sin and cos generation. > if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) { > StubRoutines::_dtan = generate_libmTan(); > } > > > Sorry, I missed this. I should have used the browser's search instead of eyeballing it.
> src/cpu/x86/vm/macroAssembler_x86.cpp > > fp_runtime_fallback is unused now: > > cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/ > hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp > 5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) { > 5828: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use); > 5833: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use); > 5838: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use); > > hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp > 995: void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use); > > > src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp > > - __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); > + mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)); > I understand what it's doing but we are calling the same methods as before. What has changed? > > > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Monday, April 18, 2016 11:35 AM > To: Deshpande, Vivek R > > Cc: hotspot compiler >; Vladimir Kozlov >; Viswanathan, Sandhya > > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > > On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R > wrote: > > Hi all > > I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. > This uses -XX:DisableIntrinsic option to achieve the same. > Could you please review and sponsor this patch.
> > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > src/cpu/x86/vm/stubGenerator_x86_64.cpp > > + if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) { > StubRoutines::_dpow = generate_libmPow(); > - StubRoutines::_dtan = generate_libmTan(); > + } > Was removing libmTan on purpose? > > > > > Thanks and regards, > Vivek From volker.simonis at gmail.com Tue Apr 19 08:46:45 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 19 Apr 2016 10:46:45 +0200 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: Hi Vivek, you introduce the new method TemplateInterpreterGenerator::mathfunc() but only implement it on x86_64. Shouldn't we have at least empty implementations of this method for all architectures? Also the description in the bug sounds quite general but you only seem to implement it for certain math-intrinsics on x64. Another minor nit: in vmSymbols.hpp I don't think we need the const qualifier on the ID argument because it is only an enum anyway: + static bool is_disabled_by_flags(const vmIntrinsics::ID id); It makes sense on: static bool is_disabled_by_flags(const methodHandle& method); because here we are passing method by reference and the const qualifier guaranties that is_disabled_by_flags will not change the method. Regards, Volker On Mon, Apr 18, 2016 at 7:38 PM, Deshpande, Vivek R wrote: > Hi all > > > > I would like to contribute a patch which helps to control the intrinsics in > interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same.
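Volker's nit about `const vmIntrinsics::ID id` rests on a C++ rule: a top-level `const` on a by-value parameter is not part of the function's type. In a header declaration it therefore tells callers nothing, and it only constrains the function body when it is repeated in the definition. A `const` on a reference parameter, by contrast, does change the signature. The enum and function names below are hypothetical stand-ins, not the HotSpot declarations.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for vmIntrinsics::ID.
enum ID { _none, _dexp, _dpow, _dtan };

// A declaration with a top-level const on the by-value parameter...
bool is_disabled_by_flags(const ID id);

// ...and a definition without it name the same function, because
// top-level const is dropped from the function type.
bool is_disabled_by_flags(ID id) { return id == _dtan; }

static_assert(std::is_same<decltype(is_disabled_by_flags), bool(ID)>::value,
              "const on a by-value parameter is not part of the signature");

// By contrast, const on a reference parameter is part of the type and
// promises that the callee will not modify the caller's object.
struct Method { int flags; };
bool reads_only(const Method& m) { return m.flags != 0; }
```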
> > Could you please review and sponsor this patch. > > > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > > > Thanks and regards, > > Vivek > > From rwestrel at redhat.com Tue Apr 19 11:44:35 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 19 Apr 2016 13:44:35 +0200 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted Message-ID: <57161A23.3050807@redhat.com> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << -shift) with src an int have some support in the aarch64.ad ad file: rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions is broken and never match any ideal graph subtree. http://cr.openjdk.java.net/~roland/8154537/webrev.00/ Roland. From tobias.hartmann at oracle.com Tue Apr 19 12:35:43 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 19 Apr 2016 14:35:43 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC Message-ID: <5716261F.1070205@oracle.com> Hi, please review the following enhancement: https://bugs.openjdk.java.net/browse/JDK-6941938 MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). 
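The word-at-a-time comparison Tobias describes can be modelled in portable C++ as shown below. This is a sketch, not the SPARC stub. It assembles each 8-byte chunk in big-endian order to mirror SPARC. Reading past the end of an array is undefined behaviour in C++, so instead of over-reading, the sketch copies the tail into scratch buffers pre-filled with different garbage bytes and then shifts those bytes out, the same way the stub discards the bits it over-read. `array_equals_words` and `load_be64` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assemble 8 bytes into a word in big-endian order, as a SPARC load would.
static std::uint64_t load_be64(const unsigned char* p) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Hypothetical model of an 8-byte-stride array comparison.
bool array_equals_words(const unsigned char* a, const unsigned char* b, std::size_t len) {
  std::size_t i = 0;
  // Main loop: compare whole 8-byte words until a mismatch or the last full word.
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) {
      return false;
    }
  }
  std::size_t rem = len - i;  // 0..7 trailing bytes
  if (rem == 0) {
    return true;
  }
  // The stub loads a full word past the end; portable C++ may not, so copy
  // the tail into buffers pre-filled with deliberately different garbage.
  unsigned char ta[8];
  unsigned char tb[8];
  std::memset(ta, 0xAA, sizeof ta);
  std::memset(tb, 0x55, sizeof tb);
  std::memcpy(ta, a + i, rem);
  std::memcpy(tb, b + i, rem);
  // Valid bytes sit in the high-order bits; shift the garbage bits out.
  unsigned shift = static_cast<unsigned>(8 - rem) * 8;
  return (load_be64(ta) >> shift) == (load_be64(tb) >> shift);
}
```

Because the garbage bytes differ between the two buffers, the final comparison can only succeed if the shift really removes every over-read byte.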
I also refactored MacroAssembler::string_compare() to use load_sized_value(). Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. I evaluated the following three versions of the patch. -- Basic -- http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png Version "small" tries to improve this. -- Prefetching -- http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. 
This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. -- Small -- http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). The numbers can be found here: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. What do you think? Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
Thanks, Tobias [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip [3] Microbenchmark results for the "basic" implementation http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png [4] Microbenchmark results for the "prefetching" implementation http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png From aph at redhat.com Tue Apr 19 12:52:50 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 19 Apr 2016 13:52:50 +0100 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57161A23.3050807@redhat.com> References: <57161A23.3050807@redhat.com> Message-ID: <57162A22.2050706@redhat.com> On 04/19/2016 12:44 PM, Roland Westrelin wrote: > (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << > -shift) with src an int have some support in the aarch64.ad ad file: > rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions is broken > and never match any ideal graph subtree. > > http://cr.openjdk.java.net/~roland/8154537/webrev.00/ OK, thanks. We'll need backports for http://hg.openjdk.java.net/aarch64-port/jdk8u/ and http://hg.openjdk.java.net/aarch64-port/jdk7u/ These should just apply cleanly. Andrew. 
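Roland's two source shapes are both rotate-right idioms because Java masks an int shift count to its low five bits, so `32 - shift` and `-shift` select the same left-shift amount. The sketch below models Java's shift semantics in C++, using unsigned arithmetic to avoid signed overflow, and checks both shapes against a reference rotate. The helper names are hypothetical, and the sketch says nothing about how the aarch64.ad match rules themselves are written.

```cpp
#include <cassert>
#include <cstdint>

// Java int shifts use only the low five bits of the shift count.
static std::uint32_t java_shl(std::uint32_t x, std::uint32_t s) { return x << (s & 31u); }
static std::uint32_t java_ushr(std::uint32_t x, std::uint32_t s) { return x >> (s & 31u); }

// (src >>> shift) | (src << (32 - shift))
std::uint32_t ror_32_minus(std::uint32_t src, std::uint32_t shift) {
  return java_ushr(src, shift) | java_shl(src, 32u - shift);
}

// (src >>> shift) | (src << -shift)
std::uint32_t ror_neg(std::uint32_t src, std::uint32_t shift) {
  return java_ushr(src, shift) | java_shl(src, 0u - shift);
}

// Reference rotate right; a zero count must not shift by 32 (undefined in C++).
std::uint32_t ror_ref(std::uint32_t src, std::uint32_t shift) {
  std::uint32_t s = shift & 31u;
  return s == 0 ? src : (src >> s) | (src << (32u - s));
}
```

The AArch64 `rorv` instruction on W registers likewise takes its count modulo 32, which is why one instruction can cover both shapes.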
From nils.eliasson at oracle.com Tue Apr 19 12:54:32 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 19 Apr 2016 14:54:32 +0200 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: <57162A88.7030608@oracle.com> Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doesn't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: > > Hi all > > I would like to contribute a patch which helps to control the > intrinsics in interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > Thanks and regards, > > Vivek >
From adinn at redhat.com Tue Apr 19 12:55:19 2016 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 19 Apr 2016 13:55:19 +0100 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57162A22.2050706@redhat.com> References: <57161A23.3050807@redhat.com> <57162A22.2050706@redhat.com> Message-ID: <57162AB7.6010602@redhat.com> On 19/04/16 13:52, Andrew Haley wrote: > On 04/19/2016 12:44 PM, Roland Westrelin wrote: >> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << >> -shift) with src an int have some support in the aarch64.ad ad file: >> rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions is broken >> and never match any ideal graph subtree. >> >> http://cr.openjdk.java.net/~roland/8154537/webrev.00/ > > OK, thanks. We'll need backports for > http://hg.openjdk.java.net/aarch64-port/jdk8u/ and > http://hg.openjdk.java.net/aarch64-port/jdk7u/ Patch also looks good to me. regards, Andrew Dinn ----------- From aph at redhat.com Tue Apr 19 13:19:31 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 19 Apr 2016 14:19:31 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: <5714D930.4090804@redhat.com> Message-ID: <57163063.3020506@redhat.com> On 04/19/2016 01:54 PM, Long Chen wrote: > Thanks for all these nice comments. Here is a revised version: > > http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch > > > Changes: > > 1. Are DC and IC really synonyms? > > DC and IC assembling was supposed to be distinguished by different > cache_maintenance parameters. I create two enums 'icache_maintenance' and > 'dcache_maintenance' in the revised patch, to make it look better.
> > + enum icache_maintenance {IVAU = 0b0101}; > + enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, > ZVA = 0b100}; > + void dc(dcache_maintenance cm, Register Rt) { > + sys(0b011, 0b0111, cm, 0b001, Rt); > + } > + > + void ic(icache_maintenance cm, Register Rt) { > + sys(0b011, 0b0111, cm, 0b001, Rt); > } That looks better, yes. > 5. To avoid scratching a new register, I write a small piece of code > after the dc zva loop in block_zero, so that block_zero doesn't need to > fall through to fill_words to zero the small part of array. This code might > not perform as good as fill_words (unrolled), but it requires one less > register, and the code size becomes smaller as well. > The final code is like this: > > 0x0000007f7d3dd4fc: cmp x11, #0x20 > 0x0000007f7d3dd500: b.lt 0x0000007f7d3dd538 > 0x0000007f7d3dd504: neg x8, x10 > 0x0000007f7d3dd508: and x8, x8, #0x3f > 0x0000007f7d3dd50c: cbz x8, 0x0000007f7d3dd520 > 0x0000007f7d3dd510: sub x11, x11, x8, asr #3 > 0x0000007f7d3dd514: sub x8, x8, #0x8 > 0x0000007f7d3dd518: str xzr, [x10],#8 > 0x0000007f7d3dd51c: cbnz x8, 0x0000007f7d3dd514 > 0x0000007f7d3dd520: sub x11, x11, #0x8 > 0x0000007f7d3dd524: dc zva, x10 > 0x0000007f7d3dd528: subs x11, x11, #0x8 > 0x0000007f7d3dd52c: add x10, x10, #0x40 > 0x0000007f7d3dd530: b.ge 0x0000007f7d3dd524 > 0x0000007f7d3dd534: add x11, x11, #0x8 > 0x0000007f7d3dd538: tbz w11, #0, 0x0000007f7d3dd544 > 0x0000007f7d3dd53c: str xzr, [x10],#8 > 0x0000007f7d3dd540: sub x11, x11, #0x1 > 0x0000007f7d3dd544: cbz x11, 0x0000007f7d3dd554 > 0x0000007f7d3dd548: sub x11, x11, #0x2 > 0x0000007f7d3dd54c: stp xzr, xzr, [x10],#16 > 0x0000007f7d3dd550: cbnz x11, 0x0000007f7d3dd548 > > Would this be fine? It might well be. I'd like Ed to do a few measurements of large and small block zeroing. My guess is that a reasonably small unrolled loop doing STP ZR, ZR will work better than anything else, but we'll see. Thanks, Andrew.
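The control flow of the `dc zva` sequence Long Chen posted can be read as the C++ model below. It assumes that x10 holds the 8-byte-aligned start address, that x11 holds the number of 64-bit words to clear, and that the DC ZVA block is 64 bytes, which the listing's `#0x3f` mask and `#0x40` stride imply. One `std::memset` of a block stands in for a single `dc zva`. `block_zero` here is a hypothetical model, not the MacroAssembler routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kZvaBytes = 64;  // assumed DC ZVA block size
constexpr std::size_t kZvaWords = kZvaBytes / sizeof(std::uint64_t);
constexpr std::size_t kMinWordsForZva = 32;  // mirrors 'cmp x11, #0x20'

// Hypothetical C++ model of the posted sequence; p must be 8-byte aligned.
void block_zero(std::uint64_t* p, std::size_t cnt) {
  if (cnt >= kMinWordsForZva) {
    // Store single words until p reaches a 64-byte boundary (at most 7 words).
    while (reinterpret_cast<std::uintptr_t>(p) % kZvaBytes != 0) {
      *p++ = 0;
      --cnt;
    }
    // Clear whole blocks; each memset stands in for one 'dc zva'.
    while (cnt >= kZvaWords) {
      std::memset(p, 0, kZvaBytes);
      p += kZvaWords;
      cnt -= kZvaWords;
    }
  }
  // Tail: one word if the count is odd ('tbz w11, #0'), then pairs ('stp xzr, xzr').
  if (cnt & 1) {
    *p++ = 0;
    --cnt;
  }
  while (cnt != 0) {
    p[0] = 0;
    p[1] = 0;
    p += 2;
    cnt -= 2;
  }
}
```

The 32-word threshold guarantees that the alignment loop, which stores at most seven words, cannot exhaust the count before the block loop starts.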
From vladimir.kozlov at oracle.com Tue Apr 19 16:06:36 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Apr 2016 09:06:36 -0700 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <5716261F.1070205@oracle.com> References: <5716261F.1070205@oracle.com> Message-ID: <5716578C.5080902@oracle.com> Very good. Go with basic. We can do SPU special improvements later if needed. "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." We do have arraycopy code for it but by default we don't use it: product(uintx, ArraycopySrcPrefetchDistance, 0, product(uintx, ArraycopyDstPrefetchDistance, 0, Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. Thanks, Vladimir On 4/19/16 5:35 AM, Tobias Hartmann wrote: > Hi, > > please review the following enhancement: > https://bugs.openjdk.java.net/browse/JDK-6941938 > > MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). > > I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. > > We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). > > Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. 
I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. > > I evaluated the following three versions of the patch. > > -- Basic -- > http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ > The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png > > I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. > > There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png > Version "small" tries to improve this. > > -- Prefetching -- > http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ > This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png > > However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. 
I also experimented with adding incremental prefetching to the loop body but this does not improve performance. > > -- Small -- > http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ > This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). > > The numbers can be found here: > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx > > I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. > > What do you think? > > Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java > [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip > [3] Microbenchmark results for the "basic" implementation > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png > [4] Microbenchmark results for the "prefetching" implementation > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png > From vladimir.kozlov at oracle.com Tue Apr 19 15:55:19 2016 From: vladimir.kozlov 
at oracle.com (Vladimir Kozlov) Date: Tue, 19 Apr 2016 08:55:19 -0700 (PDT) Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57161A23.3050807@redhat.com> References: <57161A23.3050807@redhat.com> Message-ID: <571654E7.5020404@oracle.com> Looks good. thanks, Vladimir On 4/19/16 4:44 AM, Roland Westrelin wrote: > (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << > -shift) with src an int have some support in the aarch64.ad ad file: > rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions is broken > and never match any ideal graph subtree. > > http://cr.openjdk.java.net/~roland/8154537/webrev.00/ > > Roland. > From vivek.r.deshpande at intel.com Tue Apr 19 17:27:42 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 19 Apr 2016 17:27:42 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8118A@ORSMSX106.amr.corp.intel.com> Hi Volkar Thanks for your review and comments. I will surely take care of these things you mentioned. I am using mathfunc() to methods which call SharedRuntime::d(exp, pow, sin, cos, tan, log, log10) as an alternate when DisableIntrinsic is used to not use LIBM intrinsics. Thanks and regards, Vivek -----Original Message----- From: Volker Simonis [mailto:volker.simonis at gmail.com] Sent: Tuesday, April 19, 2016 1:47 AM To: Deshpande, Vivek R Cc: hotspot compiler; Vladimir Kozlov; Christian Thalinger Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, you introduce the new method TemplateInterpreterGenerator::mathfunc() but only implement it on x86_64. Shouldn't we have at least empty implementations of this method for all architectures? 
Also the description in the bug sounds quite general but you only seem to implement it for certain math-intrinsics on x64. Another minor nit: in vmSymbols.hpp I don't think we need the const qualifier on the ID argument because it is only an enum anyway: + static bool is_disabled_by_flags(const vmIntrinsics::ID id); It makes sense on: static bool is_disabled_by_flags(const methodHandle& method); because here we are passing method by reference and the const qualifier guarantees that is_disabled_by_flags will not change the method. Regards, Volker On Mon, Apr 18, 2016 at 7:38 PM, Deshpande, Vivek R wrote: > Hi all > > > > I would like to contribute a patch which helps to control the > intrinsics in interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same. > > Could you please review and sponsor this patch. > > > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > > > Thanks and regards, > > Vivek > > From nils.eliasson at oracle.com Tue Apr 19 17:13:12 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 19 Apr 2016 10:13:12 -0700 (PDT) Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5714B5CB.70705@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> Message-ID: <57166728.4060906@oracle.com> On 2016-04-18 12:24, Nils Eliasson wrote: > Hi, > > On 2016-04-15 22:43, Christian Thalinger wrote: >> >>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson >>> wrote: >>> >>> Hi, >>> >>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson >>>>> wrote: >>>>> >>>>> I moved the reasons to
CompileTask.hpp and put it together with >>>>> the names list. Also changed the type from int to CompileReason as >>>>> Igor suggested. >>>>> >>>>> It gets verbose in the method declarations in compileBroker >>>> >>>> Don't worry about this. >>>> >>>>> and sometimes I think CompileReason should be declared in >>>>> CompileBroker because it is mostly used there. On the other hand, >>>>> CompileTask is the keeper of the CompileReason so it makes sense too. >>>> >>>> Yes, that's the right place. >>>> >>>>> >>>>> New webrev: >>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>> >>>> >>>> + bool can_become_stale() const { >>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>>> + } >>>> I'm not a fan of implicit contracts just defined by comments. This >>>> method doesn't seem to be performance critical so I would suggest >>>> to use a switch-case. An attribute on the enum would be much >>>> better but we all know this isn't Java. >>> >>> As you suggested: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >> >> Thanks. A space is missing and the closing } indent is wrong: >> + bool can_become_stale() const { >> + switch(_compile_reason) { >> + case Reason_BackedgeCount: >> + case Reason_InvocationCount: >> + case Reason_Tiered: >> + return !_is_blocking; >> + } >> + return false; >> + } And I fixed the indentation. Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ Thanks! Nils >> Also, what about: >> + Reason_None, >> + Reason_CTW, // Compile the world >> + Reason_Replay, // ciReplay >> These were covered before. > Reason_None - is only used for bounds checking together with Reason_Count. > Reason_Replay - if these compilations can get stale we can get > indeterminism in replay. > Reason_CTW - CTW could silently drop compiles -> more indeterminism. > > Regards, > Nils > >> >>> >>> Also made reasons CTW and Replay not stale-able. >>> >>> Thanks! >>> Nils >>> >>>> >>>>> >>>>> Thanks!
>>>>> Nils >>>>> >>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>> Very nice, I like it. >>>>>> >>>>>> One note. CompileReason (and its names) should be in the CompileTask >>>>>> class where it is recorded. Then CompileTask::can_become_stale() >>>>>> can be in the header file so it is inlined on all platforms. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> New webrev: >>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>> >>>>>>> >>>>>>> Summary >>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>> variants, and a table containing all the unchanged strings. I >>>>>>> see the >>>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>>> have chosen not to do so in this change. >>>>>>> >>>>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>>>> >>>>>>> Testing: >>>>>>> Running Testset hotspot on all platforms and hotspot_all on one >>>>>>> platform >>>>>>> >>>>>>> Regards, >>>>>>> Nils Eliasson >>>>>>> >>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>> Tasks get evicted from the compile_queue if their invocation >>>>>>>>> counter >>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>> >>>>>>>>> I'll do a proper fix, it is the right thing to do and should >>>>>>>>> be pretty >>>>>>>>> quick. I'll change the comment to an enum that represents who >>>>>>>>> submitted >>>>>>>>> the compile, and add a table for the comments. This could be >>>>>>>>> useful in >>>>>>>>> other settings too. >>>>>>>> >>>>>>>> Sounds good. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nils >>>>>>>>> >>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>> What do you mean "stale"?
>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid >>>>>>>>>> removing >>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>>> >>>>>>>>>>> Summary: >>>>>>>>>>> A method enqueued for compilation with WB API may be >>>>>>>>>>> removed from >>>>>>>>>>> the compile queue as stale. >>>>>>>>>>> >>>>>>>>>>> Solution: >>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>> checks that may spare us from waiting until timeout for >>>>>>>>>>> failing.) >>>>>>>>>>> >>>>>>>>>>> This is a workaround but we should consider fixing something >>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>> task with info about the origin of the compile. The comment >>>>>>>>>>> field has >>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>> converted to an enum. >>>>>>>>>>> >>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Nils Eliasson >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> >>> >> >
From christian.thalinger at oracle.com Tue Apr 19 17:37:09 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 19 Apr 2016 07:37:09 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <57166728.4060906@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com> Message-ID: <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> > On Apr 19, 2016, at 7:13 AM, Nils Eliasson wrote: > > > > On 2016-04-18 12:24, Nils Eliasson wrote: >> Hi, >> >> On 2016-04-15 22:43, Christian Thalinger wrote: >>> >>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson > wrote: >>>> >>>> Hi, >>>> >>>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>>> >>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson > wrote: >>>>>> >>>>>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. >>>>>> >>>>>> It gets verbose in the method declarations in compileBroker >>>>> >>>>> Don't worry about this. >>>>> >>>>>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. >>>>> >>>>> Yes, that's the right place. >>>>> >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>> >>>>> + bool can_become_stale() const { >>>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>>>> + } >>>>> I'm not a fan of implicit contracts just defined by comments. This method doesn't seem to be performance critical so I would suggest to use a switch-case. An attribute on the enum would be much better but we all know this isn't Java.
>>>> >>>> As you suggested: >>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >>> >>> Thanks. A space is missing and the closing } indent is wrong: >>> + bool can_become_stale() const { >>> + switch(_compile_reason) { >>> + case Reason_BackedgeCount: >>> + case Reason_InvocationCount: >>> + case Reason_Tiered: >>> + return !_is_blocking; >>> + } >>> + return false; >>> + } > And I fixed the indentation. > > Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ + switch(_compile_reason) { Space after switch. > > Thanks! > Nils >>> Also, what about: >>> + Reason_None, >>> + Reason_CTW, // Compile the world >>> + Reason_Replay, // ciReplay >>> These were covered before. >> Reason_None - is only used for bounds checking together with Reason_Count. >> Reason_Replay - if these compilations can get stale we can get indeterminism in replay. >> Reason_CTW - CTW could silently drop compiles -> more indeterminism. >> >> Regards, >> Nils >> >>> >>>> >>>> Also made reasons CTW and Replay not stale-able. >>>> >>>> Thanks! >>>> Nils >>>> >>>>> >>>>>> >>>>>> Thanks! >>>>>> Nils >>>>>> >>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>>> Very nice, I like it. >>>>>>> >>>>>>> One note. CompileReason (and its names) should be in the CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> New webrev: >>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>>> >>>>>>>> Summary >>>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>>> variants, and a table containing all the unchanged strings. I see the >>>>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>>>> have chosen not to do so in this change. >>>>>>>> >>>>>>>> Only new logic is the CompileTask::can_become_stale() method.
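The switch-based `can_become_stale()` that came out of this review can be exercised on its own. Below is a minimal C++ sketch: `CompileTaskSketch` is a hypothetical stand-in for `CompileTask`, and the `CompileReason` enum is reduced to the reasons named in this thread, so the enum in the actual webrev may contain more members or a different order.

```cpp
#include <cassert>

// Reduced, hypothetical CompileReason: only the reasons discussed above.
enum CompileReason {
  Reason_None,             // used only for bounds checking, with Reason_Count
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,              // Compile the world
  Reason_Replay,           // ciReplay
  Reason_Whitebox,
  Reason_Count
};

class CompileTaskSketch {
 public:
  CompileTaskSketch(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {}

  // Only policy-triggered, non-blocking compiles may be evicted as stale;
  // WB, CTW and replay compiles must never be dropped silently.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;
    }
  }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```

This sketch uses a `default:` label where the webrev returns `false` after the switch; the behavior is the same. Either way, a reason added later is not stale-able unless it is listed explicitly, instead of depending on its position relative to `Reason_Whitebox`.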
>>>>>>>> >>>>>>>> Testing: >>>>>>>> Running Testset hotspot on all platforms and hotspot_all on one platform >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils Eliasson >>>>>>>> >>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>>> >>>>>>>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>>>>>>> quick. I'll change the comment to an enum that represents who submitted >>>>>>>>>> the compile, and add a table for the comments. This could be useful in >>>>>>>>>> other settings too. >>>>>>>>> >>>>>>>>> Sounds good. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nils >>>>>>>>>> >>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>>> What do you mean "stale"? >>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>>>> >>>>>>>>>>>> Summary: >>>>>>>>>>>> A method enqueued for compilation with WB API may be removed from >>>>>>>>>>>> the compile queue as stale. >>>>>>>>>>>> >>>>>>>>>>>> Solution: >>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>>> checks that may spare us from waiting until timeout for failing.)
>>>>>>>>>>>> >>>>>>>>>>>> This is a workaround but we should consider fixing something >>>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>>> task with info about the origin of the compile. The comment field has >>>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>>> converted to an enum. >>>>>>>>>>>> >>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Nils Eliasson >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> > From tobias.hartmann at oracle.com Tue Apr 19 17:13:05 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 19 Apr 2016 10:13:05 -0700 (PDT) Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <5716578C.5080902@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> Message-ID: <57166721.5010208@oracle.com> Thanks, Vladimir! On 19.04.2016 18:06, Vladimir Kozlov wrote: > Very good. Go with basic. We can do SPU special improvements later if needed. Okay, I'll push the basic version. For reference, here are the results on a SPARC T4: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png > "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
> We do have arraycopy code for it but by default we don't use it: > product(uintx, ArraycopySrcPrefetchDistance, 0, > product(uintx, ArraycopyDstPrefetchDistance, 0, > > Experiments back then did not show improvement on JBB benchmarks but some workloads may benefit. That is why we keep the code. Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: java -XX:ArraycopySrcPrefetchDistance=42 -version ArraycopySrcPrefetchDistance (42) must be 0 Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit Thanks, Tobias > > Thanks, > Vladimir > > On 4/19/16 5:35 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following enhancement: >> https://bugs.openjdk.java.net/browse/JDK-6941938 >> >> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >> >> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >> >> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >> >> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7.
I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >> >> I evaluated the following three versions of the patch. >> >> -- Basic -- >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >> >> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >> >> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >> Version "small" tries to improve this. >> >> -- Prefetching -- >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >> >> However, prefetching negatively affects overall performance in the other cases (see [4]). 
Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >> >> -- Small -- >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >> >> The numbers can be found here: >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >> >> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >> >> What do you think? >> >> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >> [3] Microbenchmark results for the "basic" implementation >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >> [4] Microbenchmark results for the "prefetching" implementation >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >> From rwestrel at redhat.com Tue Apr 19 18:54:04 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 19 Apr 2016 20:54:04 +0200 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <571654E7.5020404@oracle.com> References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com> Message-ID: <57167ECC.8080304@redhat.com> Thanks everyone for the review. I need a sponsor for that one given it touches a shared code test case. Roland. From vladimir.kozlov at oracle.com Tue Apr 19 22:12:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Apr 2016 15:12:21 -0700 (PDT) Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57167ECC.8080304@redhat.com> References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com> <57167ECC.8080304@redhat.com> Message-ID: <5716AD45.1060302@oracle.com> In JPRT. Vladimir On 4/19/16 11:54 AM, Roland Westrelin wrote: > Thanks everyone for the review. 
I need a sponsor for that one given it > touches a shared code test case. > > Roland. > From vivek.r.deshpande at intel.com Wed Apr 20 00:44:43 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 20 Apr 2016 00:44:43 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <57162A88.7030608@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> Hi Nils, Yes, you are right: the function accesses the command line flag DisableIntrinsic and the changes are static. Could you point me to the right location for the function? Also, I have updated the webrev with the rest of the comments here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Regards, Vivek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson Sent: Tuesday, April 19, 2016 5:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives files if it doesn't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch.
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek From john.r.rose at oracle.com Wed Apr 20 01:46:21 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Apr 2016 18:46:21 -0700 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <57166721.5010208@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> Message-ID: On Apr 19, 2016, at 10:13 AM, Tobias Hartmann wrote: > > Okay, I'll push the basic version. > So I started looking at your code and my inner SPARC junkie took over. This is what happened: http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ Perhaps there are some ideas that might be helpful: - The rampdown logic can lose a couple of instructions by using xorcc and movr. - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? On the other hand, what you wrote is nice and simple. HTH, John P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more versions of misalignment, still with vectorization, as with the arraycopy stubs. But that's neither nice nor simple. > On Apr 19, 2016, at 10:13 AM, Tobias Hartmann wrote: > > Thanks, Vladimir! > > On 19.04.2016 18:06, Vladimir Kozlov wrote: >> Very good. Go with basic. We can do SPU special improvements later if needed. > > Okay, I'll push the basic version.
> > For reference, here are the results on a SPARC T4: > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png > >> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >> We do have arraycopy code for it but by default we don't use it: >> product(uintx, ArraycopySrcPrefetchDistance, 0, >> product(uintx, ArraycopyDstPrefetchDistance, 0, >> >> Experiments back then did not show improvement on JBB benchmarks but some workloads may benefit. That is why we keep the code. > > Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: > > java -XX:ArraycopySrcPrefetchDistance=42 -version > ArraycopySrcPrefetchDistance (42) must be 0 > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit > > Thanks, > Tobias > >> >> Thanks, >> Vladimir >> >> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following enhancement: >>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>> >>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>> >>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays.
Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>> >>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>> >>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>> >>> I evaluated the following three versions of the patch. >>> >>> -- Basic -- >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>> >>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. 
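The tail handling described for array_equals_loop() (read a full word past the end of the arrays, then shift out the garbage bits) can be modeled portably. The C++ sketch below is not the SPARC code from the webrev. It assumes both buffers are readable up to `len` rounded up to a multiple of 8 bytes (a stand-in for the over-read staying inside the heap object), assembles words in big-endian order to mimic SPARC, and shows only the 8-byte path, not the 4-byte fallback. `array_equals_sketch` and `load_be64` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble 8 bytes as a big-endian word (SPARC byte order),
// independent of the host's endianness.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t w = 0;
  for (int i = 0; i < 8; i++) w = (w << 8) | p[i];
  return w;
}

// Compares the first len bytes of a and b in 8-byte steps.
// ASSUMPTION: both buffers are readable up to len rounded up to a multiple
// of 8; the bytes past len may hold arbitrary garbage.
bool array_equals_sketch(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) return false;
  }
  size_t rem = len - i;  // 0..7 valid bytes left
  if (rem == 0) return true;
  // Read one full word over the end. In big-endian order the (8 - rem)
  // garbage bytes sit in the low-order bits, so a logical right shift of
  // the XOR discards them and keeps only the bytes inside the arrays.
  uint64_t diff = load_be64(a + i) ^ load_be64(b + i);
  return (diff >> (8 * (8 - rem))) == 0;
}
```

Since `rem` is between 1 and 7 on that path, the shift amount stays between 8 and 56 bits, so the shift is always well defined.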
>>> >>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>> Version "small" tries to improve this. >>> >>> -- Prefetching -- >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>> >>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>> >>> -- Small -- >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>> >>> The numbers can be found here: >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>> >>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>> >>> What do you think? >>> >>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>> [3] Microbenchmark results for the "basic" implementation >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>> [4] Microbenchmark results for the "prefetching" implementation >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>> From long.chen at linaro.org Tue Apr 19 12:54:55 2016 From: long.chen at linaro.org (Long Chen) Date: Tue, 19 Apr 2016 20:54:55 +0800 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <5714D930.4090804@redhat.com> References: <5714D930.4090804@redhat.com> Message-ID: Thanks for all these nice comments. Here is a revised version: http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch Changes: 1. Are DC and IC really synonyms? The assembly of DC and IC was meant to be distinguished by their different cache_maintenance parameters. I created two enums, "icache_maintenance" and "dcache_maintenance", in the revised patch to make this clearer.
+  enum icache_maintenance {IVAU = 0b0101};
+  enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100};
+  void dc(dcache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }
+
+  void ic(icache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }

2. I'm not convinced of the value of this. We already know that a simple

     while (count-- > 0) { *to++ = v; }

   turns into a call to memset() which does DC ZVA.

   OK. I reverted this change and left it to the compiler. The patch becomes simpler :)

3. Block_zeroing -> block_zero, 8-byte unit -> HeapWords

4. I don't think this CBZ does anything useful:

     0x0000007fa880f630: cbz x8, 0x0000007fa880f670

   Removed.

5. To avoid using an extra scratch register, I wrote a small piece of code after the dc zva loop in block_zero, so that block_zero doesn't need to fall through to fill_words to zero the remaining small part of the array. This code might not perform as well as fill_words (unrolled), but it needs one fewer register, and the code size is smaller as well.
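The control flow of the revised block_zero (word stores up to a block boundary, whole-block DC ZVA, then a short tail of one STR plus an STP loop) can be modelled in portable C++ to check that every word in the range is zeroed exactly once. This is a sketch only: `block_zero_model`, `kZvaWords` and `kLargeThreshold` are hypothetical names; it assumes the 64-byte block and 32-word threshold used by the v02 code (real hardware reports the ZVA block size in DCZID_EL0), treats element 0 of the vector as block aligned, and simulates each DC ZVA with ordinary stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of block_zero from the v02 patch; not HotSpot code.
// mem[0] is treated as if it sat on a DC ZVA block boundary.
constexpr std::size_t kZvaWords = 8;         // assumed 64-byte ZVA block (really read from DCZID_EL0)
constexpr std::size_t kLargeThreshold = 32;  // words; mirrors "cmp x11, #0x20"

void block_zero_model(std::vector<std::uint64_t>& mem, std::size_t base, std::size_t cnt) {
  if (cnt >= kLargeThreshold) {
    // 1. Single-word stores (str xzr) until base reaches a block boundary.
    while (base % kZvaWords != 0) {
      mem[base++] = 0;
      cnt--;
    }
    // 2. Whole blocks; each iteration stands in for one "dc zva".
    while (cnt >= kZvaWords) {
      assert(base % kZvaWords == 0);  // DC ZVA zeroes the aligned block containing the address
      for (std::size_t i = 0; i < kZvaWords; i++) mem[base + i] = 0;
      base += kZvaWords;
      cnt -= kZvaWords;
    }
  }
  // 3. Tail: one str if the remaining count is odd, then stp pairs.
  if (cnt & 1) {
    mem[base++] = 0;
    cnt--;
  }
  while (cnt > 0) {
    mem[base] = 0;
    mem[base + 1] = 0;
    base += 2;
    cnt -= 2;
  }
}
```

Keeping the tail as a plain loop rather than an unrolled jump table is what frees the extra register mentioned in item 5, at the cost of a few more branches for short tails.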
The final code is like this:

0x0000007f7d3dd4fc: cmp x11, #0x20
0x0000007f7d3dd500: b.lt 0x0000007f7d3dd538
0x0000007f7d3dd504: neg x8, x10
0x0000007f7d3dd508: and x8, x8, #0x3f
0x0000007f7d3dd50c: cbz x8, 0x0000007f7d3dd520
0x0000007f7d3dd510: sub x11, x11, x8, asr #3
0x0000007f7d3dd514: sub x8, x8, #0x8
0x0000007f7d3dd518: str xzr, [x10],#8
0x0000007f7d3dd51c: cbnz x8, 0x0000007f7d3dd514
0x0000007f7d3dd520: sub x11, x11, #0x8
0x0000007f7d3dd524: dc zva, x10
0x0000007f7d3dd528: subs x11, x11, #0x8
0x0000007f7d3dd52c: add x10, x10, #0x40
0x0000007f7d3dd530: b.ge 0x0000007f7d3dd524
0x0000007f7d3dd534: add x11, x11, #0x8
0x0000007f7d3dd538: tbz w11, #0, 0x0000007f7d3dd544
0x0000007f7d3dd53c: str xzr, [x10],#8
0x0000007f7d3dd540: sub x11, x11, #0x1
0x0000007f7d3dd544: cbz x11, 0x0000007f7d3dd554
0x0000007f7d3dd548: sub x11, x11, #0x2
0x0000007f7d3dd54c: stp xzr, xzr, [x10],#16
0x0000007f7d3dd550: cbnz x11, 0x0000007f7d3dd548

Would this be fine?

Regards
Long

On 18 April 2016 at 20:55, Andrew Haley wrote:
> One other thing. This is rather a lot of code to emit every time an
> array is created:
>
> ;; zero_words {
> 0x0000007fa880f5f0: cmp x11, #0x20
> 0x0000007fa880f5f4: b.lt 0x0000007fa880f62c
>
> 0x0000007fa880f5f8: neg x8, x10
> 0x0000007fa880f5fc: and x8, x8, #0x7f
> 0x0000007fa880f600: cbz x8, 0x0000007fa880f614
> 0x0000007fa880f604: sub x11, x11, x8, asr #3
> 0x0000007fa880f608: sub x8, x8, #0x8
> 0x0000007fa880f60c: str xzr, [x10],#8
> 0x0000007fa880f610: cbnz x8, 0x0000007fa880f608
> 0x0000007fa880f614: sub x11, x11, #0x10
> 0x0000007fa880f618: dc zva, x10
> 0x0000007fa880f61c: subs x11, x11, #0x10
> 0x0000007fa880f620: add x10, x10, #0x80
> 0x0000007fa880f624: b.ge 0x0000007fa880f618
> 0x0000007fa880f628: add x11, x11, #0x10
>
> 0x0000007fa880f62c: and x8, x11, #0x7
>
> I don't think this CBZ does anything useful:
>
> 0x0000007fa880f630: cbz x8, 0x0000007fa880f670
>
> (I'm assuming that the 0-7 cases are uniformly distributed.)
>
> 0x0000007fa880f634: sub x11, x11, x8
> 0x0000007fa880f638: add x10, x10, x8, lsl #3
> 0x0000007fa880f63c: adr x9, 0x0000007fa880f670
> 0x0000007fa880f640: sub x9, x9, x8, lsl #2
> 0x0000007fa880f644: br x9
> 0x0000007fa880f648: add x10, x10, #0x40
> 0x0000007fa880f64c: sub x11, x11, #0x8
> 0x0000007fa880f650: stur xzr, [x10,#-64]
> 0x0000007fa880f654: stur xzr, [x10,#-56]
> 0x0000007fa880f658: stur xzr, [x10,#-48]
> 0x0000007fa880f65c: stur xzr, [x10,#-40]
> 0x0000007fa880f660: stur xzr, [x10,#-32]
> 0x0000007fa880f664: stur xzr, [x10,#-24]
> 0x0000007fa880f668: stur xzr, [x10,#-16]
> 0x0000007fa880f66c: stur xzr, [x10,#-8]
> 0x0000007fa880f670: cbnz x11, 0x0000007fa880f648
> ;; } zero_words
>
> We could think about moving the large block case into a stub which is
> emitted after the main body of the method, or even into a shared stub.
> A shared stub would require the args to be in fixed registers, though.
>
> Andrew.
>
From rwestrel at redhat.com Wed Apr 20 06:30:53 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 20 Apr 2016 08:30:53 +0200 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <5716AD45.1060302@oracle.com> References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com> <57167ECC.8080304@redhat.com> <5716AD45.1060302@oracle.com> Message-ID: <5717221D.2010107@redhat.com> > In JPRT. Thanks! Roland.
From nils.eliasson at oracle.com Wed Apr 20 07:46:17 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Apr 2016 09:46:17 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com> <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> Message-ID: <571733C9.5090302@oracle.com> On 2016-04-19 19:37, Christian Thalinger wrote: > >> On Apr 19, 2016, at 7:13 AM, Nils Eliasson >> > wrote: >> >> >> >> On 2016-04-18 12:24, Nils Eliasson wrote: >>> Hi, >>> >>> On 2016-04-15 22:43, Christian Thalinger wrote: >>>> >>>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>>>> >>>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson >>>>>>> wrote: >>>>>>> >>>>>>> I moved the reasons to CompileTask.hpp and put it together with >>>>>>> the names list. Also changed the type from int to CompileReason >>>>>>> as Igor suggested. >>>>>>> >>>>>>> It gets verbose in the method declarations in compileBroker >>>>>> >>>>>> Don't worry about this. >>>>>> >>>>>>> and sometimes I think CompileReason should be declared in >>>>>>> CompileBroker because it is mostly used there. On the other >>>>>>> hand, CompileTask is the keeper of the CompileReason so it makes >>>>>>> sense too. >>>>>> >>>>>> Yes, that's the right place. >>>>>> >>>>>>> >>>>>>> New webrev: >>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>>>> >>>>>> >>>>>> + bool can_become_stale() const { >>>>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>>>>> + } >>>>>> I'm not a fan of implicit contracts just defined by comments.
>>>>>> This method doesn't seem to be performance critical so I would >>>>>> suggest to use a switch-case. An attribute on the enum would be >>>>>> much better but we all know this isn't Java. >>>>> >>>>> As you suggested: >>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >>>> >>>> Thanks. A space is missing and the closing } indent is wrong: >>>> + bool can_become_stale() const { >>>> + switch(_compile_reason) { >>>> + case Reason_BackedgeCount: >>>> + case Reason_InvocationCount: >>>> + case Reason_Tiered: >>>> + return !_is_blocking; >>>> + } >>>> + return false; >>>> + } >> And I fixed the indentation. >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ > > + switch(_compile_reason) { > Space after switch. New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.06/ Thanks, Nils > >> >> Thanks! >> Nils >>>> Also, what about: >>>> + Reason_None, >>>> + Reason_CTW, // Compile the world >>>> + Reason_Replay, // ciReplay >>>> These were covered before. >>> Reason_None - is only used for bounds checking together with >>> Reason_Count. >>> Reason_Replay - if these compilations can get stale we can get >>> indeterminism in replay. >>> Reason_CTW - CTW could silently drop compiles -> more indeterminism. >>> >>> Regards, >>> Nils >>> >>>> >>>>> >>>>> Also made reasons CTW and Replay not stale-able. >>>>> >>>>> Thanks! >>>>> Nils >>>>> >>>>>> >>>>>>> >>>>>>> Thanks! >>>>>>> Nils >>>>>>> >>>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>>>> Very nice, I like it. >>>>>>>> >>>>>>>> One note. CompileReason (and its names) should be CompileTask >>>>>>>> class where it is recorded. Then >>>>>>>> CompileTask::can_become_stale() can be in header file so it is >>>>>>>> inlined on all platforms.
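The review above replaces an ordering contract (`_compile_reason < Reason_Whitebox`) with an explicit switch. The following is a minimal, self-contained sketch of that idea. `CompileTaskModel` is a hypothetical stand-in for CompileTask, and the enum lists only the reasons named in this thread in an illustrative order; the real enum has more members. The assertion reflects Nils's note that Reason_None and Reason_Count serve only as bounds.

```cpp
#include <cassert>

// Hypothetical, cut-down CompileReason enum; only members named in this
// thread are listed, and their order here is illustrative.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,      // Compile the world
  Reason_Replay,   // ciReplay
  Reason_Whitebox,
  Reason_Count
};

// Hypothetical stand-in for CompileTask, reduced to the two fields used.
struct CompileTaskModel {
  CompileReason _compile_reason;
  bool _is_blocking;

  // Explicit list instead of an ordering contract on the enum: only
  // counter-triggered, non-blocking tasks may be evicted as stale.
  bool can_become_stale() const {
    assert(_compile_reason > Reason_None && _compile_reason < Reason_Count);
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;  // e.g. Whitebox, CTW and Replay tasks never go stale
    }
  }
};
```

Adding a new reason therefore defaults to "never stale" unless someone lists it explicitly, which is the safer failure mode for WB API, CTW and replay compiles.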
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> New webrev: >>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Summary >>>>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>>>> variants, and a table containing all the unchanged strings. I >>>>>>>>> see the >>>>>>>>> possibility of removing/changing/simplifying some >>>>>>>>> CompileReasons but >>>>>>>>> have choosen not to do so in this change. >>>>>>>>> >>>>>>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>>>>>> >>>>>>>>> Testing: >>>>>>>>> Running Testset hotspot on all platforms and hotspot_all on >>>>>>>>> one platform >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nils Eliawsson >>>>>>>>> >>>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>>>> Tasks get evicted from the compile_queue if their invocation >>>>>>>>>>> counter >>>>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>>>> >>>>>>>>>>> I'll do a proper fix, it is the right thing to do and should >>>>>>>>>>> be pretty >>>>>>>>>>> quick. I'll change the comment to an enum that represent who >>>>>>>>>>> submitted >>>>>>>>>>> the compile, and add a table for the comments. This could be >>>>>>>>>>> useful in >>>>>>>>>>> other settings to. >>>>>>>>>> >>>>>>>>>> Sounds good. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nils >>>>>>>>>>> >>>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>>>> What do you mean "stale"? >>>>>>>>>>>> I would prefer to see the real fix as you suggested to >>>>>>>>>>>> avoid removing >>>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>>>>> >>>>>>>>>>>>> Summary: >>>>>>>>>>>>> Add method enqueued for compilation with WB API may be >>>>>>>>>>>>> removed from >>>>>>>>>>>>> the compile queue as stale. >>>>>>>>>>>>> >>>>>>>>>>>>> Solution: >>>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure >>>>>>>>>>>>> nothing gets >>>>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>>>> checks that may spare us from waiting until timeout for >>>>>>>>>>>>> failing.) >>>>>>>>>>>>> >>>>>>>>>>>>> This is an workaround but we should consider fixing something >>>>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>>>> task with info about the origin of the compile. The >>>>>>>>>>>>> comment field has >>>>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>>>> converted to an enum. >>>>>>>>>>>>> >>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>>>> Webrev: >>>>>>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Nils Eliasson >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kempik at oracle.com Tue Apr 19 16:38:33 2016 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Tue, 19 Apr 2016 09:38:33 -0700 (PDT) Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <570B9803.2030509@oracle.com> References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com> Message-ID: <57165F09.5050404@oracle.com> Hello Can I get some jdk8u reviewer to take a look at it as well? Thanks, Vladimir. On 11.04.2016 15:26, Tobias Hartmann wrote: > Hi Vladimir, > > On 11.04.2016 14:00, Vladimir Kempik wrote: >> Hello >> >> Please review this backport of 8130309 to jdk8u. >> >> Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope. >> >> Testing: jprt, failing test. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 >> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ > Looks good to me. Thanks for backporting this! > > Best regards, > Tobias > >> Thanks >> -Vladimir >> From tobias.hartmann at oracle.com Wed Apr 20 08:05:47 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 10:05:47 +0200 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <57165F09.5050404@oracle.com> References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com> <57165F09.5050404@oracle.com> Message-ID: <5717385B.7040505@oracle.com> Hi Vladimir, I think this should go to jdk8u-dev (CC'ed) as well. Best regards, Tobias On 19.04.2016 18:38, Vladimir Kempik wrote: > Hello > > Can I get some jdk8u reviewer to take a look at it as well? > > Thanks, Vladimir. > > On 11.04.2016 15:26, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 11.04.2016 14:00, Vladimir Kempik wrote: >>> Hello >>> >>> Please review this backport of 8130309 to jdk8u. >>> >>> Small changes for jdk8 were applied. 
AArch64 changes were moved out of openjdk scope. >>> >>> Testing: jprt, failing test. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 >>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ >> Looks good to me. Thanks for backporting this! >> >> Best regards, >> Tobias >> >>> Thanks >>> -Vladimir >>> > From jan.civlin at intel.com Wed Apr 20 10:11:52 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Wed, 20 Apr 2016 10:11:52 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <57157726.4030701@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> Vladimir, Please look at the updated patch at http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
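The k256_W layout described here (each 4-entry row of k256 stored twice in succession, so the table is twice as long) can be derived mechanically from k256, as Vladimir suggested. The following is a minimal sketch in plain C++. `make_k256_w` is a hypothetical helper, not the patch's generate_k256_W(), and the real stub would emit the data into the code cache rather than into a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the doubled table described in the thread, in
// which every 4-entry (128-bit) row of k256 appears twice in succession, so
// one 256-bit (AVX2) load returns the same four round constants in both
// 128-bit lanes.
std::vector<std::uint32_t> make_k256_w(const std::uint32_t* k256, std::size_t n) {
  assert(n % 4 == 0);  // the table is consumed in rows of four constants
  std::vector<std::uint32_t> w;
  w.reserve(2 * n);
  for (std::size_t row = 0; row < n / 4; row++) {
    for (int copy = 0; copy < 2; copy++) {
      for (std::size_t j = 0; j < 4; j++) w.push_back(k256[4 * row + j]);
    }
  }
  return w;
}
```

Deriving the doubled table from `_k256` keeps a single source of truth for the 64 SHA-256 round constants instead of maintaining two hand-written copies.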
Thank you, J [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA throughput = 356.09558280340946 MB/s [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA throughput = 354.1696071938408 MB/s [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA throughput = 349.01408678325697 MB/s -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, April 18, 2016 5:09 PM To: Civlin, Jan; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Hi Jan, The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. 
Please, move new code in macroAssembler_x86_sha.cpp to the end of file. _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: StubRoutines::x86::_k256_W_adr = generate_k256_W(); What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. Thanks, Vladimir On 4/18/16 2:44 PM, Civlin, Jan wrote: > == Correction in the subject line === > > We would like to contribute the SHA256 AVX2 intrinsic. > > This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. > > The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. > > Contributor: Jan Civlin. > > > bug: https://bugs.openjdk.java.net/browse/JDK-8154495 > webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ > From tobias.hartmann at oracle.com Wed Apr 20 13:31:51 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 15:31:51 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> Message-ID: <571784C7.6020304@oracle.com> Hi John, On 20.04.2016 03:46, John Rose wrote: > So I started looking at your code and my inner SPARC junkie took over. > > This is what happened: > http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ Thanks a lot for having a look! > Perhaps there are some ideas that might be helpful: > - The rampdown logic can lose a couple of instructions by using xorcc and movr. Right, this simplifies the code a bit: http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? 
> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. > - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8-byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. What do you think? Thanks, Tobias [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ [3] Runtime alignment checks:

bind(Lunaligned);
Label next;
xor3(ary1, ary2, tmp);
and3(tmp, 7, tmp);
br_null_short(tmp, Assembler::pn, next);
STOP("One array is unaligned!");
should_not_reach_here();
bind(next);
STOP("Both arrays are unaligned!");

> On the other hand, what you wrote is nice and simple. > > HTH > John > > P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more > versions of misalignment, still with vectorization, as with the arraycopy stubs. > But that's neither nice nor simple. > >> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >> >> Thanks, Vladimir! >> >> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>> Very good. Go with basic. We can do SPU special improvements later if needed. >> >> Okay, I'll push the basic version.
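The runtime check in [3] relies on a compact trick: XOR-ing the two addresses and masking the low three bits tells whether both arrays sit at the same offset modulo 8 (John's "both-odd" case) or at different offsets ("one-odd"). A minimal sketch, assuming (as stated in the thread) that array data is always at least 4-byte aligned and that the both-aligned case is filtered out before `Lunaligned` is reached. `Alignment` and `classify` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class Alignment { BothAligned, BothUnaligned, OneUnaligned };

// Hypothetical helper mirroring the xor3/and3 test in [3].
Alignment classify(std::uintptr_t a1, std::uintptr_t a2) {
  assert((a1 & 3) == 0 && (a2 & 3) == 0);  // word alignment is assumed guaranteed
  if (((a1 | a2) & 7) == 0) return Alignment::BothAligned;  // 8-byte loop is usable
  // Same low three bits: both arrays are at offset 4 mod 8, so peeling one
  // 4-byte element would align them both ("both-odd").
  if (((a1 ^ a2) & 7) == 0) return Alignment::BothUnaligned;
  // Different offsets: only a skewed 8-byte loop could help ("one-odd").
  return Alignment::OneUnaligned;
}
```

Since the measurements in this thread found misalignment to be rare, the sketch only classifies the cases; it does not implement the skewed loop.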
>> >> For reference, here are the results on a SPARC T4: >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >> >>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>> We do have arraycopy code for it but by default we don't use it: >>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>> >>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >> >> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >> >> java -XX:ArraycopySrcPrefetchDistance=42 -version >> ArraycopySrcPrefetchDistance (42) must be 0 >> Error: Could not create the Java Virtual Machine. >> Error: A fatal exception has occurred. Program will exit >> >> Thanks, >> Tobias >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following enhancement: >>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>> >>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>> >>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. 
Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>> >>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>> >>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>> >>>> I evaluated the following three versions of the patch. >>>> >>>> -- Basic -- >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>> >>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. 
>>>> >>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>> Version "small" tries to improve this. >>>> >>>> -- Prefetching -- >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>> >>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>> >>>> -- Small -- >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>> >>>> The numbers can be found here: >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>> >>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>> >>>> What do you think? >>>> >>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>> [3] Microbenchmark results for the "basic" implementation >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>> [4] Microbenchmark results for the "prefetching" implementation >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>> From tobias.hartmann at oracle.com Wed Apr 20 13:46:53 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 15:46:53 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options Message-ID: <5717884D.2020108@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8086068 http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. Tested with regression test and RBT (running). 
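The fix for 8086068 amounts to enforcing one invariant at argument-processing time: interpreter-only mode must never leave UseCompiler enabled, because code cache sizing assumes no compilation will be triggered. The following is a minimal sketch of such a check. `Mode`, `VmFlags` and `reconcile_compiler_flags` are illustrative names, not HotSpot's, and the warning text is likewise made up.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the 8086068 check; names and message are illustrative.
enum class Mode { Int, Mixed, Comp };

struct VmFlags {
  Mode mode;
  bool use_compiler;  // stands in for -XX:+/-UseCompiler
};

// Returns true if UseCompiler had to be reset because -Xint was given.
bool reconcile_compiler_flags(VmFlags& f) {
  bool reset = false;
  if (f.mode == Mode::Int && f.use_compiler) {
    std::fprintf(stderr, "warning: -XX:+UseCompiler ignored with -Xint\n");
    f.use_compiler = false;
    reset = true;
  }
  // Invariant relied on later: interpreter-only mode never compiles.
  assert(!(f.mode == Mode::Int && f.use_compiler));
  return reset;
}
```

Resetting the flag with a warning, rather than rejecting the command line, matches the approach described above.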
Thanks, Tobias From zoltan.majo at oracle.com Wed Apr 20 14:02:43 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 20 Apr 2016 16:02:43 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <5717884D.2020108@oracle.com> References: <5717884D.2020108@oracle.com> Message-ID: <57178C03.1010902@oracle.com> Hi Tobias, thank you for looking taking care of this issue. There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems? Otherwise it looks good to me. Best regards, Zoltan On 04/20/2016 03:46 PM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8086068 > http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ > > The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. > > The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. > > Tested with regression test and RBT (running). > > Thanks, > Tobias From tobias.hartmann at oracle.com Wed Apr 20 14:22:23 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 16:22:23 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <57178C03.1010902@oracle.com> References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com> Message-ID: <5717909F.5040004@oracle.com> Hi Zoltan, On 20.04.2016 16:02, Zolt?n Maj? wrote: > There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). 
Do you know if re-enabling any of those causes problems? I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler. > Otherwise it looks good to me. Thanks for the review! Best regards, Tobias > Best regards, > > > Zoltan > > On 04/20/2016 03:46 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8086068 >> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ >> >> The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. >> >> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. >> >> Tested with regression test and RBT (running). >> >> Thanks, >> Tobias > From zoltan.majo at oracle.com Wed Apr 20 15:01:18 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 20 Apr 2016 17:01:18 +0200 Subject: [9] RFR(XS): 8153292: AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger out-of-heap prefetching Message-ID: <571799BE.1030203@oracle.com> Hi, please review the patch for 8153292. https://bugs.openjdk.java.net/browse/JDK-8153292 Problem: To avoid out-of-heap accesses by instructions prefetching data, TLABs have a reserved area. The size of that area is supposed to be large enough to accommodate possible prefetching. The amount of prefetched data is controlled separately for instance and array allocations (by the AllocateInstancePrefetchLines and AllocatePrefetchLines flags). The size of the reserved area in the TLAB is, however, determined only based on AllocatePrefetchLines. As a result, AllocateInstancePrefetchLines > AllocatePrefetchLines can trigger out-of-heap memory accesses. 
Solution: Set the size of the reserved TLAB area to the MAX of both flags. Webrev: http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/ Testing: - JPRT; - local testing on a solaris_sparc machine. Thank you! Best regards, Zoltan From edward.nevill at gmail.com Wed Apr 20 17:08:30 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 20 Apr 2016 18:08:30 +0100 Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <57163063.3020506@redhat.com> References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com> Message-ID: <1461172110.2941.63.camel@mylittlepony.linaroharston> On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote: > On 04/19/2016 01:54 PM, Long Chen wrote: > > Would this be fine? > > It might well be. I'd like Ed to do a few measurements of large and > small block zeroing. My guess is that a reasonably small unrolled loop > doing STP ZR, ZR will work better than anything else, but we'll see. OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners HW using 3 different methods. 1) A sequence of str zr, [base, #N] instructions 2) A sequence of stp zr, zr, [base, #N] instructions 3) Using dc zva Each test was repeated for 3 different memory sizes, 100 cache lines, 10000 cache lines and 1E7 cache lines to simulate the cases where we are hitting L1, L2 and main memory respectively. The results are here. I have normalised the time for the 100 cache line str to 100 for each partner to avoid disclosing any absolute performance figures. 
http://people.linaro.org/~edward.nevill/block_zero/zva.pdf From this I get the following conclusions Partner X: - Significant improvement using stp vs str across all block zero sizes - Significant improvement using dc zva over stp across all sizes Partner Y: - Virtually no performance improvement using stp vs str all sizes - Significant improvement using dc zva Partner Z: - Small improvement using stp vs str on L2 sized clears - Small improvement using dc zva on L1/L2 sizes clears - Large block zeros show no performance improvement str/stp/dc zva (this is probably a feature of the external memory system on the partner Z board) So, guided by this I modified the block zeroing patch as follows if (!small) { } Here is the webrev for this http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03/ I also made a minor modification to Long Chen's v02 patch. In the following code + tbz(cnt, 0, store_pair); + str(zr, Address(post(base, 8))); + sub(cnt, cnt, 1); + bind(store_pair); + cbz(cnt, done); + bind(loop_store_pair); + sub(cnt, cnt, 2); + stp(zr, zr, Address(post(base, 16))); + cbnz(cnt, loop_store_pair); + bind(done); it unnecessarily misaligns the base before continuing to do the stps. We know the base is aligned in the large case because it has just finished clearing cache lines. I moved the single word zero to the end. The number of instructions is the same. The webrev for this is here. http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04 For completeness I also implemented a version using stp only and not using dc zva at all. Webrev here http://people.linaro.org/~edward.nevill/block_zero/stp I have tested all of these, including Long Chens v01 and v02 patches using jmh as before (http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java) Results are here, I have normalised the original value in each case to 1E7uS to avoid disclosing any absolute performance figures.
http://people.linaro.org/~edward.nevill/block_zero/zero.pdf In this orig - is a clean jdk9/hs-comp build (results normalised to 1E7uS) stp - is the stp patch above using only stps (no dc zva) bzero1 - is Long Chens v01 patch bzero2 - is Long Chens v02 patch bzero3 - is my patch above bzero4 - is Long Chens v02 patch with the minor mod to avoid misaligning the stps From this it looks like bzero3 or bzero4 would be the preferred options, and I would suggest bzero4 as bzero3 is significantly larger. If people are happy could I prepare final changeset for review based on bzero4 (ie this one) http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04 All the best, Ed. From vladimir.kozlov at oracle.com Wed Apr 20 17:58:45 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 10:58:45 -0700 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <57165F09.5050404@oracle.com> References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com> <57165F09.5050404@oracle.com> Message-ID: <5717C355.9080509@oracle.com> Reviewed. Looks good. Thanks, Vladimir On 4/19/16 9:38 AM, Vladimir Kempik wrote: > Hello > > Can I get some jdk8u reviewer to take a look at it as well? > > Thanks, Vladimir. > > On 11.04.2016 15:26, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 11.04.2016 14:00, Vladimir Kempik wrote: >>> Hello >>> >>> Please review this backport of 8130309 to jdk8u. >>> >>> Small changes for jdk8 were applied. AArch64 changes were moved out >>> of openjdk scope. >>> >>> Testing: jprt, failing test. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 >>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ >> Looks good to me. Thanks for backporting this!
>> >> Best regards, >> Tobias >> >>> Thanks >>> -Vladimir >>> > From vladimir.kozlov at oracle.com Wed Apr 20 18:01:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 11:01:21 -0700 Subject: [9] RFR(XS): 8153292: AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger out-of-heap prefetching In-Reply-To: <571799BE.1030203@oracle.com> References: <571799BE.1030203@oracle.com> Message-ID: <5717C3F1.6030809@oracle.com> Looks good. Thanks, Vladimir On 4/20/16 8:01 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8153292. > > https://bugs.openjdk.java.net/browse/JDK-8153292 > > > Problem: To avoid out-of-heap accesses by instructions prefetching data, > TLABs have a reserved area. The size of that area is supposed to be > large enough to accommodate possible prefetching. > > The amount of prefetched data is controlled separately for instance and > array allocations (by the AllocateInstancePrefetchLines and > AllocatePrefetchLines flags). The size of the reserved area in the TLAB > is, however, determined only based on AllocatePrefetchLines. As a > result, AllocateInstancePrefetchLines > AllocatePrefetchLines can > trigger out-of-heap memory accesses. > > > Solution: Set the size of the reserved TLAB area to the MAX of both flags. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/ > > Testing: > - JPRT; > - local testing on a solaris_sparc machine. > > Thank you!
> > Best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Wed Apr 20 18:37:55 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 11:37:55 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> Message-ID: <5717CC83.3070401@oracle.com> Looks good to me. I submitted testing on all platforms before integrating. Thanks, Vladimir On 4/20/16 3:11 AM, Civlin, Jan wrote: > Vladimir, > > Please look at the updated patch at > http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ > > I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). > > The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. > > The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
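Vladimir's suggestion and Jan's description of the result amount to a simple table transformation: `_k256_W` holds each 4-constant row of `_k256` twice in a row, so it is twice as long. Below is a minimal C++ sketch of that layout. It builds the table with ordinary host code rather than in a generated stub, and the helper name `make_k256_w` is hypothetical. The rationale that the duplication lets a single 256-bit load hand the same four round constants to both 128-bit lanes is an assumption about how the AVX2 code consumes the table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kK256Words = 64;  // SHA-256 has 64 round constants

// Hypothetical helper: expand k256 (64 words) into k256_W (128 words) by
// emitting every group of four consecutive constants twice in a row.
std::array<std::uint32_t, 2 * kK256Words>
make_k256_w(const std::array<std::uint32_t, kK256Words>& k256) {
  std::array<std::uint32_t, 2 * kK256Words> out{};
  for (std::size_t row = 0; row < kK256Words / 4; ++row) {
    for (std::size_t copy = 0; copy < 2; ++copy) {
      for (std::size_t i = 0; i < 4; ++i) {
        // Output row `row` occupies 8 words: copy 0 in words 0..3, copy 1 in 4..7.
        out[row * 8 + copy * 4 + i] = k256[row * 4 + i];
      }
    }
  }
  return out;
}
```

Deriving the doubled table from `_k256` means the constants exist in only one place in the source, which is the maintenance benefit of the suggestion.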
> > Thank you, > > J > > [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 28.756324129 seconds > TestSHA throughput = 356.09558280340946 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 28.912701124 seconds > TestSHA throughput = 354.1696071938408 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 29.339789962 seconds > TestSHA throughput = 349.01408678325697 MB/s > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, April 18, 2016 5:09 PM > To: Civlin, Jan; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > Hi Jan, > > The patch was generated on Windows and have ^M at the end of lines so I can't apply it to 
our sources. > > I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. > > Please, move new code in macroAssembler_x86_sha.cpp to the end of file. > > _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: > > StubRoutines::x86::_k256_W_adr = generate_k256_W(); > > What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. > > Thanks, > Vladimir > > On 4/18/16 2:44 PM, Civlin, Jan wrote: >> == Correction in the subject line === >> >> We would like to contribute the SHA256 AVX2 intrinsic. >> >> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >> >> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >> >> Contributor: Jan Civlin. >> >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >> From jan.civlin at intel.com Wed Apr 20 19:07:28 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Wed, 20 Apr 2016 19:07:28 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <5717CC83.3070401@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> Thank you! -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 20, 2016 11:38 AM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Looks good to me. I submitted testing on all platforms before integrating. 
Thanks, Vladimir On 4/20/16 3:11 AM, Civlin, Jan wrote: > Vladimir, > > Please look at the updated patch at > http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ > > I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). > > The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. > > The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. > > Thank you, > > J > > [jcivlin at HSW-EP02 TestSHA]$ > ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java > -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics > -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar > 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes > offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 > fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 > 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA > throughput = 356.09558280340946 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ > ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/ja > va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics > -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar > 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes > offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 > fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 > 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA > throughput = 354.1696071938408 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ > ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/ja > va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics > -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar > 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes > offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 > fc 2c 53 dc 14 a4 ce 3d 80 0e 69 
ef 9c e1 00 9e b3 27 cc f4 58 af e0 > 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA > throughput = 349.01408678325697 MB/s > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, April 18, 2016 5:09 PM > To: Civlin, Jan; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no > supports_sha() available) > > Hi Jan, > > The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. > > I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. > > Please, move new code in macroAssembler_x86_sha.cpp to the end of file. > > _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: > > StubRoutines::x86::_k256_W_adr = generate_k256_W(); > > What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. > > Thanks, > Vladimir > > On 4/18/16 2:44 PM, Civlin, Jan wrote: >> == Correction in the subject line === >> >> We would like to contribute the SHA256 AVX2 intrinsic. >> >> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >> >> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >> >> Contributor: Jan Civlin. 
>> >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >> From nils.eliasson at oracle.com Wed Apr 20 19:26:32 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Apr 2016 21:26:32 +0200 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> Message-ID: <5717D7E8.5000108@oracle.com> In vmSymbols.cpp together with the other flag checks. Regards, Nils On 2016-04-20 02:44, Deshpande, Vivek R wrote: > > HI Nils > > Yes you are right the function accesses the command line flag > DisableIntrinsic and changes are static. > > Could you point me the right location for the function ? > > Also I have updated the webrev with rest of the comments here: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ > > Regards, > > Vivek > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of > *Nils Eliasson > *Sent:* Tuesday, April 19, 2016 5:55 AM > *To:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR (S): 8154473: Update for CompilerDirectives to > control stub generation and intrinsics > > Hi Vivek, > > The changes in is_intrinsic_disabled in compilerDirectives.* are > static and only access the command line flag DisableIntrinsics. As > long as stubs are only generated during startup and don't have a > method context - that is ok - but it doesn't belong in the > compilerDirectives-files if it doens't use directives. 
> > Regards, > Nils > > On 2016-04-18 19:38, Deshpande, Vivek R wrote: > > Hi all > > I would like to contribute a patch which helps to control the > intrinsics in interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > > Thanks and regards, > > Vivek > From dmitry.dmitriev at oracle.com Wed Apr 20 19:51:24 2016 From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev) Date: Wed, 20 Apr 2016 22:51:24 +0300 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <5717884D.2020108@oracle.com> References: <5717884D.2020108@oracle.com> Message-ID: <5717DDBC.4030909@oracle.com> Hi Tobias, Can comment only about new test: I think that you don't need @library and @modules for this simple test. Not need a new webrev for that. Thank you! Dmitry On 20.04.2016 16:46, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8086068 > http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ > > The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. > > The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. > > Tested with regression test and RBT (running).
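The fix Tobias describes (detect the inconsistent combination, warn, reset the flag) follows a common post-parse validation pattern. Below is a minimal C++ sketch under a simplified flag model. `ExecMode`, `VmFlags`, `enforce_interpreter_only` and the warning text are all hypothetical and do not reproduce HotSpot's `Arguments` code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the relevant VM state; these are not HotSpot types.
enum class ExecMode { Interpreted, Mixed, Compiled };

struct VmFlags {
  ExecMode mode;      // models the -Xint / -Xmixed / -Xcomp choice
  bool use_compiler;  // models UseCompiler after command-line parsing
};

// Returns true if the flags were already consistent. If interpreter-only
// mode was combined with an explicit request for the compiler, warn and
// turn the compiler back off, so that code which assumes "interpreter only
// means no compilations" (like the code cache setup) stays valid.
bool enforce_interpreter_only(VmFlags& flags) {
  if (flags.mode == ExecMode::Interpreted && flags.use_compiler) {
    std::fprintf(stderr, "warning: UseCompiler is ignored in interpreter-only mode\n");
    flags.use_compiler = false;
    return false;
  }
  return true;
}
```

Running the check once, after all flags are parsed, means a later command-line option cannot silently undo it. That ordering problem is the one described in the report, where -Xint first cleared UseCompiler and -XX:+UseCompiler then set it again.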
> > Thanks, > Tobias From nils.eliasson at oracle.com Wed Apr 20 19:56:54 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Apr 2016 21:56:54 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <571519BE.605@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> Message-ID: <5717DF06.3090305@oracle.com> Hi, Thanks for the help, I got it to work, and added NoSafePointVerifiers to make sure I hadn't missed anything. Then after many test iterations it failed again. It didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to the temporary buffer too, or relax the tag checks in the xml and accept that the output may need to be sorted by writer-thread before use. The output looks like: ... releases tty when blocking on a safepoint ... // back again after safepoint writing without ttylock now. // Here we fail on an assert today when we expect a closing print_nmethod tag This is malformed xml but has enough information to be reconstructed. Would this be an acceptable output? Regards, Nils On 2016-04-18 19:30, Vladimir Kozlov wrote: > tty would have the same problem but it use C_HEAP to allocate: > > defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) > defaultStream(); > > Please, look if you can do something similar. > > Thanks, > Vladimir > > On 4/18/16 4:24 AM, Nils Eliasson wrote: >> Resizeable is better, but then we assert on expanding the stringbuffer >> while being under a different ResourceMark. >> >> Regards, >> Nils >> >> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>> Use resizable stream: >>> >>> stringStream(size_t initial_bufsize = 256); >>> >>> 1024 may not be enough. 
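The approach in this thread is to format the output into a temporary buffer first and take the tty lock only to emit it. Below is a minimal C++ sketch of that general pattern, written with the standard library instead of HotSpot's `stringStream`/`ttyLocker`. `LockedLog` and `format_record` are hypothetical names. `std::ostringstream` grows on the ordinary C++ heap, loosely analogous to the suggestion to allocate the buffer with C_HEAP rather than in a resource area. As Nils's later message shows, anything that can still block for a safepoint while the lock is held (such as the VM entry in dump_asm) breaks the guarantee in HotSpot itself, so this illustrates the intent only, not a complete fix.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical shared log sink; stands in for the tty stream and its lock.
class LockedLog {
 public:
  explicit LockedLog(std::ostream& out) : out_(out) {}

  // Emit an already-formatted record while holding the lock. Only a plain
  // write happens inside the critical section, so the record stays contiguous.
  void write_record(const std::string& record) {
    std::lock_guard<std::mutex> guard(lock_);
    out_ << record;
  }

 private:
  std::ostream& out_;
  std::mutex lock_;
};

// Format a record into a growable, thread-private buffer first. Any slow or
// potentially blocking work belongs here, before the shared lock is taken.
std::string format_record(const std::string& tag, const std::string& body) {
  std::ostringstream buf;
  buf << "<" << tag << ">" << body << "</" << tag << ">\n";
  return buf.str();
}
```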
>>> >>> Thanks, >>> Vladimir >>> >>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this fix of print opto_assembly. >>>> >>>> Summary: >>>> The compilelog can get corrupted and the VM may assert on "failed: >>>> bad tag in log". >>>> >>>> When printing assembly in output.cpp we first take the ttylock, print >>>> the head and then the method metadata. However the >>>> metadata printing makes a vm entry and may block for a safepoint and >>>> will then release the lock >>>> (break_tty_lock_for_safepoint). After that some of the other compiler >>>> thread that haven't safepointed will take the lock >>>> and the broken log will be a fact when the safepoint is over and the >>>> first thread starts logging again. >>>> >>>> Solution: >>>> Print the method metadata to a temporary buffer, then take the tty >>>> lock. >>>> >>>> Testing: >>>> Repro from bug stops failing. >>>> Running :hotspot_all >>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>> >>>> >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>> >>>> Regards, >>>> Nils Eliasson >> From vladimir.kozlov at oracle.com Wed Apr 20 20:04:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 13:04:20 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> Message-ID: <5717E0C4.8050006@oracle.com> One thing was caught during build is ',' at the last line of enum: + STACK_SIZE = _RSP + _RSP_SIZE, +}; Compiler complains about it so I removed it 
in my local repo. Vladimir On 4/20/16 12:07 PM, Civlin, Jan wrote: > Thank you! > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 11:38 AM > To: Civlin, Jan ; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > Looks good to me. I submitted testing on all platforms before integrating. > > Thanks, > Vladimir > > On 4/20/16 3:11 AM, Civlin, Jan wrote: >> Vladimir, >> >> Please look at the updated patch at >> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >> >> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >> >> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >> >> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. >> >> Thank you, >> >> J >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java >> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 >> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 >> 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >> throughput = 356.09558280340946 MB/s >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/ja >> va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 >> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 >> 9c 24 2c 26 c9 TestSHA runtime = 
28.912701124 seconds TestSHA >> throughput = 354.1696071938408 MB/s >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/ja >> va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 >> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 >> 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >> throughput = 349.01408678325697 MB/s >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, April 18, 2016 5:09 PM >> To: Civlin, Jan; hotspot compiler >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> Hi Jan, >> >> The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. >> >> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. >> >> Please, move new code in macroAssembler_x86_sha.cpp to the end of file. >> >> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: >> >> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >> >> What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. >> >> Thanks, >> Vladimir >> >> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>> == Correction in the subject line === >>> >>> We would like to contribute the SHA256 AVX2 intrinsic. >>> >>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>> >>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>> >>> Contributor: Jan Civlin. 
>>> >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>> From vladimir.kozlov at oracle.com Wed Apr 20 20:07:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 13:07:44 -0700 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <5717DF06.3090305@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> <5717DF06.3090305@oracle.com> Message-ID: <5717E190.5070107@oracle.com> On 4/20/16 12:56 PM, Nils Eliasson wrote: > Hi, > > Thanks for the help, > > I got it to work, and added NoSafePointVerifiers to make sure I hadn't > missed anything. Then after many test iterations it failed again. It > didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get > a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to > the temporary buffer too, or relax the tag checks in the xml and accept > that the output may need to be sorted by writer-thread before use. The > output looks like: > > > > ... > releases tty when blocking on a safepoint > > > ... > // back again after safepoint writing without > ttylock now. > // Here we fail on an assert today when we expect > a closing print_nmethod tag > > > > This is malformed xml but has enough information to be reconstructed. > Would this be an acceptable output? Yes, I think it is acceptable - we don't loose information. And it is not worse than it was before. Thanks, Vladimir > > Regards, > Nils > > > On 2016-04-18 19:30, Vladimir Kozlov wrote: >> tty would have the same problem but it use C_HEAP to allocate: >> >> defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) >> defaultStream(); >> >> Please, look if you can do something similar. 
>> >> Thanks, >> Vladimir >> >> On 4/18/16 4:24 AM, Nils Eliasson wrote: >>> Resizeable is better, but then we assert on expanding the stringbuffer >>> while being under a different ResourceMark. >>> >>> Regards, >>> Nils >>> >>> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>>> Use resizable stream: >>>> >>>> stringStream(size_t initial_bufsize = 256); >>>> >>>> 1024 may not be enough. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this fix of print opto_assembly. >>>>> >>>>> Summary: >>>>> The compilelog can get corrupted and the VM may assert on "failed: >>>>> bad tag in log". >>>>> >>>>> When printing assembly in output.cpp we first take the ttylock, print >>>>> the head and then the method metadata. However the >>>>> metadata printing makes a vm entry and may block for a safepoint and >>>>> will then release the lock >>>>> (break_tty_lock_for_safepoint). After that some of the other compiler >>>>> thread that haven't safepointed will take the lock >>>>> and the broken log will be a fact when the safepoint is over and the >>>>> first thread starts logging again. >>>>> >>>>> Solution: >>>>> Print the method metadata to a temporary buffer, then take the tty >>>>> lock. >>>>> >>>>> Testing: >>>>> Repro from bug stops failing. 
>>>>> Running :hotspot_all >>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>>> >>>>> >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>>> >>>>> Regards, >>>>> Nils Eliasson >>> > From jan.civlin at intel.com Wed Apr 20 20:13:37 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Wed, 20 Apr 2016 20:13:37 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <5717E0C4.8050006@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> Thank you, Vladimir. I guess it was a warning. I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. Section 6.7.2.2 of C99 lists the syntax as: enum-specifier: enum identifieropt { enumerator-list } enum identifieropt { enumerator-list , } enum identifier enumerator-list: enumerator enumerator-list , enumerator enumerator: enumeration-constant enumeration-constant = constant-expression -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 20, 2016 1:04 PM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) One thing was caught during build is ',' at the last line of enum: + STACK_SIZE = _RSP + _RSP_SIZE, +}; Compiler complains about it so I removed it in my local repo. Vladimir On 4/20/16 12:07 PM, Civlin, Jan wrote: > Thank you! 
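Jan is right that C99 (section 6.7.2.2) accepts a trailing comma after the last enumerator. C89 did not allow one, and neither do C++98/C++03; C++11 added it. A compiler building HotSpot's C++ sources in a pre-C++11 or pedantic mode may therefore warn about it or reject it. The thread does not say which compiler complained. The portable spelling simply drops the comma. The enumerator names and values below are hypothetical stand-ins for the stub's stack-layout enum.

```cpp
#include <cassert>

// Hypothetical offsets, for illustration only; the real stack layout of the
// SHA-256 stub is not reproduced here.
enum StackLayout {
  RSP_OFF    = 0,
  RSP_SIZE   = 8,
  STACK_SIZE = RSP_OFF + RSP_SIZE  // no trailing comma: valid in C89, C99, C++98 and C++11
};
```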
> > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 11:38 AM > To: Civlin, Jan ; hotspot compiler > > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no > supports_sha() available) > > Looks good to me. I submitted testing on all platforms before integrating. > > Thanks, > Vladimir > > On 4/20/16 3:11 AM, Civlin, Jan wrote: >> Vladimir, >> >> Please look at the updated patch at >> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >> >> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >> >> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >> >> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. >> >> Thank you, >> >> J >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/jav >> a -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >> throughput = 356.09558280340946 MB/s >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/j >> a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >> e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA >> throughput = 354.1696071938408 MB/s >> >> 
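The printed TestSHA throughput matches payload bytes divided by runtime. This holds if MB means 10^6 bytes and the 20000 warm-up iterations are not counted: 1024 B × 10,000,000 / 28.756324129 s ≈ 356.10 MB/s. That reading is inferred from the printed numbers only, because TestSHA's source is not part of the thread. The helper below (a hypothetical name) reproduces the check for the three builds.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Throughput in MB/s with MB = 10^6 bytes (an assumption that matches the
// numbers printed by TestSHA). Warm-up iterations are excluded.
double throughput_mb_per_s(std::int64_t msg_size_bytes, std::int64_t iters,
                           double runtime_s) {
  assert(runtime_s > 0.0);
  double total_bytes =
      static_cast<double>(msg_size_bytes) * static_cast<double>(iters);
  return total_bytes / runtime_s / 1e6;
}
```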
[jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/j >> a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >> throughput = 349.01408678325697 MB/s >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, April 18, 2016 5:09 PM >> To: Civlin, Jan; hotspot compiler >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> Hi Jan, >> >> The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. >> >> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. >> >> Please, move new code in macroAssembler_x86_sha.cpp to the end of file. >> >> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: >> >> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >> >> What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. >> >> Thanks, >> Vladimir >> >> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>> == Correction in the subject line === >>> >>> We would like to contribute the SHA256 AVX2 intrinsic. >>> >>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>> >>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>> >>> Contributor: Jan Civlin. 
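Vladimir suggests deriving `_k256_W` from `_k256` at stub-generation time. That amounts to copying each row of four 32-bit constants twice. The sketch below assumes only the layout Jan describes: each 4-entry row of k256 repeated twice, giving a table twice as large. `make_k256_W` is a hypothetical name, and the real `generate_k256_W()` in stubGenerator_x86_64.cpp may differ. Presumably the duplication lets one 256-bit load put the same four constants in both 128-bit lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a k256_W-style table: every row of 4 consecutive constants from k
// is emitted twice, so the result has 2 * n entries.
std::vector<std::uint32_t> make_k256_W(const std::uint32_t* k, std::size_t n) {
  assert(n % 4 == 0);  // SHA-256 has 64 round constants = 16 rows of 4
  std::vector<std::uint32_t> w;
  w.reserve(2 * n);
  for (std::size_t row = 0; row < n; row += 4) {
    for (int copy = 0; copy < 2; ++copy) {
      for (std::size_t j = 0; j < 4; ++j) {
        w.push_back(k[row + j]);
      }
    }
  }
  return w;
}
```

For example, the first eight SHA-256 constants (0x428a2f98 through 0xab1c5ed5) expand to sixteen entries: row 0, row 0, row 1, row 1.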
>>> >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>> From vladimir.kozlov at oracle.com Wed Apr 20 20:17:15 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 13:17:15 -0700 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <5717909F.5040004@oracle.com> References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com> <5717909F.5040004@oracle.com> Message-ID: <5717E3CB.6060107@oracle.com> An other interesting combination is -Xshare:dump -Xcomp (or -XX:+UseCompiler) because -Xshare:dump tries to disable compilation. I think the fix is good for -Xint -XX:+UseCompiler combination. Thanks, Vladimir On 4/20/16 7:22 AM, Tobias Hartmann wrote: > Hi Zoltan, > > On 20.04.2016 16:02, Zoltán Majó wrote: >> There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems? > > I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler. > >> Otherwise it looks good to me. > > Thanks for the review! > > Best regards, > Tobias > >> Best regards, >> >> >> Zoltan >> >> On 04/20/2016 03:46 PM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8086068 >>> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ >>> >>> The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. >>> >>> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
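Tobias's approach is to detect the inconsistent combination, warn, and reset the flag. That is a common pattern in argument processing, and the sketch below shows it under stated assumptions. `VmFlags`, `Mode` and `reconcile_compiler_flags` are hypothetical and do not mirror HotSpot's `Arguments` code, and the warning text is invented.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the two settings involved; not HotSpot's Arguments API.
enum class Mode { Interpreted, Mixed, Compiled };

struct VmFlags {
  Mode mode;          // -Xint would select Mode::Interpreted
  bool use_compiler;  // -XX:+UseCompiler, possibly re-enabled on the command line
};

// Run once after all options are parsed. Returns true if a flag was reset.
bool reconcile_compiler_flags(VmFlags& f) {
  if (f.mode == Mode::Interpreted && f.use_compiler) {
    // Illustrative message text only.
    std::fprintf(stderr, "warning: UseCompiler ignored in interpreted-only mode\n");
    f.use_compiler = false;  // later code may rely on "interpreted => no compiles"
    return true;
  }
  return false;
}
```

The check has to run after all options are processed. As the thread notes, -Xint first clears UseCompiler, but an explicit -XX:+UseCompiler can re-enable it.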
>>> >>> Tested with regression test and RBT (running). >>> >>> Thanks, >>> Tobias >> From christian.thalinger at oracle.com Wed Apr 20 21:47:52 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 20 Apr 2016 11:47:52 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <571733C9.5090302@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com> <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> <571733C9.5090302@oracle.com> Message-ID: Looks good. > On Apr 19, 2016, at 9:46 PM, Nils Eliasson wrote: > > > > On 2016-04-19 19:37, Christian Thalinger wrote: >> >>> On Apr 19, 2016, at 7:13 AM, Nils Eliasson < nils.eliasson at oracle.com > wrote: >>> >>> >>> >>> On 2016-04-18 12:24, Nils Eliasson wrote: >>>> Hi, >>>> >>>> On 2016-04-15 22:43, Christian Thalinger wrote: >>>>> >>>>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson > wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>>>>> >>>>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson < nils.eliasson at oracle.com > wrote: >>>>>>>> >>>>>>>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. >>>>>>>> >>>>>>>> It gets verbose in the method declarations in compileBroker >>>>>>> >>>>>>> Don't worry about this. >>>>>>> >>>>>>>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. >>>>>>> >>>>>>> Yes, that's the right place.
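Nils's change keeps the `CompileReason` enum in CompileTask together with a names list. A parallel enum and string table can silently drift apart, so the sketch below shows one way to guard against that. It assumes a trailing `Reason_Count` sentinel, which the review mentions. The reasons listed are a placeholder subset taken from this thread, in an assumed order, and the strings are invented; this is not HotSpot's actual list.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder subset of reasons; the real enum in CompileTask has more.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,
  Reason_Replay,
  Reason_Whitebox,
  Reason_Count  // sentinel: number of reasons, must stay last
};

// One string per enumerator, in the same order (placeholder text).
static const char* const reason_names[] = {
  "none",
  "invocation_count",
  "backedge_count",
  "tiered",
  "ctw",
  "replay",
  "whitebox"
};

// Fails to compile if an enumerator is added without a matching string.
static_assert(sizeof(reason_names) / sizeof(reason_names[0]) ==
                  static_cast<std::size_t>(Reason_Count),
              "reason_names must have exactly one entry per CompileReason");

inline const char* reason_name(CompileReason r) {
  // Bounds check against the sentinel before indexing the table.
  return static_cast<unsigned>(r) < static_cast<unsigned>(Reason_Count)
             ? reason_names[r]
             : "unknown";
}
```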
>>>>>>> >>>>>>>> >>>>>>>> New webrev: >>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>>>> >>>>>>> + bool can_become_stale() const { >>>>>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>>>>>> + } >>>>>>> I'm not a fan of implicit contracts just defined by comments. This method doesn't seem to be performance critical so I would suggest to use a switch-case. An attribute on the enum would be much better but we all know this isn't Java. >>>>>> >>>>>> As you suggested: >>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >>>>> >>>>> Thanks. A space is missing and the closing } indent is wrong: >>>>> + bool can_become_stale() const { >>>>> + switch(_compile_reason) { >>>>> + case Reason_BackedgeCount: >>>>> + case Reason_InvocationCount: >>>>> + case Reason_Tiered: >>>>> + return !_is_blocking; >>>>> + } >>>>> + return false; >>>>> + } >>> And I fixed the indentation. >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ >> >> + switch(_compile_reason) { >> Space after switch. > > New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.06/ > > Thanks, > Nils > >> >>> >>> Thanks! >>> Nils >>>>> Also, what about: >>>>> + Reason_None, >>>>> + Reason_CTW, // Compile the world >>>>> + Reason_Replay, // ciReplay >>>>> These were covered before. >>>> Reason_None - is only used for bounds checking together with Reason_Count. >>>> Reason_Replay - if these compilations can get stale we can get indeterminism in replay. >>>> Reason_CTW - CTW could silently drop compiles -> more indeterminism. >>>> >>>> Regards, >>>> Nils >>>> >>>>> >>>>>> >>>>>> Also made reasons CTW and Replay not stale-able. >>>>>> >>>>>> Thanks! >>>>>> Nils >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks! >>>>>>>> Nils >>>>>>>> >>>>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>>>>> Very nice, I like it. >>>>>>>>> >>>>>>>>> One note. CompileReason (and its names) should be CompileTask class where it is recorded.
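Christian asks for an explicit switch in place of the ordering-based test (`_compile_reason < Reason_Whitebox`). The self-contained sketch below shows the effect. The enum lists only reasons named in this thread, in an assumed order, and `Task` is a hypothetical stand-in for CompileTask.

```cpp
#include <cassert>

// Assumed subset and order of reasons, for illustration only.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,
  Reason_Replay,
  Reason_Whitebox,
  Reason_Count
};

class Task {
 public:
  Task(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {}

  // Only counter-triggered, non-blocking tasks may be evicted as stale.
  // Listing the reasons explicitly removes any dependence on enum order.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;  // CTW, Replay, Whitebox, ... are never stale
    }
  }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```

The `default:` label keeps GCC's `-Wswitch` quiet about the reasons that are intentionally not listed. The reviewed patch instead returns `false` after the switch. In both forms, a newly added reason is not stale-able unless someone adds it to the list.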
Then CompileTask::can_become_stale() can be in header file so it is inlinined on all platforms. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> New webrev: >>>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>>>>> >>>>>>>>>> Summary >>>>>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>>>>> variants, and a table containing all the unchanged strings. I see the >>>>>>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>>>>>> have choosen not to do so in this change. >>>>>>>>>> >>>>>>>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>>>>>>> >>>>>>>>>> Testing: >>>>>>>>>> Running Testset hotspot on all platforms and hotspot_all on one platform >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nils Eliawsson >>>>>>>>>> >>>>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>>>>> >>>>>>>>>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>>>>>>>>> quick. I'll change the comment to an enum that represent who submitted >>>>>>>>>>>> the compile, and add a table for the comments. This could be useful in >>>>>>>>>>>> other settings to. >>>>>>>>>>> >>>>>>>>>>> Sounds good. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Nils >>>>>>>>>>>> >>>>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>>>>> What do you mean "stale"? >>>>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. 
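Nils describes the eviction rule: a task leaves the queue when its method's invocation counter has not increased during `TieredCompileTaskTimeout`. The eventual fix exempts some tasks from eviction. The simplified sketch below models both, assuming a task records a timestamp and a counter snapshot. `QueuedTask` and `is_stale` are hypothetical, and the real `AdvancedThresholdPolicy::is_stale(...)` may take additional state into account.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot kept for a queued compile task.
struct QueuedTask {
  std::int64_t last_seen_ms;    // when the counter snapshot was taken
  int invocations_at_snapshot;  // method's invocation counter at that time
  bool can_become_stale;        // false for e.g. WhiteBox or blocking compiles
};

// A task is stale if it may be evicted at all, the timeout has elapsed,
// and the method has not been invoked again since the snapshot.
bool is_stale(const QueuedTask& t, std::int64_t now_ms, int current_invocations,
              std::int64_t timeout_ms) {
  assert(now_ms >= t.last_seen_ms);
  if (!t.can_become_stale) {
    return false;
  }
  bool timed_out = (now_ms - t.last_seen_ms) > timeout_ms;
  bool no_new_calls = current_invocations == t.invocations_at_snapshot;
  return timed_out && no_new_calls;
}
```

In this model, the workaround from the original RFR (`-XX:TieredCompileTaskTimeout=60000`) only widens the window by raising `timeout_ms`. Marking WhiteBox tasks as not stale-able removes them from eviction regardless of timing.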
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Vladimir >>>>>>>>>>>>> >>>>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>>>>>>>>> the compile queue as stale. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Solution: >>>>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is an workaround but we should consider fixing something >>>>>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>>>>> task with info about the origin of the compile. The comment field has >>>>>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>>>>> converted to an enum. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>> Nils Eliasson >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kozlov at oracle.com Wed Apr 20 22:51:29 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 15:51:29 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> Message-ID: <571807F1.10105@oracle.com> Testing is continued but it found next problem already when running tests with -XX:UseSSE=2: # Internal Error (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), pid=52652, tid=3587 # Error: assert(VM_Version::supports_ssse3()) failed V [libjvm.dylib+0x4193d7] report_vm_error(char const*, int, char const*, char const*, ...)+0xcd V [libjvm.dylib+0x1eedd2] Assembler::vpshufb(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e V [libjvm.dylib+0x87c237] MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297 V [libjvm.dylib+0xa4dc47] StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b Vladimir On 4/20/16 1:13 PM, Civlin, Jan wrote: > Thank you, Vladimir. > > I guess it was a warning. > I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. 
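The trace shows `Assembler::vpshufb` asserting `VM_Version::supports_ssse3()` while the AVX2 SHA-256 stub is being generated. Under `-XX:UseSSE=2`, that check apparently fails even on AVX2-capable hardware. One way to avoid this class of failure is to select a stub only when every feature its instructions depend on is reported as enabled. The sketch below shows that selection; the struct and enum are hypothetical, not `VM_Version`.

```cpp
#include <cassert>

// Hypothetical view of the CPU features the VM has decided to use.
struct CpuFeatures {
  bool ssse3;  // may be cleared when a lower SSE level is forced
  bool avx2;
  bool sha;    // SHA extensions, i.e. the supports_sha() case
};

enum class Sha256Impl { Java, ShaExtensions, Avx2 };

// Pick an implementation only if all instructions it emits are allowed.
Sha256Impl choose_sha256_impl(const CpuFeatures& f) {
  if (f.sha) {
    return Sha256Impl::ShaExtensions;
  }
  // The AVX2 path also emits (v)pshufb, which the assembler guards with
  // an SSSE3 check, so require both features.
  if (f.avx2 && f.ssse3) {
    return Sha256Impl::Avx2;
  }
  return Sha256Impl::Java;
}
```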
> > > Section 6.7.2.2 of C99 lists the syntax as: > > enum-specifier: > enum identifieropt { enumerator-list } > enum identifieropt { enumerator-list , } > enum identifier > enumerator-list: > enumerator > enumerator-list , enumerator > enumerator: > enumeration-constant > enumeration-constant = constant-expression > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 1:04 PM > To: Civlin, Jan ; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > One thing was caught during build is ',' at the last line of enum: > > + STACK_SIZE = _RSP + _RSP_SIZE, > +}; > > Compiler complains about it so I removed it in my local repo. > > Vladimir > > On 4/20/16 12:07 PM, Civlin, Jan wrote: >> Thank you! >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, April 20, 2016 11:38 AM >> To: Civlin, Jan ; hotspot compiler >> >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> Looks good to me. I submitted testing on all platforms before integrating. >> >> Thanks, >> Vladimir >> >> On 4/20/16 3:11 AM, Civlin, Jan wrote: >>> Vladimir, >>> >>> Please look at the updated patch at >>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >>> >>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >>> >>> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >>> >>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
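Jan is right about C: C99 6.7.2.2 allows the `enumerator-list ,` form, which C90 did not have. HotSpot, however, is compiled as C++, and C++ only gained the trailing comma in C++11. Under C++98/03 rules (for example GCC with `-std=c++98 -pedantic`), the comma is diagnosed, which may be what the build tripped over. The enumerator values below are invented for illustration and are not the ones from the patch.

```cpp
// Accepted by C99 and by C++11 and later, but not by C90 or C++98/03:
// note the comma after the last enumerator.
enum LayoutWithTrailingComma {
  L_RSP = 0,
  L_RSP_SIZE = 8,
  L_STACK_SIZE = L_RSP + L_RSP_SIZE,
};

// Portable to every C and C++ dialect: no comma after the last enumerator.
enum LayoutPortable {
  P_RSP = 0,
  P_RSP_SIZE = 8,
  P_STACK_SIZE = P_RSP + P_RSP_SIZE
};
```

Dropping the final comma, as Vladimir did locally, costs nothing and keeps the code acceptable to older C++ front ends.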
>>> >>> Thank you, >>> >>> J >>> >>> [jcivlin at HSW-EP02 TestSHA]$ >>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/jav >>> a -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >>> throughput = 356.09558280340946 MB/s >>> >>> [jcivlin at HSW-EP02 TestSHA]$ >>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/j >>> a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA >>> throughput = 354.1696071938408 MB/s >>> >>> [jcivlin at HSW-EP02 TestSHA]$ >>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/j >>> a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >>> throughput = 349.01408678325697 MB/s >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, April 18, 2016 5:09 PM >>> To: Civlin, Jan; hotspot compiler >>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>> supports_sha() available) >>> >>> Hi Jan, >>> 
>>> The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. >>> >>> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. >>> >>> Please, move new code in macroAssembler_x86_sha.cpp to the end of file. >>> >>> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: >>> >>> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >>> >>> What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>>> == Correction in the subject line === >>>> >>>> We would like to contribute the SHA256 AVX2 intrinsic. >>>> >>>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>>> >>>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>>> >>>> Contributor: Jan Civlin. >>>> >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>>> From vivek.r.deshpande at intel.com Thu Apr 21 00:06:48 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 21 Apr 2016 00:06:48 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <5717D7E8.5000108@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> <5717D7E8.5000108@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84830@ORSMSX106.amr.corp.intel.com> Hi Nils I have updated the webrev with all the suggestions. 
updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Thanks for your comments and review. @Vladimir, I have taken care of all the comments. Would you please review and sponsor the patch. Thanks and regards, Vivek From: Nils Eliasson [mailto:nils.eliasson at oracle.com] Sent: Wednesday, April 20, 2016 12:27 PM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In vmSymbols.cpp together with the other flag checks. Regards, Nils On 2016-04-20 02:44, Deshpande, Vivek R wrote: HI Nils Yes you are right the function accesses the command line flag DisableIntrinsic and changes are static. Could you point me the right location for the function ? Also I have updated the webrev with rest of the comments here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Regards, Vivek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson Sent: Tuesday, April 19, 2016 5:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsics. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doens't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch. 
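Nils points out that this check depends only on the command-line value of `-XX:DisableIntrinsic`, not on any per-method directive. It can therefore be answered once at startup, when stubs are generated. The sketch below shows such a static check, assuming the flag holds a comma-separated list of intrinsic IDs. `is_listed` and `should_generate_stub` are hypothetical helpers, not HotSpot functions, and the intrinsic names in the usage are only examples.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if `name` is one of the comma-separated entries in `csv`.
bool is_listed(const std::string& csv, const std::string& name) {
  std::istringstream in(csv);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item == name) {
      return true;
    }
  }
  return false;
}

// Startup-time decision: generate the stub unless its intrinsic is disabled.
// No method context is involved, which is why such a check fits better next
// to the other flag checks than in the compiler-directives code.
bool should_generate_stub(const std::string& disable_intrinsic,
                          const std::string& id) {
  return !is_listed(disable_intrinsic, id);
}
```

For example, with `-XX:DisableIntrinsic=_dsin,_dcos` the `_dcos` stub would be skipped and every other stub generated as usual.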
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From vivek.r.deshpande at intel.com Thu Apr 21 00:09:54 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 21 Apr 2016 00:09:54 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> <5717D7E8.5000108@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84860@ORSMSX106.amr.corp.intel.com> Sent out the wrong link by mistake. updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ Regards Vivek From: Deshpande, Vivek R Sent: Wednesday, April 20, 2016 5:07 PM To: 'Nils Eliasson'; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Nils I have updated the webrev with all the suggestions. updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Thanks for your comments and review. @Vladimir, I have taken care of all the comments. Would you please review and sponsor the patch. Thanks and regards, Vivek From: Nils Eliasson [mailto:nils.eliasson at oracle.com] Sent: Wednesday, April 20, 2016 12:27 PM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In vmSymbols.cpp together with the other flag checks. 
Regards, Nils On 2016-04-20 02:44, Deshpande, Vivek R wrote: HI Nils Yes you are right the function accesses the command line flag DisableIntrinsic and changes are static. Could you point me the right location for the function ? Also I have updated the webrev with rest of the comments here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Regards, Vivek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson Sent: Tuesday, April 19, 2016 5:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsics. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doens't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vivek.r.deshpande at intel.com Thu Apr 21 00:13:29 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 21 Apr 2016 00:13:29 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> <5717D7E8.5000108@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com> Hi The correct URL for the updated webrev is this. http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ Sorry for the spam. Regards, Vivek From: Deshpande, Vivek R Sent: Wednesday, April 20, 2016 5:10 PM To: Deshpande, Vivek R; 'Nils Eliasson'; 'hotspot-compiler-dev at openjdk.java.net' Cc: 'Vladimir Kozlov'; 'Volker Simonis'; 'Christian Thalinger'; Viswanathan, Sandhya Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Sent out the wrong link by mistake. updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ Regards Vivek From: Deshpande, Vivek R Sent: Wednesday, April 20, 2016 5:07 PM To: 'Nils Eliasson'; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Nils I have updated the webrev with all the suggestions. updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Thanks for your comments and review. @Vladimir, I have taken care of all the comments. Would you please review and sponsor the patch. 
Thanks and regards, Vivek From: Nils Eliasson [mailto:nils.eliasson at oracle.com] Sent: Wednesday, April 20, 2016 12:27 PM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In vmSymbols.cpp together with the other flag checks. Regards, Nils On 2016-04-20 02:44, Deshpande, Vivek R wrote: HI Nils Yes you are right the function accesses the command line flag DisableIntrinsic and changes are static. Could you point me the right location for the function ? Also I have updated the webrev with rest of the comments here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Regards, Vivek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson Sent: Tuesday, April 19, 2016 5:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsics. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doens't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tobias.hartmann at oracle.com Thu Apr 21 06:50:32 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Apr 2016 08:50:32 +0200 Subject: [9] RFR(XS): 8153292: AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger out-of-heap prefetching In-Reply-To: <571799BE.1030203@oracle.com> References: <571799BE.1030203@oracle.com> Message-ID: <57187838.5070607@oracle.com> Hi Zoltan, looks good to me! Best regards, Tobias On 20.04.2016 17:01, Zoltán Majó wrote: > Hi, > > > please review the patch for 8153292. > > https://bugs.openjdk.java.net/browse/JDK-8153292 > > > Problem: To avoid out-of-heap accesses by instructions prefetching data, TLABs have a reserved area. The size of that area is supposed to be large enough to accommodate possible prefetching. > > The amount of prefetched data is controlled separately for instance and array allocations (by the AllocateInstancePrefetchLines and AllocatePrefetchLines flags). The size of the reserved area in the TLAB is, however, determined only based on AllocatePrefetchLines. As a result, AllocateInstancePrefetchLines > AllocatePrefetchLines can trigger out-of-heap memory accesses. > > > Solution: Set the size of the reserved TLAB area to the MAX of both flags. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/ > > Testing: > - JPRT; > - local testing on a solaris_sparc machine. > > Thank you! > > Best regards, > > > Zoltan > From zoltan.majo at oracle.com Thu Apr 21 07:26:39 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 21 Apr 2016 09:26:39 +0200 Subject: [9] RFR(XS): 8153292: AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger out-of-heap prefetching In-Reply-To: <57187838.5070607@oracle.com> References: <571799BE.1030203@oracle.com> <57187838.5070607@oracle.com> Message-ID: <571880AF.2010608@oracle.com> Thank you, Vladimir and Tobias, for the review!
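The 8153292 fix sizes the reserved TLAB tail for whichever of the two prefetch settings is larger. The simplified model below illustrates it, assuming the reserve is a prefetch distance plus a number of lines times the step size. The function name and formula are illustrative, and HotSpot's real computation may include further terms such as a header size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified model of the TLAB tail kept free so that prefetch instructions
// issued after an allocation never touch memory past the end of the heap.
std::size_t tlab_prefetch_reserve(std::size_t prefetch_distance,
                                  std::size_t step_size,
                                  std::size_t instance_lines,
                                  std::size_t array_lines) {
  // Sizing by array_lines alone under-reserves when instance_lines is larger,
  // which is the case described in the bug; use the maximum of both.
  std::size_t lines = std::max(instance_lines, array_lines);
  return prefetch_distance + lines * step_size;
}
```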
Best regards, Zoltan On 04/21/2016 08:50 AM, Tobias Hartmann wrote: > Hi Zoltan, > > looks good to me! > > Best regards, > Tobias > > On 20.04.2016 17:01, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8153292. >> >> https://bugs.openjdk.java.net/browse/JDK-8153292 >> >> >> Problem: To avoid out-of-heap accesses by instructions prefetching data, TLABs have a reserved area. The size of that area is supposed to be large enough to accommodate possible prefetching. >> >> The amount of prefetched data is controlled separately for instance and array allocations (by the AllocateInstancePrefetchLines and AllocatePrefetchLines flags). The size of the reserved area in the TLAB is, however, determined only based on AllocatePrefetchLines. As a result, AllocateInstancePrefetchLines > AllocatePrefetchLines can trigger out-of-heap memory accesses. >> >> >> Solution: Set the size of the reserved TLAB area to the MAX of both flags. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/ >> >> Testing: >> - JPRT; >> - local testing on a solaris_sparc machine. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> From tobias.hartmann at oracle.com Thu Apr 21 07:34:21 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Apr 2016 09:34:21 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <5717DDBC.4030909@oracle.com> References: <5717884D.2020108@oracle.com> <5717DDBC.4030909@oracle.com> Message-ID: <5718827D.5070200@oracle.com> Hi Dmitry, On 20.04.2016 21:51, Dmitry Dmitriev wrote: > Hi Tobias, > > Can comment only about new test: I think that you don't need @library and @modules for this simple test. Not need a new webrev for that. Thank you! Right, I will remove the tags before pushing. Thanks for the review!
Best regards, Tobias > Dmitry > > On 20.04.2016 16:46, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8086068 >> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ >> >> The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. >> >> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. >> >> Tested with regression test and RBT (running). >> >> Thanks, >> Tobias > From tobias.hartmann at oracle.com Thu Apr 21 07:35:17 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Apr 2016 09:35:17 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <5717E3CB.6060107@oracle.com> References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com> <5717909F.5040004@oracle.com> <5717E3CB.6060107@oracle.com> Message-ID: <571882B5.2090706@oracle.com> Hi Vladimir, thanks for the review! On 20.04.2016 22:17, Vladimir Kozlov wrote: > Another interesting combination is -Xshare:dump -Xcomp (or -XX:+UseCompiler) because -Xshare:dump tries to disable compilation. With "-Xshare:dump -XX:+UseCompiler", UseCompiler is disabled by this fix. With "-Xshare:dump -Xcomp", UseCompiler is enabled but no compilations are triggered because DumpSharedSpaces is set and no bytecode is executed. I checked several other combinations and did not encounter any problems. > I think the fix is good for -Xint -XX:+UseCompiler combination. Okay, I'm going to push this with the changes Dmitry suggested. Thanks, Tobias > > Thanks, > Vladimir > > On 4/20/16 7:22 AM, Tobias Hartmann wrote: >> Hi Zoltan, >> >> On 20.04.2016 16:02, Zoltán Majó
wrote: >>> There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems? >> >> I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler. >> >>> Otherwise it looks good to me. >> >> Thanks for the review! >> >> Best regards, >> Tobias >> >>> Best regards, >>> >>> >>> Zoltan >>> >>> On 04/20/2016 03:46 PM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8086068 >>>> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ >>>> >>>> The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. >>>> >>>> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. >>>> >>>> Tested with regression test and RBT (running). >>>> >>>> Thanks, >>>> Tobias >>> From adinn at redhat.com Thu Apr 21 07:43:41 2016 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 21 Apr 2016 08:43:41 +0100 Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <1461172110.2941.63.camel@mylittlepony.linaroharston> References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com> <1461172110.2941.63.camel@mylittlepony.linaroharston> Message-ID: <571884AD.1030906@redhat.com> On 20/04/16 18:08, Edward Nevill wrote: > If people are happy could I prepare final changeset for review based on bzero4 (ie this one) > > http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04 Yeah, science rocks! bzero4 is it for me but you need a nod from the boss. 
regards, Andrew Dinn ----------- From rwestrel at redhat.com Thu Apr 21 08:23:31 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 21 Apr 2016 10:23:31 +0200 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode Message-ID: <57188E03.5070303@redhat.com> http://cr.openjdk.java.net/~roland/8154826/webrev.00/ The aarch64 port implicitly transforms: (AddP base (AddP base address (LShiftL index con)) offset) into: (AddP base (AddP base offset) (LShiftL index con)) in the ad file to embed the shift (and possibly an i2l conversion) into the addressing mode of a memory operation. Exposing that transformation in the ideal graph allows: - (AddP base offset) to be scheduled (for instance outside a loop) - multiple identical (AddP base offset) to be commoned - (LShiftL index con) to be cloned during matching so that each memory access has its own Roland. From tobias.hartmann at oracle.com Thu Apr 21 10:11:03 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Apr 2016 12:11:03 +0200 Subject: [9] RFR(XS): 8086057: Crash with "modified node is not on IGVN._worklist" when running with -XX:-SplitIfBlocks Message-ID: <5718A737.1050708@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8086057 http://cr.openjdk.java.net/~thartmann/8086057/webrev.00/ The fastdebug VM crashes with "modified node is not on IGVN._worklist" because SuperWord::align_initial_loop_index() resets the input of the pre-loop Opaque1 node 'pre_opaq' without putting it on the IGVN worklist afterwards. This only shows up with -XX:-SplitIfBlocks because otherwise surrounding nodes are changed, causing the Opaque1 node to be put on the worklist. The method should use PhaseIterGVN::replace_input_of() which uses rehash_node_delayed() to ensure the modified node is put on the worklist. Tested with regression test, Nashorn+Octane with -XX:-SplitIfBlocks and RBT (running).
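In source-level terms, the 8154826 reassociation turns `(base + (index << con)) + offset` into `(base + offset) + (index << con)`. The C++ sketch below is only an analogy for the ideal-graph change: the array layout (a header of `header_bytes` followed by 32-bit elements) and both function names are assumptions, and a C++ compiler would do this hoisting on its own. It shows why the second shape helps: `base + offset` becomes loop-invariant and can be computed once, leaving only the shifted index per access, which fits a base-plus-shifted-register addressing mode such as AArch64's `[xN, xM, lsl #2]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Alignment-safe load of one 32-bit element.
static std::int32_t load_i32(const unsigned char* p) {
  std::int32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Original shape: (base + (i << 2)) + header_bytes is recomputed per access.
std::int64_t sum_before(const unsigned char* base, std::size_t header_bytes, std::size_t n) {
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += load_i32((base + (i << 2)) + header_bytes);
  }
  return sum;
}

// Reassociated shape: base + header_bytes is loop-invariant and computed once;
// each access only adds the shifted index.
std::int64_t sum_after(const unsigned char* base, std::size_t header_bytes, std::size_t n) {
  const unsigned char* elems = base + header_bytes;  // hoisted out of the loop
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += load_i32(elems + (i << 2));
  }
  return sum;
}
```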
Thanks, Tobias From zoltan.majo at oracle.com Thu Apr 21 11:30:04 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 21 Apr 2016 13:30:04 +0200 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 Message-ID: <5718B9BC.10001@oracle.com> Hi, please review the patch for 8153340. https://bugs.openjdk.java.net/browse/JDK-8153340 Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way the address for the first prefetch instruction is calculated [1]: If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1), which can zero some of the low bits of cache_adr. That results in accesses *before* the newly allocated object. Solution: Set the lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). Unquarantine test. Webrev: http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ Testing: - JPRT (incl. TestOptionsWithRanges.java) - local testing on a SPARC machine. Thank you! Best regards, Zoltan [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 From tobias.hartmann at oracle.com Thu Apr 21 13:56:56 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Apr 2016 15:56:56 +0200 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled Message-ID: <5718DC28.7080502@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8154763 http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled.
I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. Tested with regression test and RBT (running). Thanks, Tobias From michael.c.berg at intel.com Thu Apr 21 14:42:05 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 21 Apr 2016 14:42:05 +0000 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled In-Reply-To: <5718DC28.7080502@oracle.com> References: <5718DC28.7080502@oracle.com> Message-ID: Hi Tobias, sure, when RangeCheckElimination is not enabled, this makes sense. Thanks, Michael -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Thursday, April 21, 2016 6:57 AM To: hotspot-compiler-dev at openjdk.java.net Cc: Berg, Michael C Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8154763 http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled. I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. Tested with regression test and RBT (running).
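The address arithmetic behind 8153340 can be checked with a small C++ sketch that follows the computation quoted in Zoltán's report: start at `old_eden_top + distance`, then clear the low bits with `& ~(AllocatePrefetchStepSize - 1)`. The function name and the example addresses are hypothetical, and the step size is assumed to be a power of two, as the mask requires.

```cpp
#include <cassert>
#include <cstdint>

// First prefetch address for AllocatePrefetchStyle == 3, per the description in
// the thread: offset the old eden top by the prefetch distance, then round down
// to a multiple of the (power-of-two) prefetch step size.
std::uintptr_t first_prefetch_addr(std::uintptr_t old_eden_top,
                                   std::uintptr_t distance,
                                   std::uintptr_t step_size) {
  assert(step_size != 0 && (step_size & (step_size - 1)) == 0);
  const std::uintptr_t cache_adr = old_eden_top + distance;
  return cache_adr & ~(step_size - 1);  // rounding down can move below cache_adr
}
```

With `old_eden_top = 0x1008` and a 64-byte step, a distance of 0 yields 0x1000, eight bytes before the new object. Any distance of at least one step gives an address of at least `old_eden_top + distance - (step - 1)`, which is strictly above `old_eden_top`, so raising the lower bound of AllocatePrefetchDistance to AllocatePrefetchStepSize rules that case out.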
Thanks, Tobias From tobias.hartmann at oracle.com Thu Apr 21 14:44:40 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Apr 2016 16:44:40 +0200 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled In-Reply-To: References: <5718DC28.7080502@oracle.com> Message-ID: <5718E758.8040301@oracle.com> Hi Michael, On 21.04.2016 16:42, Berg, Michael C wrote: > Hi Tobias, sure, when RangeCheckElimination is not enabled, this makes sense. thanks for the review! Best regards, Tobias > > Thanks, > Michael > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Thursday, April 21, 2016 6:57 AM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Berg, Michael C > Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled > > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8154763 > http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ > > JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled. > > I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. > > Tested with regression test and RBT (running).
> > Thanks, > Tobias > From jan.civlin at intel.com Thu Apr 21 18:15:11 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Thu, 21 Apr 2016 18:15:11 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <571807F1.10105@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> <571807F1.10105@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com> Vladimir, I corrected the asserting guards in added instructions, also the guard for the very sha-avx2 function. Please look at http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/ Thank you, J -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 20, 2016 3:51 PM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Testing is continued but it found next problem already when running tests with -XX:UseSSE=2: # Internal Error (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), pid=52652, tid=3587 # Error: assert(VM_Version::supports_ssse3()) failed V [libjvm.dylib+0x4193d7] report_vm_error(char const*, int, char const*, char const*, ...)+0xcd V [libjvm.dylib+0x1eedd2] Assembler::vpshufb(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e V [libjvm.dylib+0x87c237] MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, 
XMMRegisterImpl*)+0x1297 V [libjvm.dylib+0xa4dc47] StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b Vladimir On 4/20/16 1:13 PM, Civlin, Jan wrote: > Thank you, Vladimir. > > I guess it was a warning. > I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. > > > Section 6.7.2.2 of C99 lists the syntax as: > > enum-specifier: > enum identifieropt { enumerator-list } > enum identifieropt { enumerator-list , } > enum identifier > enumerator-list: > enumerator > enumerator-list , enumerator > enumerator: > enumeration-constant > enumeration-constant = constant-expression > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 1:04 PM > To: Civlin, Jan ; hotspot compiler > > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no > supports_sha() available) > > One thing was caught during build is ',' at the last line of enum: > > + STACK_SIZE = _RSP + _RSP_SIZE, > +}; > > Compiler complains about it so I removed it in my local repo. > > Vladimir > > On 4/20/16 12:07 PM, Civlin, Jan wrote: >> Thank you! >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, April 20, 2016 11:38 AM >> To: Civlin, Jan ; hotspot compiler >> >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> Looks good to me. I submitted testing on all platforms before integrating. >> >> Thanks, >> Vladimir >> >> On 4/20/16 3:11 AM, Civlin, Jan wrote: >>> Vladimir, >>> >>> Please look at the updated patch at >>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >>> >>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >>> >>> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. 
>>> >>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. >>> >>> Thank you, >>> >>> J >>> >>> [jcivlin at HSW-EP02 TestSHA]$ >>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/ja >>> v a -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >>> throughput = 356.09558280340946 MB/s >>> >>> [jcivlin at HSW-EP02 TestSHA]$ >>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/ >>> j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA >>> throughput = 354.1696071938408 MB/s >>> >>> [jcivlin at HSW-EP02 TestSHA]$ >>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/ >>> j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >>> throughput = 349.01408678325697 MB/s >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Monday, April 18, 2016 5:09 PM >>> To: Civlin, Jan; hotspot compiler >>> 
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>> supports_sha() available) >>> >>> Hi Jan, >>> >>> The patch was generated on Windows and has ^M at the end of lines so I can't apply it to our sources. >>> >>> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq() instructions. >>> >>> Please move new code in macroAssembler_x86_sha.cpp to the end of file. >>> >>> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256: >>> >>> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >>> >>> What testing was done? Did you run with fastdebug build? I am concerned about the size of the new stub and whether current code_size2 is enough. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>>> == Correction in the subject line === >>>> >>>> We would like to contribute the SHA256 AVX2 intrinsic. >>>> >>>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>>> >>>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>>> >>>> Contributor: Jan Civlin.
>>>> >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>>> From vladimir.kozlov at oracle.com Thu Apr 21 19:30:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Apr 2016 12:30:40 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> <571807F1.10105@oracle.com> <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com> Message-ID: <57192A60.5030206@oracle.com> Good. But testing found that jtreg SHA tests failed because they don't expect SHA2 to be supported: "Expected message not found: 'SHA instructions are not available on this CPU'" hotspot/test/compiler/intrinsics/sha/ Do you know how to run jtreg tests? Maybe Sandhya or Michael can help. Thanks, Vladimir On 4/21/16 11:15 AM, Civlin, Jan wrote: > Vladimir, > > I corrected the asserting guards in added instructions, also the guard for the very sha-avx2 function.
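The `-XX:UseSSE=2` assert in this thread comes down to a dispatch rule: a stub may only be generated when every instruction it emits is still permitted after flags have masked CPU features. A hypothetical C++ sketch of such a selection follows. `CpuFeatures`, `Sha256Impl` and `select_sha256()` are invented names, and the exact feature set the AVX2 stub needs (AVX2, SSSE3 for `vpshufb` per the reported assert, and BMI2 on the assumption that the stub uses `rorx`) is an assumption, not taken from webrev.02.

```cpp
#include <cassert>

// Hypothetical snapshot of CPU features *after* command-line flags such as
// -XX:UseSSE=2 have masked some of them off.
struct CpuFeatures {
  bool sha;    // SHA extensions (supports_sha())
  bool avx2;
  bool ssse3;  // needed by vpshufb, per the assert in the reported crash
  bool bmi2;   // assumed: rorx-based rotates
};

enum class Sha256Impl { JavaFallback, ShaExtensions, Avx2 };

// Pick the fastest implementation whose instructions are all available.
Sha256Impl select_sha256(const CpuFeatures& f) {
  if (f.sha) {
    return Sha256Impl::ShaExtensions;
  }
  if (f.avx2 && f.ssse3 && f.bmi2) {
    return Sha256Impl::Avx2;
  }
  return Sha256Impl::JavaFallback;  // no intrinsic; keep the Java implementation
}
```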
> Please look at > > http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/ > > Thank you, > > J > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 3:51 PM > To: Civlin, Jan ; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > Testing is continued but it found next problem already when running tests with -XX:UseSSE=2: > > # Internal Error > (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.cpp:3693), > pid=52652, tid=3587 > # Error: assert(VM_Version::supports_ssse3()) failed > > V [libjvm.dylib+0x4193d7] report_vm_error(char const*, int, char const*, char const*, ...)+0xcd V [libjvm.dylib+0x1eedd2] Assembler::vpshufb(XMMRegisterImpl*, > XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e V [libjvm.dylib+0x87c237] MacroAssembler::sha256_AVX2(XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297 V [libjvm.dylib+0xa4dc47] StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b > > > Vladimir > > On 4/20/16 1:13 PM, Civlin, Jan wrote: >> Thank you, Vladimir. >> >> I guess it was a warning. >> I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. 
>> >> >> Section 6.7.2.2 of C99 lists the syntax as: >> >> enum-specifier: >> enum identifieropt { enumerator-list } >> enum identifieropt { enumerator-list , } >> enum identifier >> enumerator-list: >> enumerator >> enumerator-list , enumerator >> enumerator: >> enumeration-constant >> enumeration-constant = constant-expression >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, April 20, 2016 1:04 PM >> To: Civlin, Jan ; hotspot compiler >> >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> One thing was caught during build is ',' at the last line of enum: >> >> + STACK_SIZE = _RSP + _RSP_SIZE, >> +}; >> >> Compiler complains about it so I removed it in my local repo. >> >> Vladimir >> >> On 4/20/16 12:07 PM, Civlin, Jan wrote: >>> Thank you! >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, April 20, 2016 11:38 AM >>> To: Civlin, Jan ; hotspot compiler >>> >>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>> supports_sha() available) >>> >>> Looks good to me. I submitted testing on all platforms before integrating. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/20/16 3:11 AM, Civlin, Jan wrote: >>>> Vladimir, >>>> >>>> Please look at the updated patch at >>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >>>> >>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >>>> >>>> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >>>> >>>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
>>>> >>>> Thank you, >>>> >>>> J >>>> >>>> [jcivlin at HSW-EP02 TestSHA]$ >>>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/ja >>>> v a -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >>>> throughput = 356.09558280340946 MB/s >>>> >>>> [jcivlin at HSW-EP02 TestSHA]$ >>>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/ >>>> j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA >>>> throughput = 354.1696071938408 MB/s >>>> >>>> [jcivlin at HSW-EP02 TestSHA]$ >>>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/ >>>> j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af >>>> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >>>> throughput = 349.01408678325697 MB/s >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Monday, April 18, 2016 5:09 PM >>>> To: Civlin, Jan; hotspot compiler >>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>>> 
supports_sha() available) >>>> >>>> Hi Jan, >>>> >>>> The patch was generated on Windows and has ^M at the end of lines so I can't apply it to our sources. >>>> >>>> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq() instructions. >>>> >>>> Please move new code in macroAssembler_x86_sha.cpp to the end of file. >>>> >>>> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256: >>>> >>>> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >>>> >>>> What testing was done? Did you run with fastdebug build? I am concerned about the size of the new stub and whether current code_size2 is enough. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>>>> == Correction in the subject line === >>>>> >>>>> We would like to contribute the SHA256 AVX2 intrinsic. >>>>> >>>>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>>>> >>>>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>>>> >>>>> Contributor: Jan Civlin.
>>>>> >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>>>> From jan.civlin at intel.com Thu Apr 21 19:58:18 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Thu, 21 Apr 2016 19:58:18 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <57192A60.5030206@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> <571807F1.10105@oracle.com> <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com> <57192A60.5030206@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C8DB@ORSMSX104.amr.corp.intel.com> I know how to run jtreg. I tested it on TestSHA.jar, admittedly it was rather a standalone run. $ /cygdrive/E/Java/sha-041116/build/windows-x86_64-normal-server-fastdebug/jdk/bin/java.exe -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:UseSSE=2 -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 33.507065719 seconds TestSHA throughput = 305.60718404516876 MB/s jcivlin at JCIVLIN-DESK /cygdrive/C/Java/hotspot/test/civlin/TestSHA/TestSHA/dist $ I'll take a look what is wrong with jtreg. 
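The TestSHA throughput figures can be reproduced from the other numbers in its output: total bytes hashed (iters * msgSize) divided by the runtime, with 1 MB = 10^6 bytes and the warm-up iterations apparently not counted. This reading is inferred from the printed values, not from TestSHA's source, and `throughput_mb_per_s()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Throughput as TestSHA appears to report it: bytes hashed in the timed
// iterations divided by runtime, in MB/s with 1 MB = 10^6 bytes (inferred).
double throughput_mb_per_s(std::uint64_t iters, std::uint64_t msg_size_bytes,
                           double seconds) {
  assert(seconds > 0.0);
  const double total_bytes =
      static_cast<double>(iters) * static_cast<double>(msg_size_bytes);
  return total_bytes / seconds / 1e6;
}
```

For example, 10,000,000 iterations of 1024 bytes in 28.756324129 s gives about 356.10 MB/s, and in 33.507065719 s about 305.61 MB/s, matching the reported values. These runs came from different machines and build types (a release build on HSW-EP02, a fastdebug build on a Windows desktop), so they should not be compared directly.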
-----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 21, 2016 12:31 PM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Good. But testing found that jtreg SHA tests failed because they don't expect SHA2 to be supported: "Expected message not found: 'SHA instructions are not available on this CPU'" hotspot/test/compiler/intrinsics/sha/ Do you know how to run jtreg tests? Maybe Sandhya or Michael can help. Thanks, Vladimir On 4/21/16 11:15 AM, Civlin, Jan wrote: > Vladimir, > > I corrected the asserting guards in added instructions, also the guard for the very sha-avx2 function. > Please look at > > http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/ > > Thank you, > > J > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 3:51 PM > To: Civlin, Jan ; hotspot compiler > > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no > supports_sha() available) > > Testing is continued but it found next problem already when running tests with -XX:UseSSE=2: > > # Internal Error > (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86.
> cpp:3693), > pid=52652, tid=3587 > # Error: assert(VM_Version::supports_ssse3()) failed > > V [libjvm.dylib+0x4193d7] report_vm_error(char const*, int, char > const*, char const*, ...)+0xcd V [libjvm.dylib+0x1eedd2] > Assembler::vpshufb(XMMRegisterImpl*, > XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e V > [libjvm.dylib+0x87c237] MacroAssembler::sha256_AVX2(XMMRegisterImpl*, > XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, > XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, > XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, > RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297 V > [libjvm.dylib+0xa4dc47] > StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b > > > Vladimir > > On 4/20/16 1:13 PM, Civlin, Jan wrote: >> Thank you, Vladimir. >> >> I guess it was a warning. >> I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. >> >> >> Section 6.7.2.2 of C99 lists the syntax as: >> >> enum-specifier: >> enum identifieropt { enumerator-list } >> enum identifieropt { enumerator-list , } >> enum identifier >> enumerator-list: >> enumerator >> enumerator-list , enumerator >> enumerator: >> enumeration-constant >> enumeration-constant = constant-expression >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, April 20, 2016 1:04 PM >> To: Civlin, Jan ; hotspot compiler >> >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> One thing was caught during build is ',' at the last line of enum: >> >> + STACK_SIZE = _RSP + _RSP_SIZE, >> +}; >> >> Compiler complains about it so I removed it in my local repo. >> >> Vladimir >> >> On 4/20/16 12:07 PM, Civlin, Jan wrote: >>> Thank you! 
>>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, April 20, 2016 11:38 AM >>> To: Civlin, Jan ; hotspot compiler >>> >>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>> supports_sha() available) >>> >>> Looks good to me. I submitted testing on all platforms before integrating. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/20/16 3:11 AM, Civlin, Jan wrote: >>>> Vladimir, >>>> >>>> Please look at the updated patch at >>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >>>> >>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >>>> >>>> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >>>> >>>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. >>>> >>>> Thank you, >>>> >>>> J >>>> >>>> [jcivlin at HSW-EP02 TestSHA]$ >>>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/j >>>> a v a -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 >>>> af >>>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >>>> throughput = 356.09558280340946 MB/s >>>> >>>> [jcivlin at HSW-EP02 TestSHA]$ >>>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin >>>> / j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 >>>> af >>>> e0 
9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA >>>> throughput = 354.1696071938408 MB/s >>>> >>>> [jcivlin at HSW-EP02 TestSHA]$ >>>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin >>>> / j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 >>>> af >>>> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >>>> throughput = 349.01408678325697 MB/s >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Monday, April 18, 2016 5:09 PM >>>> To: Civlin, Jan; hotspot compiler >>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>>> supports_sha() available) >>>> >>>> Hi Jan, >>>> >>>> The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. >>>> >>>> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. >>>> >>>> Please, move new code in macroAssembler_x86_sha.cpp to the end of file. >>>> >>>> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: >>>> >>>> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >>>> >>>> What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>>>> == Correction in the subject line === >>>>> >>>>> We would like to contribute the SHA256 AVX2 intrinsic. >>>>> >>>>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. 
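Vladimir suggests deriving _k256_W from _k256 rather than keeping a second literal table. That suggestion can be sketched in plain C++. The sketch follows the layout described in this thread: each row of four 32-bit k256 constants appears twice in a row in k256_W, so the table grows from 64 to 128 words. The function name build_k256_W is hypothetical. A real generate_k256_W() would emit the words into the stub code buffer, and this sketch does not model that. The reason for the duplication (presumably to feed both 128-bit lanes of an AVX2 register) is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not HotSpot code): expands the 64-entry SHA-256
// round-constant table into a 128-entry table in which every row of 4
// consecutive constants is written twice in a row.
std::array<uint32_t, 128> build_k256_W(const std::array<uint32_t, 64>& k256) {
  std::array<uint32_t, 128> k256_W{};
  for (std::size_t row = 0; row < 16; ++row) {        // 16 rows of 4 constants
    for (std::size_t copy = 0; copy < 2; ++copy) {    // each row emitted twice
      for (std::size_t col = 0; col < 4; ++col) {
        k256_W[row * 8 + copy * 4 + col] = k256[row * 4 + col];
      }
    }
  }
  return k256_W;
}
```

Building the doubled table from the single k256 source keeps the two tables from drifting apart if either one is ever edited.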
>>>>> >>>>> The patch delivers a 2x performance gain for the latency and throughput - both are measured on an average message. >>>>> >>>>> Contributor: Jan Civlin. >>>>> >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>>>> From christian.thalinger at oracle.com Thu Apr 21 21:18:36 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 21 Apr 2016 11:18:36 -1000 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled In-Reply-To: <5718DC28.7080502@oracle.com> References: <5718DC28.7080502@oracle.com> Message-ID: Not a review but I think PostLoopMultiversioning should be a diagnostic_pd. I've added a comment to this Enhancement: https://bugs.openjdk.java.net/browse/JDK-8150900 > On Apr 21, 2016, at 3:56 AM, Tobias Hartmann wrote: > > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8154763 > http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ > > JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled. > > I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. > > Tested with regression test and RBT (running).
> > Thanks, > Tobias > From christian.thalinger at oracle.com Thu Apr 21 21:34:38 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 21 Apr 2016 11:34:38 -1000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> <5717D7E8.5000108@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com> Message-ID: <74581C88-CC26-441E-933B-73954C56F077@oracle.com> > On Apr 20, 2016, at 2:13 PM, Deshpande, Vivek R wrote: > > Hi > > The correct URL for the updated webrev is this. > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ +void MacroAssembler::mathfunc(address runtime_entry) { I don't like the name of this method. Mainly because it's only aligning the stack (shouldn't that happen somewhere else?) and doing this 0x20 stack frame thing which I still don't understand. Right, this is the one I was thinking about: void MacroAssembler::call_VM_leaf_base(address entry_point, int num_args) { > Sorry for the spam. > > Regards, > Vivek > > From: Deshpande, Vivek R > Sent: Wednesday, April 20, 2016 5:10 PM > To: Deshpande, Vivek R; 'Nils Eliasson'; 'hotspot-compiler-dev at openjdk.java.net ' > Cc: 'Vladimir Kozlov'; 'Volker Simonis'; 'Christian Thalinger'; Viswanathan, Sandhya > Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > Sent out the wrong link by mistake.
> > updated webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ > > Regards > Vivek > > > From: Deshpande, Vivek R > Sent: Wednesday, April 20, 2016 5:07 PM > To: 'Nils Eliasson'; hotspot-compiler-dev at openjdk.java.net > Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya > Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > Hi Nils > > I have updated the webrev with all the suggestions. > updated webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ > Thanks for your comments and review. > > @Vladimir, > I have taken care of all the comments. Would you please review and sponsor the patch. > > Thanks and regards, > Vivek > > From: Nils Eliasson [mailto:nils.eliasson at oracle.com ] > Sent: Wednesday, April 20, 2016 12:27 PM > To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net > Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > In vmSymbols.cpp together with the other flag checks. > > Regards, > Nils > > On 2016-04-20 02:44, Deshpande, Vivek R wrote: > HI Nils > > Yes you are right the function accesses the command line flag DisableIntrinsic and changes are static. > Could you point me the right location for the function ? 
> Also I have updated the webrev with rest of the comments here: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ > > Regards, > Vivek > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net ]On Behalf Of Nils Eliasson > Sent: Tuesday, April 19, 2016 5:55 AM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > Hi Vivek, > > The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsics. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doens't use directives. > > Regards, > Nils > > On 2016-04-18 19:38, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. > This uses -XX:DisableIntrinsic option to achieve the same. > Could you please review and sponsor this patch. > > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > Thanks and regards, > Vivek > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kozlov at oracle.com Thu Apr 21 22:08:43 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Apr 2016 15:08:43 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C8DB@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> <571807F1.10105@oracle.com> <39F83597C33E5F408096702907E6C4500F16C896@ORSMSX104.amr.corp.intel.com> <57192A60.5030206@oracle.com> <39F83597C33E5F408096702907E6C4500F16C8DB@ORSMSX104.amr.corp.intel.com> Message-ID: <57194F6B.4050008@oracle.com> I think there is assumption in some tests that x86 does not support sha2. Next tests failed: compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnUnsupportedCPU.java java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA-224 and SHA-256 crypto hash functions not available on this CPU.'. JVM should start with '-XX:+UseSHA256Intrinsics' flag, but output should contain warning. compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java java.lang.AssertionError: Expected message not found: 'SHA instructions are not available on this CPU'. JVM should start with '-XX:+UseSHA' flag, but output should contain warning. 
compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java java.lang.RuntimeException: Unexpected count of intrinsic _sha2_implCompress is expected:false, matched: 2, suspected: 5 compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java java.lang.RuntimeException: Unexpected count of intrinsic _digestBase_implCompressMB is expected:false, matched: 1, suspected: 6 Regards, Vladimir On 4/21/16 12:58 PM, Civlin, Jan wrote: > I know how to run jtreg. > I tested it on TestSHA.jar, admittedly it was rather a standalone run. > > $ /cygdrive/E/Java/sha-041116/build/windows-x86_64-normal-server-fastdebug/jdk/bin/java.exe -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:UseSSE=2 -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 33.507065719 seconds > TestSHA throughput = 305.60718404516876 MB/s > > > jcivlin at JCIVLIN-DESK /cygdrive/C/Java/hotspot/test/civlin/TestSHA/TestSHA/dist > $ > > I'll take a look what is wrong with jtreg. > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 21, 2016 12:31 PM > To: Civlin, Jan ; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > Good. But testing found that jrteg SHA tests failed because they don't expect the SHA2 is supported: > > "Expected message not found: 'SHA instructions are not available on this CPU'" > > hotspot/test/compiler/intrinsics/sha/ > > Do you know how to run jtreg tests? May be Sandhya or Michael can help. > > Thanks, > Vladimir > > On 4/21/16 11:15 AM, Civlin, Jan wrote: >> Vladimir, >> >> I corrected the asserting guards in added instructions, also the guard for the very sha-avx2 function. 
>> Please look at >> >> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.02/ >> >> Thank you, >> >> J >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, April 20, 2016 3:51 PM >> To: Civlin, Jan ; hotspot compiler >> >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> Testing is continued but it found next problem already when running tests with -XX:UseSSE=2: >> >> # Internal Error >> (/opt/jprt/T/P1/185544.vkozlov/s/hotspot/src/cpu/x86/vm/assembler_x86. >> cpp:3693), >> pid=52652, tid=3587 >> # Error: assert(VM_Version::supports_ssse3()) failed >> >> V [libjvm.dylib+0x4193d7] report_vm_error(char const*, int, char >> const*, char const*, ...)+0xcd V [libjvm.dylib+0x1eedd2] >> Assembler::vpshufb(XMMRegisterImpl*, >> XMMRegisterImpl*, XMMRegisterImpl*, int)+0x4e V >> [libjvm.dylib+0x87c237] MacroAssembler::sha256_AVX2(XMMRegisterImpl*, >> XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, >> XMMRegisterImpl*, XMMRegisterImpl*, XMMRegisterImpl*, >> XMMRegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, >> RegisterImpl*, RegisterImpl*, bool, XMMRegisterImpl*)+0x1297 V >> [libjvm.dylib+0xa4dc47] >> StubGenerator::generate_sha256_implCompress(bool, char const*)+0x27b >> >> >> Vladimir >> >> On 4/20/16 1:13 PM, Civlin, Jan wrote: >>> Thank you, Vladimir. >>> >>> I guess it was a warning. >>> I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. 
>>> >>> >>> Section 6.7.2.2 of C99 lists the syntax as: >>> >>> enum-specifier: >>> enum identifieropt { enumerator-list } >>> enum identifieropt { enumerator-list , } >>> enum identifier >>> enumerator-list: >>> enumerator >>> enumerator-list , enumerator >>> enumerator: >>> enumeration-constant >>> enumeration-constant = constant-expression >>> >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, April 20, 2016 1:04 PM >>> To: Civlin, Jan ; hotspot compiler >>> >>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>> supports_sha() available) >>> >>> One thing was caught during build is ',' at the last line of enum: >>> >>> + STACK_SIZE = _RSP + _RSP_SIZE, >>> +}; >>> >>> Compiler complains about it so I removed it in my local repo. >>> >>> Vladimir >>> >>> On 4/20/16 12:07 PM, Civlin, Jan wrote: >>>> Thank you! >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Wednesday, April 20, 2016 11:38 AM >>>> To: Civlin, Jan ; hotspot compiler >>>> >>>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>>> supports_sha() available) >>>> >>>> Looks good to me. I submitted testing on all platforms before integrating. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/20/16 3:11 AM, Civlin, Jan wrote: >>>>> Vladimir, >>>>> >>>>> Please look at the updated patch at >>>>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >>>>> >>>>> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >>>>> >>>>> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >>>>> >>>>> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
>>>>> >>>>> Thank you, >>>>> >>>>> J >>>>> >>>>> [jcivlin at HSW-EP02 TestSHA]$ >>>>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/j >>>>> a v a -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 >>>>> af >>>>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >>>>> throughput = 356.09558280340946 MB/s >>>>> >>>>> [jcivlin at HSW-EP02 TestSHA]$ >>>>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin >>>>> / j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 >>>>> af >>>>> e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA >>>>> throughput = 354.1696071938408 MB/s >>>>> >>>>> [jcivlin at HSW-EP02 TestSHA]$ >>>>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin >>>>> / j a va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >>>>> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >>>>> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >>>>> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 >>>>> 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 >>>>> af >>>>> e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >>>>> throughput = 349.01408678325697 MB/s >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Monday, April 18, 2016 5:09 PM >>>>> To: Civlin, Jan; hotspot compiler >>>>> 
Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >>>>> supports_sha() available) >>>>> >>>>> Hi Jan, >>>>> >>>>> The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. >>>>> >>>>> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. >>>>> >>>>> Please, move new code in macroAssembler_x86_sha.cpp to the end of file. >>>>> >>>>> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: >>>>> >>>>> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >>>>> >>>>> What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>>>>> == Correction in the subject line === >>>>>> >>>>>> We would like to contribute the SHA256 AVX2 intrinsic. >>>>>> >>>>>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>>>>> >>>>>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>>>>> >>>>>> Contributor: Jan Civlin. >>>>>> >>>>>> >>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>>>>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>>>>> From dean.long at oracle.com Thu Apr 21 22:09:31 2016 From: dean.long at oracle.com (Dean Long) Date: Thu, 21 Apr 2016 15:09:31 -0700 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <57188E03.5070303@redhat.com> References: <57188E03.5070303@redhat.com> Message-ID: Hi Roland. This sounds like it has a lot of overlap with JDK-6217251. If so, could you update JDK-6217251 explaining what more, if anything, needs to be done? 
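The 8154826 change reassociates an address from (base + (index << con)) + offset to (base + offset) + (index << con). With wrapping unsigned arithmetic the two forms always give the same address. Only the second form exposes the index-independent term base + offset, so that term can be hoisted out of a loop or shared between accesses. The sketch below models addresses as uintptr_t values and is not C2 ideal-graph code. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Shape before the transformation: (base + (index << shift)) + offset.
uintptr_t addr_before(uintptr_t base, uintptr_t index, unsigned shift, uintptr_t offset) {
  return (base + (index << shift)) + offset;
}

// Shape after the transformation: (base + offset) + (index << shift).
// Unsigned arithmetic wraps modulo 2^N, so both forms always agree.
uintptr_t addr_after(uintptr_t base, uintptr_t index, unsigned shift, uintptr_t offset) {
  return (base + offset) + (index << shift);
}

// Loop form: base + offset is loop-invariant and computed once; only the
// shifted index changes from one iteration to the next.
std::vector<uintptr_t> element_addresses(uintptr_t base, uintptr_t offset,
                                         unsigned shift, std::size_t n) {
  std::vector<uintptr_t> out;
  out.reserve(n);
  const uintptr_t base_plus_offset = base + offset;  // hoisted invariant
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(base_plus_offset + (static_cast<uintptr_t>(i) << shift));
  }
  return out;
}
```

On AArch64 the remaining per-element term corresponds to the register-offset addressing form [Xn, Xm, LSL #amount]. That correspondence is why the shift is kept next to each memory access.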
thanks, dl On 4/21/2016 1:23 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8154826/webrev.00/ > > The aarch64 port implicitly transforms: > (AddP base (AddP base address (LShiftL index con)) offset) > into: > (AddP base (AddP base offset) (LShiftL index con)) > in the ad file to embed the shift (and possibly and i2l conversion) into > the addressing mode of a memory operation. Exposing that transformation > in the ideal graph allows: > > - (AddP base offset) to be scheduled (for instance outside a loop) > - multiple identical (AddP base offset) to be commoned > - (LShiftL index con) to be cloned during matching so that each memory > access has its own > > Roland. From vladimir.kozlov at oracle.com Thu Apr 21 22:18:47 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Apr 2016 15:18:47 -0700 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled In-Reply-To: <5718DC28.7080502@oracle.com> References: <5718DC28.7080502@oracle.com> Message-ID: <571951C7.3060806@oracle.com> Looks good. Thanks, Vladimir On 4/21/16 6:56 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8154763 > http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ > > JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning of range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled. > > I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. > > Tested with regression test and RBT (running). 
> > Thanks, > Tobias > From vladimir.kozlov at oracle.com Thu Apr 21 22:37:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Apr 2016 15:37:21 -0700 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <5718B9BC.10001@oracle.com> References: <5718B9BC.10001@oracle.com> Message-ID: <57195621.7050307@oracle.com> Hi, Zoltan I think we should change code in prefetch_allocation() instead: Node *cache_adr = new AddPNode(old_eden_top, old_eden_top, _igvn.MakeConX(step_size + distance)); This way we allow AllocatePrefetchDistance == 0 in all AllocatePrefetchStyle cases - it is consistent. Thanks, Vladimir On 4/21/16 4:30 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8153340. > > https://bugs.openjdk.java.net/browse/JDK-8153340 > > > Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way the address for the first prefetch instruction is calculated [1]: > > If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1) which can zero some of the bits of cache_adr. That results in accesses *before* the newly allocated object. > > > Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). Unquarantine test. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ > > Testing: > - JPRT (incl. TestOptionsWithRanges.java) > - local testing on a SPARC machine. > > Thank you!
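The arithmetic behind 8153340 can be checked in isolation. The sketch below models only the address computation and assumes a power-of-two step size and made-up addresses. The reported computation adds the prefetch distance to old_eden_top and then aligns down. With a distance of 0 and an unaligned old_eden_top, the result falls below the new object. The second variant is modelled on the suggested AddPNode(old_eden_top, old_eden_top, step_size + distance) and adds one extra step before aligning. Because aligning down loses less than one step, that keeps the result strictly above old_eden_top for any non-negative distance. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Aligns addr down to a multiple of step; step must be a power of two.
uintptr_t align_down(uintptr_t addr, uintptr_t step) {
  assert(step != 0 && (step & (step - 1)) == 0);
  return addr & ~(step - 1);
}

// First prefetch address as described in the bug report: add the distance,
// then align down. With distance == 0 this can land below old_eden_top.
uintptr_t first_prefetch_reported(uintptr_t old_eden_top, uintptr_t distance, uintptr_t step) {
  return align_down(old_eden_top + distance, step);
}

// Variant modelled on the review suggestion: add step + distance, then align
// down. The result is always strictly greater than old_eden_top.
uintptr_t first_prefetch_suggested(uintptr_t old_eden_top, uintptr_t distance, uintptr_t step) {
  return align_down(old_eden_top + step + distance, step);
}
```

For example, with a 64-byte step and old_eden_top = 0x1010, the reported form yields 0x1000, which lies before the object, while the suggested form yields 0x1040.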
> > Best regards, > > > Zoltan > > [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 From vladimir.kozlov at oracle.com Thu Apr 21 22:39:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Apr 2016 15:39:05 -0700 Subject: [9] RFR(XS): 8086057: Crash with "modified node is not on IGVN._worklist" when running with -XX:-SplitIfBlocks In-Reply-To: <5718A737.1050708@oracle.com> References: <5718A737.1050708@oracle.com> Message-ID: <57195689.1050508@oracle.com> Good. Thanks, Vladimir On 4/21/16 3:11 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8086057 > http://cr.openjdk.java.net/~thartmann/8086057/webrev.00/ > > The fastdebug VM crashes with "modified node is not on IGVN._worklist" because SuperWord::align_initial_loop_index() resets the input of the pre-loop Opaque1 node 'pre_opaq' without putting it on the IGVN worklist afterwards. This only shows up with -XX:-SplitIfBlocks because otherwise surrounding nodes are changed, causing the OpaqueNode to be put on the worklist. The method should use PhaseIterGVN::replace_input_of() which uses rehash_node_delayed() to ensure the modified node is put on the worklist. > > Tested with regression test, Nashorn+Octane with -XX:-SplitIfBlocks and RBT (running). > > Thanks, > Tobias > From tom.rodriguez at oracle.com Fri Apr 22 01:22:35 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 21 Apr 2016 18:22:35 -0700 Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly handle private methods in interfaces Message-ID: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com> http://cr.openjdk.java.net/~never/8152903/webrev JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change that if the type is an interface no meaningful answer can be provided.
I updated tests and the interface a little to reflect this. Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes. A modified version of the test which found the issue also passes now. I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes. https://bugs.openjdk.java.net/browse/JDK-8154904 tom From igor.veresov at oracle.com Fri Apr 22 03:23:32 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 21 Apr 2016 20:23:32 -0700 Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly handle private methods in interfaces In-Reply-To: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com> References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com> Message-ID: <80797340-B145-42F2-ABED-902197C9AC17@oracle.com> Looks good! igor > On Apr 21, 2016, at 6:22 PM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8152903/webrev > > JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change that if the type is an interface no meaningful answer can be provided. I updated tests and the interface a little to reflect this. > > Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes. A modified version of the test which found the issue also passes now.
I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes. https://bugs.openjdk.java.net/browse/JDK-8154904 > > tom From tobias.hartmann at oracle.com Fri Apr 22 05:19:06 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 22 Apr 2016 07:19:06 +0200 Subject: [9] RFR(XS): 8086057: Crash with "modified node is not on IGVN._worklist" when running with -XX:-SplitIfBlocks In-Reply-To: <57195689.1050508@oracle.com> References: <5718A737.1050708@oracle.com> <57195689.1050508@oracle.com> Message-ID: <5719B44A.8080200@oracle.com> Thanks, Vladimir! Best regards, Tobias On 22.04.2016 00:39, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 4/21/16 3:11 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8086057 >> http://cr.openjdk.java.net/~thartmann/8086057/webrev.00/ >> >> The fastdebug VM crashes with "modified node is not on IGVN._worklist" because SuperWord::align_initial_loop_index() resets the input of the pre-loop Opaque1 node 'pre_opaq' without putting it on the IGVN worklist afterwards. This only shows up with -XX:-SplitIfBlocks because otherwise surrounding nodes are changed, causing the OpaqueNode to be put on the worklist. The method should use PhaseIterGVN::replace_input_of() which uses rehash_node_delayed() to ensure the modified node is put on the worklist. >> >> Tested with regression test, Nashorn+Octane with -XX:-SplitIfBlocks and RBT (running).
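A toy model can illustrate the invariant behind the 8086057 assert. Any node whose input changes during iterative GVN has to be queued for re-processing. replace_input_of() queues the node before it mutates the edge, while a bare input update does not queue it at all. ToyNode, ToyIGVN and set_input_unsafely are hypothetical stand-ins, not HotSpot classes. The real PhaseIterGVN::rehash_node_delayed() also removes the node from the value-numbering hash table, and the toy leaves that out.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy node: just a list of input edges, indexed by position.
struct ToyNode {
  std::vector<ToyNode*> in;
};

class ToyIGVN {
 public:
  // Mirrors the intent of PhaseIterGVN::replace_input_of(): enqueue the node
  // first, then change the input, so the node is re-processed later.
  void replace_input_of(ToyNode* n, std::size_t i, ToyNode* new_in) {
    worklist_.insert(n);
    n->in[i] = new_in;
  }

  // Bare edge update with no notification: the kind of change that leaves a
  // modified node off the worklist.
  static void set_input_unsafely(ToyNode* n, std::size_t i, ToyNode* new_in) {
    n->in[i] = new_in;
  }

  bool on_worklist(const ToyNode* n) const { return worklist_.count(n) != 0; }

 private:
  std::unordered_set<const ToyNode*> worklist_;
};
```

Routing every edge change through one helper that also handles the worklist makes the invariant hard to break by accident.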
>> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Fri Apr 22 05:20:01 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 22 Apr 2016 07:20:01 +0200 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled In-Reply-To: References: <5718DC28.7080502@oracle.com> Message-ID: <5719B481.2090100@oracle.com> Hi Chris, On 21.04.2016 23:18, Christian Thalinger wrote: > Not a review but I think PostLoopMultiversioning should be a diagnostic_pd. I've added a comment to this Enhancement: > > https://bugs.openjdk.java.net/browse/JDK-8150900 Yes, I agree. We should avoid adding too many new product flags. Thanks, Tobias > >> On Apr 21, 2016, at 3:56 AM, Tobias Hartmann wrote: >> >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8154763 >> http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ >> >> JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled. >> >> I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. >> >> Tested with regression test and RBT (running). >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Fri Apr 22 05:20:21 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 22 Apr 2016 07:20:21 +0200 Subject: [9] RFR(XS): 8154763: Crash with "assert(RangeCheckElimination)" if RangeCheckElimination is disabled In-Reply-To: <571951C7.3060806@oracle.com> References: <5718DC28.7080502@oracle.com> <571951C7.3060806@oracle.com> Message-ID: <5719B495.9010103@oracle.com> Thanks, Vladimir! Best regards, Tobias On 22.04.2016 00:18, Vladimir Kozlov wrote: > Looks good.
> > Thanks, > Vladimir > > On 4/21/16 6:56 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8154763 >> http://cr.openjdk.java.net/~thartmann/8154763/webrev.00/ >> >> JDK-8151573 introduced multiversioning for range check elimination. Explicitly turning off range check elimination now crashes the VM with an assert in PhaseIdealLoop::has_range_checks() because we assume that this is only called if range check elimination is enabled. >> >> I think we should disable multiversioning if range check elimination is turned off. I added the corresponding check. >> >> Tested with regression test and RBT (running). >> >> Thanks, >> Tobias >> From rwestrel at redhat.com Fri Apr 22 12:38:32 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 22 Apr 2016 14:38:32 +0200 Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64 Message-ID: <571A1B48.8040304@redhat.com> http://cr.openjdk.java.net/~roland/8154939/webrev.00/ 8153998 added a test that assumes SuperWordLoopUnrollAnalysis is enabled but it's not the case on aarch64. As a consequence, vectorization doesn't trigger on aarch64 anymore. Roland. From nils.eliasson at oracle.com Fri Apr 22 13:53:11 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 22 Apr 2016 15:53:11 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <5717E190.5070107@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> <5717DF06.3090305@oracle.com> <5717E190.5070107@oracle.com> Message-ID: <571A2CC7.30506@oracle.com> Hi, Background: The compilelog can get corrupted and the VM may assert on "failed: bad tag in log" when printing opto_assembly. (Print assembly turns on print opto_assembly if hs_dis is not present.)
When printing opto_assembly in output.cpp we may lose the ttylock (break_tty_lock_for_safepoint) due to a safepoint in both print_metadata and dump_asm. Another thread can claim the lock and start printing. When the safepoint is over, both threads will think they own the lock. The content will look ok thanks to the xml stream adding the writing thread tag to the log. The closing xml-tag has two problems: 1) It uses a raw_print and may get intermingled with other output 2) The xml tag stack tracking may see a bad sequence of tags. Solution: Retake the ttylock before printing the closing print_optoassembly tag. (I have only observed this safepoint issue with print_optoassembly.) If another thread already has the lock and is printing print_nmethod for example, print opto_assembly will block. Here we can have two variants: 1) the other thread will print something else (like print_nmethod) - then that tag will be closed before releasing the lock, and the tag stack will be consistent but the output may look like ...... 2) the other thread is also printing opto_assembly. Then that thread may yield the lock during a safepoint while the first one is retaking the lock. Then we can get or Fortunately the xml stack consistency will be ok since it doesn't make any difference on what thread wrote the print_optoassembly tag. Pre-mortem If this issue pops up again we must investigate if there are more places in the compile log code that yield the tty lock on a safepoint. NoSafePointVerifiers don't seem to check on all transitions. Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02 Testing: All regression tests. Regards, Nils Eliasson On 2016-04-20 22:07, Vladimir Kozlov wrote: > On 4/20/16 12:56 PM, Nils Eliasson wrote: >> Hi, >> >> Thanks for the help, >> >> I got it to work, and added NoSafePointVerifiers to make sure I hadn't >> missed anything. Then after many test iterations it failed again.
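The retake-the-lock fix proposed for 8153527 can be sketched with a std::mutex standing in for the tty lock. In this toy, the body callback may release the lock, which models break_tty_lock_for_safepoint. The closing tag is written only once the lock is held again. If another thread holds the lock at that point, lock() blocks, which matches the behaviour described above. print_tagged_block and the callback signature are hypothetical. HotSpot itself uses ttyLocker and xmlStream, not std::mutex.

```cpp
#include <functional>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Writes <tag> ... </tag> under tty_lock. The body may release the lock via
// the unique_lock it receives (modelling a safepoint that breaks the tty lock).
void print_tagged_block(
    std::ostream& out, std::mutex& tty_lock, const std::string& tag,
    const std::function<void(std::ostream&, std::unique_lock<std::mutex>&)>& body) {
  std::unique_lock<std::mutex> lock(tty_lock);
  out << '<' << tag << ">\n";
  body(out, lock);
  // Retake the lock before the closing tag so that the tag cannot interleave
  // with another thread's output; this blocks if another thread holds it.
  if (!lock.owns_lock()) {
    lock.lock();
  }
  out << "</" << tag << ">\n";
}  // the unique_lock destructor releases tty_lock here
```

In a single-threaded check, the output can be captured in a std::ostringstream, with a body that calls unlock() partway through.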
It >> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get >> a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to >> the temporary buffer too, or relax the tag checks in the xml and accept >> that the output may need to be sorted by writer-thread before use. The >> output looks like: >> >> >> >> ... >> releases tty when blocking on a safepoint >> >> >> ... >> // back again after safepoint writing without >> ttylock now. >> // Here we fail on an assert today when we expect >> a closing print_nmethod tag >> >> >> >> This is malformed xml but has enough information to be reconstructed. >> Would this be an acceptable output? > > Yes, I think it is acceptable - we don't loose information. And it is > not worse than it was before. > > Thanks, > Vladimir > >> >> Regards, >> Nils >> >> >> On 2016-04-18 19:30, Vladimir Kozlov wrote: >>> tty would have the same problem but it use C_HEAP to allocate: >>> >>> defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) >>> defaultStream(); >>> >>> Please, look if you can do something similar. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/18/16 4:24 AM, Nils Eliasson wrote: >>>> Resizeable is better, but then we assert on expanding the stringbuffer >>>> while being under a different ResourceMark. >>>> >>>> Regards, >>>> Nils >>>> >>>> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>>>> Use resizable stream: >>>>> >>>>> stringStream(size_t initial_bufsize = 256); >>>>> >>>>> 1024 may not be enough. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this fix of print opto_assembly. >>>>>> >>>>>> Summary: >>>>>> The compilelog can get corrupted and the VM may assert on "failed: >>>>>> bad tag in log". >>>>>> >>>>>> When printing assembly in output.cpp we first take the ttylock, >>>>>> print >>>>>> the head and then the method metadata. 
However the >>>>>> metadata printing makes a vm entry and may block for a safepoint and >>>>>> will then release the lock >>>>>> (break_tty_lock_for_safepoint). After that some of the other >>>>>> compiler >>>>>> thread that haven't safepointed will take the lock >>>>>> and the broken log will be a fact when the safepoint is over and the >>>>>> first thread starts logging again. >>>>>> >>>>>> Solution: >>>>>> Print the method metadata to a temporary buffer, then take the tty >>>>>> lock. >>>>>> >>>>>> Testing: >>>>>> Repro from bug stops failing. >>>>>> Running :hotspot_all >>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>>>> >>>>>> Regards, >>>>>> Nils Eliasson >>>> >> From bharadwaj.yadavalli at oracle.com Fri Apr 22 16:56:24 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Fri, 22 Apr 2016 12:56:24 -0400 Subject: Missing aarch64 changes in hs due to mismerge from hs-comp to hs Message-ID: <571A57B8.3080002@oracle.com> As part of the hs-comp to hs push I did yesterday, it appears that I have mismerged the following conflicting pushes to src/cpu/aarch64/vm/macroAssembler_aarch64.hpp done on hs tree and hs-comp tree. Fix for 8153310 (http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/a07a10329f31) pushed to hs Fix for 8153713 (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f9545cf437eb) pushed to hs-comp Fix for 8153797 (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d4636cc092db) pushed to hs-comp I picked the version in hs repo and overrode the changes from hs-comp. I think the next push up to hs should consider merging both the changes in hs as well as in hs-comp. 
Thanks, Bharadwaj From aph at redhat.com Fri Apr 22 17:02:10 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 22 Apr 2016 18:02:10 +0100 Subject: Missing aarch64 changes in hs due to mismerge from hs-comp to hs In-Reply-To: <571A57B8.3080002@oracle.com> References: <571A57B8.3080002@oracle.com> Message-ID: <571A5912.4060101@redhat.com> On 04/22/2016 05:56 PM, S. Bharadwaj Yadavalli wrote: > As part of the hs-comp to hs push I did yesterday, it appears that I > have mismerged the following conflicting pushes to > src/cpu/aarch64/vm/macroAssembler_aarch64.hpp done on hs tree and > hs-comp tree. > > Fix for 8153310 > (http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/a07a10329f31) pushed to hs > Fix for 8153713 > (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f9545cf437eb) > pushed to hs-comp > Fix for 8153797 > (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d4636cc092db) > pushed to hs-comp > > I picked the version in hs repo and overrode the changes from hs-comp. > > I think the next push up to hs should consider merging both the changes > in hs as well as in hs-comp. OK. I think that the bustage is relatively minor. The AArch64 port is broken anyway at the moment because of one of the security patches. I'll fix it. Andrew. From bharadwaj.yadavalli at oracle.com Fri Apr 22 17:04:38 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Fri, 22 Apr 2016 13:04:38 -0400 Subject: Missing aarch64 changes in hs due to mismerge from hs-comp to hs In-Reply-To: <571A5912.4060101@redhat.com> References: <571A57B8.3080002@oracle.com> <571A5912.4060101@redhat.com> Message-ID: <571A59A6.9060902@oracle.com> Thanks, Andrew! Bharadwaj On 04/22/2016 01:02 PM, Andrew Haley wrote: > On 04/22/2016 05:56 PM, S. Bharadwaj Yadavalli wrote: >> As part of the hs-comp to hs push I did yesterday, it appears that I >> have mismerged the following conflicting pushes to >> src/cpu/aarch64/vm/macroAssembler_aarch64.hpp done on hs tree and >> hs-comp tree. 
>> >> Fix for 8153310 >> (http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/a07a10329f31) pushed to hs >> Fix for 8153713 >> (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f9545cf437eb) >> pushed to hs-comp >> Fix for 8153797 >> (http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d4636cc092db) >> pushed to hs-comp >> >> I picked the version in hs repo and overrode the changes from hs-comp. >> >> I think the next push up to hs should consider merging both the changes >> in hs as well as in hs-comp. > OK. I think that the bustage is relatively minor. The AArch64 port > is broken anyway at the moment because of one of the security patches. > I'll fix it. > > Andrew. > > From tom.rodriguez at oracle.com Fri Apr 22 17:12:46 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 22 Apr 2016 10:12:46 -0700 Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly handle private methods in interfaces In-Reply-To: <80797340-B145-42F2-ABED-902197C9AC17@oracle.com> References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com> <80797340-B145-42F2-ABED-902197C9AC17@oracle.com> Message-ID: <92ABC0EE-5375-42A3-ADEA-E0BDAE208A9B@oracle.com> Thanks! tom > On Apr 21, 2016, at 8:23 PM, Igor Veresov wrote: > > Looks good! > > igor > >> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez > wrote: >> >> http://cr.openjdk.java.net/~never/8152903/webrev >> >> JVMCI had it own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change that if the type is an interface no meaningful answer can be provided. I updated tests and the interface a little to reflect this. >> >> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes. 
A modified version of the test which found the issue also passes now. I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes. https://bugs.openjdk.java.net/browse/JDK-8154904 >> >> tom > From michael.c.berg at intel.com Fri Apr 22 17:44:05 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 22 Apr 2016 17:44:05 +0000 Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64 In-Reply-To: <571A1B48.8040304@redhat.com> References: <571A1B48.8040304@redhat.com> Message-ID: The changes seem ok since unroll analysis is not enabled for aarch64. -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin Sent: Friday, April 22, 2016 5:39 AM To: hotspot-compiler-dev at openjdk.java.net Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64 http://cr.openjdk.java.net/~roland/8154939/webrev.00/ 8153998 added a test that assumes SuperWordLoopUnrollAnalysis is enabled but it's not the case on aarch64. As a consequence, vectorization doesn't trigger on aarch64 anymore. Roland. From vladimir.kozlov at oracle.com Fri Apr 22 18:04:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Apr 2016 11:04:28 -0700 Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64 In-Reply-To: <571A1B48.8040304@redhat.com> References: <571A1B48.8040304@redhat.com> Message-ID: Looks good. I will sponsor it after testing. Thanks, Vladimir On 4/22/16 5:38 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8154939/webrev.00/ > > 8153998 added a test that assumes SuperWordLoopUnrollAnalysis is enabled > but it's not the case on aarch64. As a consequence, vectorization > doesn't trigger on aarch64 anymore. > > Roland.
> From vladimir.kozlov at oracle.com Sat Apr 23 01:00:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Apr 2016 18:00:28 -0700 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <571A2CC7.30506@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> <5717DF06.3090305@oracle.com> <5717E190.5070107@oracle.com> <571A2CC7.30506@oracle.com> Message-ID: <571AC92C.30906@oracle.com> Yes, this fix is good enough. Usually when engineers are using print assembly they know about this problem and use CICompilerCount=1 to have only one compiler thread to print output. Thanks, Vladimir On 4/22/16 6:53 AM, Nils Eliasson wrote: > Hi, > > Background: > The compilelog can get corrupted and the VM may assert on "failed: bad tag in log" when printing opto_assembly. (Print assembly turns on print opto_assembly if hs_dis is not present.) > > When printing opto_assembly in output.cpp we may loose the ttylock (break_tty_lock_for_safepoint) due to a safepoint in both print_metadata and dump_asm. Another thread can claim the lock and start > printing. When the safepoint is over both threads will think they own the lock. The content will look ok thanks to the xml stream adding the writing thread tag to the log. The closing xml-tag has two > problems: 1) It uses a raw_print and may get intermingled with other output 2) The xml tag stack tracking may see a bad sequence of tags. > > Solution: > Retake the ttylock before printing the closing print_optoassembly tag. (I have only observed this safepoint issue with print_optoassembly.) > > If another tag already has the lock and is printing print_nmethod for example, print opto_assembly will block. 
Here we can have two variants: > > 1) the other thread will print something else (like print_nmethod) - then that tag will be closed before releaseing the lock, and the tag stack will be consistent but the output may look like > ...... > > 2) the other thread is also printing opto_assembly. Then that thread may yield the look during a safepoint while the first one is retaking the look. Then we can get id=1> or (for id 1)> > > Fortunately the xml stack consistency will be ok since it doesn't make any difference on what thread wrote the print_optoassembly tag. > > Pre-mortem > If this issue pops-up again we must investigate if there are more places in the compile log code that yields the tty lock on a safepoint. NoSafePointVerifiers don't seem to check on all transitions. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 > Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02 > > Testing: > All regression tests. > > Regards, > Nils Eliasson > > On 2016-04-20 22:07, Vladimir Kozlov wrote: >> On 4/20/16 12:56 PM, Nils Eliasson wrote: >>> Hi, >>> >>> Thanks for the help, >>> >>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't >>> missed anything. Then after many test iterations it failed again. It >>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get >>> a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to >>> the temporary buffer too, or relax the tag checks in the xml and accept >>> that the output may need to be sorted by writer-thread before use. The >>> output looks like: >>> >>> >>> >>> ... >>> releases tty when blocking on a safepoint >>> >>> >>> ... >>> // back again after safepoint writing without >>> ttylock now. >>> // Here we fail on an assert today when we expect >>> a closing print_nmethod tag >>> >>> >>> >>> This is malformed xml but has enough information to be reconstructed. >>> Would this be an acceptable output? 
>> >> Yes, I think it is acceptable - we don't loose information. And it is not worse than it was before. >> >> Thanks, >> Vladimir >> >>> >>> Regards, >>> Nils >>> >>> >>> On 2016-04-18 19:30, Vladimir Kozlov wrote: >>>> tty would have the same problem but it use C_HEAP to allocate: >>>> >>>> defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) >>>> defaultStream(); >>>> >>>> Please, look if you can do something similar. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/18/16 4:24 AM, Nils Eliasson wrote: >>>>> Resizeable is better, but then we assert on expanding the stringbuffer >>>>> while being under a different ResourceMark. >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>>>>> Use resizable stream: >>>>>> >>>>>> stringStream(size_t initial_bufsize = 256); >>>>>> >>>>>> 1024 may not be enough. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this fix of print opto_assembly. >>>>>>> >>>>>>> Summary: >>>>>>> The compilelog can get corrupted and the VM may assert on "failed: >>>>>>> bad tag in log". >>>>>>> >>>>>>> When printing assembly in output.cpp we first take the ttylock, print >>>>>>> the head and then the method metadata. However the >>>>>>> metadata printing makes a vm entry and may block for a safepoint and >>>>>>> will then release the lock >>>>>>> (break_tty_lock_for_safepoint). After that some of the other compiler >>>>>>> thread that haven't safepointed will take the lock >>>>>>> and the broken log will be a fact when the safepoint is over and the >>>>>>> first thread starts logging again. >>>>>>> >>>>>>> Solution: >>>>>>> Print the method metadata to a temporary buffer, then take the tty >>>>>>> lock. >>>>>>> >>>>>>> Testing: >>>>>>> Repro from bug stops failing. 
>>>>>>> Running :hotspot_all >>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>>>>> >>>>>>> >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>>>>> >>>>>>> Regards, >>>>>>> Nils Eliasson >>>>> >>> > From vivek.r.deshpande at intel.com Sat Apr 23 01:10:08 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Sat, 23 Apr 2016 01:10:08 +0000 Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154975 webrev: http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/ Thanks and regards, Vivek From vladimir.kozlov at oracle.com Sat Apr 23 01:54:04 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Apr 2016 18:54:04 -0700 Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> Message-ID: <571AD5BC.10309@oracle.com> Hi Vivek, How is it related to the following macro instructions from JDK-8153998? + // special instructions for EVEX + void setvectmask(Register dst, Register src); + void restorevectmask(); Can you reuse them? Or add variants which you can use. I see a difference in the code: kmovql vs kmovdl.
_programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking. I don't like the following code in the assembler instructions: + if (zeroing) attributes.set_is_clear_context(); + if (!no_reg_mask) { + attributes.set_embedded_opmask_register_specifier(mask); + if (zeroing) attributes.set_is_clear_context(); + } zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them. _embedded_opmask_register_specifier is not used (only set). Don't add values which are not used. Thanks, Vladimir On 4/22/16 6:10 PM, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154975 > > webrev: > > http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/ > > Thanks and regards, > > Vivek > From michael.c.berg at intel.com Sat Apr 23 02:30:00 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sat, 23 Apr 2016 02:30:00 +0000 Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 In-Reply-To: <571AD5BC.10309@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> <571AD5BC.10309@oracle.com> Message-ID: Vladimir, it is not related to 8153998. It seems his patch is not sync'd against the head of the tree. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 22, 2016 6:54 PM To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net Cc: Viswanathan, Sandhya ; Berg, Michael C Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 Hi Vivek, How it is related to next macro instructions from JDK-8153998? + // special instructions for EVEX + void setvectmask(Register dst, Register src); void + restorevectmask(); Can you reuse them?
Or add variants which you can use. I see difference kmovql vs kmovdl in code. _programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking I don't like next code in assembler instructions: + if (zeroing) attributes.set_is_clear_context(); + if (!no_reg_mask) { + attributes.set_embedded_opmask_register_specifier(mask); + if (zeroing) attributes.set_is_clear_context(); + } zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them. _embedded_opmask_register_specifier is not used (only set). Don't add values which are not used. Thanks, Vladimir On 4/22/16 6:10 PM, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154975 > > webrev: > > http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/ > > Thanks and regards, > > Vivek > From michael.c.berg at intel.com Sun Apr 24 02:14:14 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sun, 24 Apr 2016 02:14:14 +0000 Subject: CR for RFR 8154896 Message-ID: Hi Folks, I would like to contribute a bug fix for SKX/EVEX code gen. There is a guarantee of isBit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX. This patch addresses the minimal set of changes which can have this issue. This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896 webrev: http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/ Thanks, Michael
From michael.c.berg at intel.com Sun Apr 24 02:23:57 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sun, 24 Apr 2016 02:23:57 +0000 Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> <571AD5BC.10309@oracle.com> Message-ID: More info: Setvectmask and restorevectmask are used for auto code generation. The method attached here is used entirely for stub code assembly and is intended only for stub code usage and checked via assertions as such. Regards, Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Friday, April 22, 2016 7:30 PM To: Vladimir Kozlov ; Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 Vladimir, it Is not related to 8153998. It seems his patch is not sync'd against the head of the tree. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 22, 2016 6:54 PM To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net Cc: Viswanathan, Sandhya ; Berg, Michael C Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 Hi Vivek, How it is related to next macro instructions from JDK-8153998? + // special instructions for EVEX + void setvectmask(Register dst, Register src); void + restorevectmask(); Can you reuse them? Or add variants which you can use. I see difference kmovql vs kmovdl in code.
_programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking I don't like next code in assembler instructions: + if (zeroing) attributes.set_is_clear_context(); + if (!no_reg_mask) { + attributes.set_embedded_opmask_register_specifier(mask); + if (zeroing) attributes.set_is_clear_context(); + } zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them. _embedded_opmask_register_specifier is not used (only set). Don't add values which are not used. Thanks, Vladimir On 4/22/16 6:10 PM, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154975 > > webrev: > > http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/ > > Thanks and regards, > > Vivek > From tobias.hartmann at oracle.com Mon Apr 25 06:38:03 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 25 Apr 2016 08:38:03 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <571784C7.6020304@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> Message-ID: <571DBB4B.2070801@oracle.com> Hi, I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). If there are no objections, I would like to push the basic version: http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ If unaligned performance turns out to be a problem, we can still improve the intrinsic. 
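The width selection and tail handling discussed in this thread (8-byte compares when both arrays are doubleword aligned, otherwise 4-byte compares, plus separate handling of the leftover bytes) can be approximated in portable C++. This is only a sketch under stated assumptions, not the SPARC intrinsic: the names equals_in_words and array_equals_sketch are hypothetical, loads go through std::memcpy so the code is valid at any alignment, and the tail is copied into zero-filled words because reading past the end of an array and shifting out the garbage bits, as MacroAssembler::array_equals_loop() is described as doing, would be undefined behavior in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compares len bytes of a and b in chunks of sizeof(Word) bytes.
// std::memcpy keeps every load well-defined regardless of alignment;
// optimizing compilers typically lower it to a single load.
template <typename Word>
static bool equals_in_words(const unsigned char* a, const unsigned char* b,
                            std::size_t len) {
  const std::size_t width = sizeof(Word);
  std::size_t i = 0;
  for (; i + width <= len; i += width) {
    Word wa, wb;
    std::memcpy(&wa, a + i, width);
    std::memcpy(&wb, b + i, width);
    if (wa != wb) return false;  // bail out on the first differing word
  }
  // Fewer than `width` bytes remain. Copy them into zero-filled words so the
  // bytes beyond the end are zero in both words instead of being read.
  assert(len - i < width);
  if (i < len) {
    Word ta = 0, tb = 0;
    std::memcpy(&ta, a + i, len - i);
    std::memcpy(&tb, b + i, len - i);
    return ta == tb;
  }
  return true;
}

// Uses an 8-byte loop when both start addresses are doubleword aligned and a
// 4-byte loop otherwise, following the width choice described for the basic
// version of the patch.
static bool array_equals_sketch(const void* ary1, const void* ary2,
                                std::size_t len_in_bytes) {
  const auto* a = static_cast<const unsigned char*>(ary1);
  const auto* b = static_cast<const unsigned char*>(ary2);
  const std::uintptr_t both = reinterpret_cast<std::uintptr_t>(a) |
                              reinterpret_cast<std::uintptr_t>(b);
  if ((both & 7) == 0) {
    return equals_in_words<std::uint64_t>(a, b, len_in_bytes);
  }
  return equals_in_words<std::uint32_t>(a, b, len_in_bytes);
}
```

Lengths are passed in bytes, so for char[] data the caller would pass twice the element count. Per Mikael's reply, whether Java array bodies start on an 8-byte boundary may depend on the array header size, and therefore on the compressed-pointer flags, which is why the sketch keeps the 4-byte path.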
Thanks, Tobias On 20.04.2016 15:31, Tobias Hartmann wrote: > Hi John, > > On 20.04.2016 03:46, John Rose wrote: >> So I started looking at your code and my inner SPARC junkie took over. >> >> This is what happened: >> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ > > Thanks a lot for having a look! > >> Perhaps there are some ideas that might be helpful: >> - The rampdown logic can lose a couple of instructions by using xorcc and movr. > > Right, this simplifies the code a bit: > http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ > > I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? > >> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? > > I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: > http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 > > Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. > >> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). > > Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. > > What do you think? 
> > Thanks, > Tobias > > [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx > [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ > [3] Runtime alignment checks: > > bind(Lunaligned); > Label next; > xor3(ary1, ary2, tmp); > and3(tmp, 7, tmp); > br_null_short(tmp, Assembler::pn, next); > STOP("One array is unaligned!"); > should_not_reach_here(); > bind(next); > STOP("Both arrays are unaligned!"); > >> On the other hand, what you wrote is nice and simple. >> >> HTH >> John >> >> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >> versions of misalignment, still with vectorization, as with the arraycopy stubs. >> But that's neither nice nor simple. >> >>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>> >>> Thanks, Vladimir! >>> >>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>> >>> Okay, I'll push the basic version. >>> >>> For reference, here are the results on a SPARC T4: >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>> >>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>>> We do have arraycopy code for it but by default we don't use it: >>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>> >>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may benefit. That is why we keep the code.
>>> >>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>> >>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>> ArraycopySrcPrefetchDistance (42) must be 0 >>> Error: Could not create the Java Virtual Machine. >>> Error: A fatal exception has occurred. Program will exit >>> >>> Thanks, >>> Tobias >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following enhancement: >>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>> >>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>> >>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>> >>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>> >>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>> >>>>> I evaluated the following three versions of the patch. 
>>>>> >>>>> -- Basic -- >>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>> >>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>> >>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>> Version "small" tries to improve this. >>>>> >>>>> -- Prefetching -- >>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>> >>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. 
>>>>> >>>>> -- Small -- >>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>> >>>>> The numbers can be found here: >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>> >>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>> >>>>> What do you think? >>>>> >>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>> [3] Microbenchmark results for the "basic" implementation >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>>> From mikael.gerdin at oracle.com Mon Apr 25 07:59:17 2016 From: 
mikael.gerdin at oracle.com (Mikael Gerdin) Date: Mon, 25 Apr 2016 09:59:17 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <571DBB4B.2070801@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> Message-ID: <571DCE55.6070506@oracle.com> Hi Tobias On 2016-04-25 08:38, Tobias Hartmann wrote: > Hi, > > I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned. With compressed oops enabled the mark word + compressed class + alength add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned. If compressed klass pointers are disabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case. /Mikael > > If there are no objections, I would like to push the basic version: > http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ > > If unaligned performance turns out to be a problem, we can still improve the intrinsic. > > Thanks, > Tobias > > On 20.04.2016 15:31, Tobias Hartmann wrote: >> Hi John, >> >> On 20.04.2016 03:46, John Rose wrote: >>> So I started looking at your code and my inner SPARC junkie took over. >>> >>> This is what happened: >>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >> >> Thanks a lot for having a look! >> >>> Perhaps there are some ideas that might be helpful: >>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
>> >> Right, this simplifies the code a bit: >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >> >> I did some experiments but surprisingly it seems that this does not improve but slightly degrades performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >> >>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >> >> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >> >> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >> >>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >> >> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8-byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >> >> What do you think? >> >> Thanks, >> Tobias >> >> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >> [3] Runtime alignment checks: >> >> bind(Lunaligned); >> Label next; >> xor3(ary1, ary2, tmp); >> and3(tmp, 7, tmp); >> br_null_short(tmp, Assembler::pn, next); >> STOP("One array is unaligned!"); >> should_not_reach_here(); >> bind(next); >> STOP("Both arrays are unaligned!"); >> >>> On the other hand, what you wrote is nice and simple. >>> >>> HTH >>> John >>> >>> P.S.
On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>> But that's neither nice nor simple. >>> >>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>> >>>> Thanks, Vladimir! >>>> >>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>> >>>> Okay, I'll push the basic version. >>>> >>>> For reference, here are the results on a SPARC T4: >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>> >>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>>>> We do have arraycopy code for it but by default we don't use it: >>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>> >>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may benefit. That is why we keep the code. >>>> >>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>> >>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>> Error: Could not create the Java Virtual Machine. >>>> Error: A fatal exception has occurred.
Program will exit >>>> >>>> Thanks, >>>> Tobias >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> please review the following enhancement: >>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>> >>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>> >>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>> >>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>> >>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>> >>>>>> I evaluated the following three versions of the patch. >>>>>> >>>>>> -- Basic -- >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>> The basic version implements the algorithm described above.
Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>> >>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>> >>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>> Version "small" tries to improve this. >>>>>> >>>>>> -- Prefetching -- >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>> >>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>>>> >>>>>> -- Small -- >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. 
Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>> >>>>>> The numbers can be found here: >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>> >>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>> >>>>>> What do you think? >>>>>> >>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>>>> From nils.eliasson at oracle.com Mon Apr 25 08:33:20 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 25 Apr 2016 10:33:20 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log 
In-Reply-To: <571AC92C.30906@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> <5717DF06.3090305@oracle.com> <5717E190.5070107@oracle.com> <571A2CC7.30506@oracle.com> <571AC92C.30906@oracle.com> Message-ID: <571DD650.7070400@oracle.com> Thank you Vladimir! //Nils On 2016-04-23 03:00, Vladimir Kozlov wrote: > Yes, this fix is good enough. > > Usually when engineers are using print assembly they know about this > problem and use CICompilerCount=1 to have only one compiler thread to > print output. > > Thanks, > Vladimir > > On 4/22/16 6:53 AM, Nils Eliasson wrote: >> Hi, >> >> Background: >> The compilelog can get corrupted and the VM may assert on "failed: >> bad tag in log" when printing opto_assembly. (Print assembly turns on >> print opto_assembly if hs_dis is not present.) >> >> When printing opto_assembly in output.cpp we may lose the ttylock >> (break_tty_lock_for_safepoint) due to a safepoint in both >> print_metadata and dump_asm. Another thread can claim the lock and start >> printing. When the safepoint is over, both threads will think they own >> the lock. The content will look ok thanks to the xml stream adding >> the writing thread tag to the log. The closing xml-tag has two >> problems: 1) It uses a raw_print and may get intermingled with other >> output 2) The xml tag stack tracking may see a bad sequence of tags. >> >> Solution: >> Retake the ttylock before printing the closing print_optoassembly >> tag. (I have only observed this safepoint issue with >> print_optoassembly.) >> >> If another thread already has the lock and is printing print_nmethod for >> example, print opto_assembly will block. Here we can have two variants: >> >> 1) the other thread will print something else (like print_nmethod) - >> then that tag will be closed before releasing the lock, and the tag >> stack will be consistent but the output may look like >> ......
>> >> >> 2) the other thread is also printing opto_assembly. Then that thread >> may yield the lock during a safepoint while the first one is retaking >> the lock. Then we can get > id=1>> 1)> or > id=1>> 2)>> (for id 1)> >> >> Fortunately the xml stack consistency will be ok since it doesn't >> make any difference which thread wrote the print_optoassembly tag. >> >> Pre-mortem >> If this issue pops up again we must investigate if there are more >> places in the compile log code that yield the tty lock on a >> safepoint. NoSafePointVerifiers don't seem to check on all transitions. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.02 >> >> Testing: >> All regression tests. >> >> Regards, >> Nils Eliasson >> >> On 2016-04-20 22:07, Vladimir Kozlov wrote: >>> On 4/20/16 12:56 PM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Thanks for the help, >>>> >>>> I got it to work, and added NoSafePointVerifiers to make sure I hadn't >>>> missed anything. Then after many test iterations it failed again. It >>>> didn't fail on the NSPV, but in dump_asm we blocked on a VM entry >>>> to get >>>> a ciSymbol->as_utf8. Now I am considering if I should direct >>>> dump_asm to >>>> the temporary buffer too, or relax the tag checks in the xml and >>>> accept >>>> that the output may need to be sorted by writer-thread before use. The >>>> output looks like: >>>> >>>> >>>> >>>> ... >>>> releases tty when blocking on a safepoint >>>> >>>> >>>> ... >>>> // back again after safepoint writing without >>>> ttylock now. >>>> // Here we fail on an assert today when we >>>> expect >>>> a closing print_nmethod tag >>>> >>>> >>>> >>>> This is malformed xml but has enough information to be reconstructed. >>>> Would this be an acceptable output? >>> >>> Yes, I think it is acceptable - we don't lose information. And it >>> is not worse than it was before.
>>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Regards, >>>> Nils >>>> >>>> >>>> On 2016-04-18 19:30, Vladimir Kozlov wrote: >>>>> tty would have the same problem but it uses C_HEAP to allocate: >>>>> >>>>> defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) >>>>> defaultStream(); >>>>> >>>>> Please, look if you can do something similar. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/18/16 4:24 AM, Nils Eliasson wrote: >>>>>> Resizable is better, but then we assert on expanding the >>>>>> stringbuffer >>>>>> while being under a different ResourceMark. >>>>>> >>>>>> Regards, >>>>>> Nils >>>>>> >>>>>> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>>>>>> Use resizable stream: >>>>>>> >>>>>>> stringStream(size_t initial_bufsize = 256); >>>>>>> >>>>>>> 1024 may not be enough. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review this fix of print opto_assembly. >>>>>>>> >>>>>>>> Summary: >>>>>>>> The compilelog can get corrupted and the VM may assert on "failed: >>>>>>>> bad tag in log". >>>>>>>> >>>>>>>> When printing assembly in output.cpp we first take the ttylock, >>>>>>>> print >>>>>>>> the head and then the method metadata. However the >>>>>>>> metadata printing makes a vm entry and may block for a >>>>>>>> safepoint and >>>>>>>> will then release the lock >>>>>>>> (break_tty_lock_for_safepoint). After that some of the other >>>>>>>> compiler >>>>>>>> threads that haven't safepointed will take the lock >>>>>>>> and the broken log will be a fact when the safepoint is over >>>>>>>> and the >>>>>>>> first thread starts logging again. >>>>>>>> >>>>>>>> Solution: >>>>>>>> Print the method metadata to a temporary buffer, then take the tty >>>>>>>> lock. >>>>>>>> >>>>>>>> Testing: >>>>>>>> Repro from bug stops failing.
>>>>>>>> Running :hotspot_all >>>>>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils Eliasson >>>>>> >>>> >> From edward.nevill at gmail.com Mon Apr 25 09:09:33 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 25 Apr 2016 10:09:33 +0100 Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <1461172110.2941.63.camel@mylittlepony.linaroharston> References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com> <1461172110.2941.63.camel@mylittlepony.linaroharston> Message-ID: <1461575373.10032.23.camel@mint> Hi, On Wed, 2016-04-20 at 18:08 +0100, Edward Nevill wrote: > On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote: > > On 04/19/2016 01:54 PM, Long Chen wrote: > > > Would this be fine? > > > > It might well be. I'd like Ed to do a few measurements of large and > > small block zeroing. My guess is that a reasonably small unrolled loop > > doing STP ZR, ZR will work better than anything else, but we'll see. > > OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners HW using 3 different methods. > I have redone these benchmarks using a JMH test provided by Andrew Haley. Thanks Andrew! 
The test is here
http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java

And the results are here
http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf

As a reminder, the different patches are

http://people.linaro.org/~edward.nevill/block_zero/stp.patch
    Uses stp instead of str (no use of dc zva)
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v01.patch
    Long Chen's V01 patch
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v02.patch
    Long Chen's V02 patch
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03.patch
    if (!small) { }
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04.patch
    Long Chen's v02 patch modified to avoid unaligned stp instructions.

From this it seems that patches bzero3 and bzero4 perform better in all cases except very small zeroing sizes (<= 16 bytes). bzero3 is significantly larger than bzero4 and would probably need outlining. Also, the cutoff point for switching from stp/str to dc zva is set at 2 x cache lines (to guarantee at least one use of dc zva). A larger value may be better.

What I propose next is to look only at bzero3 and bzero4, to modify bzero3 to move the dc zva loop out of line, and to examine the cutoff point from stp/str to dc zva to determine the optimum value. Thoughts? Ed.
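The strategy being measured here (ordinary str/stp stores below a cutoff of two cache lines, DC ZVA for the aligned middle of larger ranges) can be modeled by a small C++ sketch that only computes how a range would be split; it emits no AArch64 code. `ZeroPlan` and `plan_zeroing` are hypothetical names that do not come from any of the patches above, and the 64-byte block size used in the checks is an assumption (real code would read the block size from DCZID_EL0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of how a zeroing routine could split [addr, addr + len).
struct ZeroPlan {
  size_t head_bytes;  // zeroed with str/stp before the first DC ZVA block
  size_t zva_blocks;  // naturally aligned blocks zeroed with DC ZVA
  size_t tail_bytes;  // zeroed with str/stp after the last full block
};

// zva_block is the DC ZVA block size in bytes (must be a power of two).
// Below the cutoff of 2 * zva_block, everything is zeroed with plain stores.
ZeroPlan plan_zeroing(uintptr_t addr, size_t len, size_t zva_block) {
  assert(zva_block != 0 && (zva_block & (zva_block - 1)) == 0);
  const uintptr_t mask = ~static_cast<uintptr_t>(zva_block - 1);
  ZeroPlan plan = {len, 0, 0};
  if (len < 2 * zva_block) {
    return plan;
  }
  uintptr_t first = (addr + zva_block - 1) & mask;  // round start up to a block boundary
  uintptr_t end = addr + len;
  uintptr_t last = end & mask;                      // round end down to a block boundary
  plan.head_bytes = first - addr;
  plan.zva_blocks = (last - first) / zva_block;
  plan.tail_bytes = end - last;
  return plan;
}
```

With a cutoff of 2 x block size the head is always shorter than one block, so at least one full aligned block remains for DC ZVA whatever the start address; raising the cutoff only moves more mid-sized ranges onto the plain-store path.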
From edward.nevill at gmail.com Mon Apr 25 09:28:14 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 25 Apr 2016 10:28:14 +0100 Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <1461575373.10032.23.camel@mint> References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com> <1461172110.2941.63.camel@mylittlepony.linaroharston> <1461575373.10032.23.camel@mint> Message-ID: <1461576494.10032.31.camel@mint> On Mon, 2016-04-25 at 10:09 +0100, Edward Nevill wrote: > Hi, > > > On Wed, 2016-04-20 at 18:08 +0100, Edward Nevill wrote: > > On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote: > > > On 04/19/2016 01:54 PM, Long Chen wrote: > > > > Would this be fine? > > > > > > It might well be. I'd like Ed to do a few measurements of large and > > > small block zeroing. My guess is that a reasonably small unrolled loop > > > doing STP ZR, ZR will work better than anything else, but we'll see. > > > > OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners HW using 3 different methods. > > > > I have redone these benchmarks using a JMH test provided by Andrew Haley. Thanks Andrew! > > The test is here > > http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java > One interesting data point is the interaction between zeroing memory and allocation prefetch. The following shows this. http://people.linaro.org/~edward.nevill/block_zero/noprefetch.pdf Performance is improved significantly by turning off allocation prefetch. The problem is that allocation prefetch forces the cache line into L1. The zeroing code then has to wait until the cache line has loaded before it can zero it. Therefore performance is much better with allocation prefetch turned off altogether. When allocation prefetch is turned off, the str/stp/dc zva just zeros L2 cache and L1 remains unaffected.
Also ZeroTlab should improve performance since this will reverse the order of the str/stp/dc zva and the prefetch. However, I think the tuning of prefetch / zero tlab is a separate exercise. All the best, Ed. From aph at redhat.com Mon Apr 25 10:05:32 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 25 Apr 2016 11:05:32 +0100 Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <1461575373.10032.23.camel@mint> References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com> <1461172110.2941.63.camel@mylittlepony.linaroharston> <1461575373.10032.23.camel@mint> Message-ID: <571DEBEC.3050504@redhat.com> On 04/25/2016 10:09 AM, Edward Nevill wrote: > And the results are here > > http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf Bigger numbers are worse, right? Andrew. From martin.doerr at sap.com Mon Apr 25 11:04:28 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 25 Apr 2016 11:04:28 +0000 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" Message-ID: Hi all, we have already seen such an assertion in final_graph_reshaping. We found 2 AddP nodes in a row. The ideal graph looked like this (simplified): N0 ConP N1 ConN N2 AddP(Base = N0, Address = N0) N3 AddP(Base = N0, Address = N2) Final graph reshaping visited N2 before N3 first and changed the graph: N0 ConP N1 ConN N4 DecodeN N2 AddP(Base = N4, Address = N4) N3 AddP(Base = N0, Address = N2) Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected. I made a change to reconnect N3's Base input to N4, too. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row. I wouldn't expect that to happen. Not sure if this assertion is desired. I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive. 
Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode. (I believe X86 is the only platform which can match the decoding in the operand in this case.) Please review. I will also need a sponsor, please. Best regards, Martin From rwestrel at redhat.com Mon Apr 25 13:18:06 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 25 Apr 2016 15:18:06 +0200 Subject: RFR(XS): 8155015: Aarch64: bad assert in spill generation code Message-ID: <571E190E.5050600@redhat.com> http://cr.openjdk.java.net/~roland/8155015/webrev.00/ I hit that broken assert when doing some testing. Roland. From tobias.hartmann at oracle.com Mon Apr 25 13:26:00 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 25 Apr 2016 15:26:00 +0200 Subject: RFR(XS): 8155015: Aarch64: bad assert in spill generation code In-Reply-To: <571E190E.5050600@redhat.com> References: <571E190E.5050600@redhat.com> Message-ID: <571E1AE8.4020000@oracle.com> Hi Roland, On 25.04.2016 15:18, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8155015/webrev.00/ > > I hit that broken assert when doing some testing. That looks good to me! Best regards, Tobias > > Roland. > From rwestrel at redhat.com Mon Apr 25 13:43:03 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 25 Apr 2016 15:43:03 +0200 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: References: <57188E03.5070303@redhat.com> Message-ID: <571E1EE7.7090204@redhat.com> Hi Dean, Thanks for looking at this. > Hi Roland. This sounds like it has a lot of overlap with JDK-6217251. > If so, could you update JDK-6217251 explaining what more, if anything, > needs to be done? It's similar indeed except that 6217251 suggests using knowledge of the loop structures to trigger the optimization. 
My change assumes it's much more common that the index of an array access is loop variant while the array itself is not, so even though the transformation could lead to suboptimal code in the opposite case, that case is uncommon. Roland. > > thanks, > > dl > > On 4/21/2016 1:23 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8154826/webrev.00/ >> >> The aarch64 port implicitly transforms: >> (AddP base (AddP base address (LShiftL index con)) offset) >> into: >> (AddP base (AddP base offset) (LShiftL index con)) >> in the ad file to embed the shift (and possibly an i2l conversion) into >> the addressing mode of a memory operation. Exposing that transformation >> in the ideal graph allows: >> >> - (AddP base offset) to be scheduled (for instance outside a loop) >> - multiple identical (AddP base offset) to be commoned >> - (LShiftL index con) to be cloned during matching so that each memory >> access has its own >> >> Roland. > From rwestrel at redhat.com Mon Apr 25 13:45:57 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 25 Apr 2016 15:45:57 +0200 Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64 In-Reply-To: References: <571A1B48.8040304@redhat.com> Message-ID: <571E1F95.9040309@redhat.com> Thanks Vladimir and Michael for reviewing this. FWIW, I'll investigate whether it makes sense for aarch64 to use unroll analysis. If it does, then that fix isn't required. It might be nice to have anyway. What do you think? Roland.
From rwestrel at redhat.com Mon Apr 25 13:50:53 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 25 Apr 2016 15:50:53 +0200 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <57188E03.5070303@redhat.com> References: <57188E03.5070303@redhat.com> Message-ID: <571E20BD.3030907@redhat.com> On 04/21/2016 10:23 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8154826/webrev.00/ > > The aarch64 port implicitly transforms: > (AddP base (AddP base address (LShiftL index con)) offset) > into: > (AddP base (AddP base offset) (LShiftL index con)) > in the ad file to embed the shift (and possibly an i2l conversion) into > the addressing mode of a memory operation. Exposing that transformation > in the ideal graph allows: > > - (AddP base offset) to be scheduled (for instance outside a loop) > - multiple identical (AddP base offset) to be commoned > - (LShiftL index con) to be cloned during matching so that each memory > access has its own Further testing revealed some problems with the previous change, so here is a new webrev: http://cr.openjdk.java.net/~roland/8154826/webrev.01/ Memory access instructions only accept a shift that matches the size of the data being accessed (i.e. 2 for a 4-byte load). It's not always the case that address expressions produced by c2 follow that constraint. As expected, it's very rare that they don't, and it seems to occur only with Unsafe. I added a predicate that skips the transformation of the address subtree at the end of the optimization passes and prevents matching of the address subtree if the constraint is not met for one use. As this is uncommon, that seems good enough. Roland.
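The constraint and the reassociation described in this message can be illustrated with a small C++ sketch. The helper names are hypothetical and this is not the predicate in webrev.01: it checks that a shift equals log2 of every memory use's access size, and shows that rewriting (base + (index << shift)) + offset as (base + offset) + (index << shift) yields the same address. Following the description above, it accepts only the size-matching shift and does not model a plain unshifted register offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns log2(x) if x is a power of two, otherwise -1.
static int exact_log2_or_minus1(size_t x) {
  if (x == 0 || (x & (x - 1)) != 0) {
    return -1;
  }
  int n = 0;
  while ((x >>= 1) != 0) {
    n++;
  }
  return n;
}

// Hypothetical predicate: an access of 'size_in_bytes' can absorb
// (LShiftL index shift) into its addressing mode only if the shift
// equals log2 of the access size (e.g. 2 for a 4-byte load).
bool shift_fits_access(int shift, size_t size_in_bytes) {
  return exact_log2_or_minus1(size_in_bytes) == shift;
}

// Only reshape the address subtree if every memory use accepts the shift.
bool shift_fits_all_uses(int shift, const size_t* use_sizes, size_t n_uses) {
  for (size_t i = 0; i < n_uses; i++) {
    if (!shift_fits_access(shift, use_sizes[i])) {
      return false;
    }
  }
  return true;
}

// Original shape: (base + (index << shift)) + offset
uint64_t addr_original(uint64_t base, int64_t index, int shift, int64_t offset) {
  assert(shift >= 0 && shift < 64);
  return (base + (static_cast<uint64_t>(index) << shift)) + static_cast<uint64_t>(offset);
}

// Reassociated shape: (base + offset) + (index << shift). base + offset does not
// depend on index, so it can be hoisted out of a loop or commoned.
uint64_t addr_reassociated(uint64_t base, int64_t index, int shift, int64_t offset) {
  assert(shift >= 0 && shift < 64);
  return (base + static_cast<uint64_t>(offset)) + (static_cast<uint64_t>(index) << shift);
}
```

Because unsigned arithmetic wraps modulo 2^64, both shapes agree even for negative indices; the only thing the reassociation can break is the shift/size match, which is why the predicate has to look at every use.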
From tobias.hartmann at oracle.com Mon Apr 25 14:11:54 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 25 Apr 2016 16:11:54 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <571DCE55.6070506@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> Message-ID: <571E25AA.6090303@oracle.com> Hi Mikael, On 25.04.2016 09:59, Mikael Gerdin wrote: > Hi Tobias > > On 2016-04-25 08:38, Tobias Hartmann wrote: >> Hi, >> >> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). > > I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned. > > With compressed oops enabled the mark word + compressed class + alength > add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned. > If compressed klass pointers are disabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case. Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. Best regards, Tobias > > /Mikael > >> >> If there are no objections, I would like to push the basic version: >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >> >> If unaligned performance turns out to be a problem, we can still improve the intrinsic. >> >> Thanks, >> Tobias >> >> On 20.04.2016 15:31, Tobias Hartmann wrote: >>> Hi John, >>> >>> On 20.04.2016 03:46, John Rose wrote: >>>> So I started looking at your code and my inner SPARC junkie took over.
>>>> >>>> This is what happened: >>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>> >>> Thanks a lot for having a look! >>> >>>> Perhaps there are some ideas that might be helpful: >>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr. >>> >>> Right, this simplifies the code a bit: >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>> >>> I did some experiments but surprisingly it seems that this does not improve but slightly degrades performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >>> >>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>> >>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>> >>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >>> >>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>> >>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8-byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>> >>> What do you think?
>>> >>> Thanks, >>> Tobias >>> >>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>> [3] Runtime alignment checks: >>> >>> bind(Lunaligned); >>> Label next; >>> xor3(ary1, ary2, tmp); >>> and3(tmp, 7, tmp); >>> br_null_short(tmp, Assembler::pn, next); >>> STOP("One array is unaligned!"); >>> should_not_reach_here(); >>> bind(next); >>> STOP("Both arrays are unaligned!"); >>> >>>> On the other hand, what you wrote is nice and simple. >>>> >>>> HTH >>>> John >>>> >>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>>> But that's neither nice nor simple. >>>> >>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann wrote: >>>>> >>>>> Thanks, Vladimir! >>>>> >>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>> >>>>> Okay, I'll push the basic version. >>>>> >>>>> For reference, here are the results on a SPARC T4: >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>> >>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>>>>> We do have arraycopy code for it, but by default we don't use it: >>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>> >>>>>> Experiments back then did not show an improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code.
>>>>> >>>>> Right, but currently it's not possible to enable prefetching because "ArraycopySrcPrefetchDistanceConstraintFunc" forces the prefetch distance to be 0: >>>>> >>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>> Error: Could not create the Java Virtual Machine. >>>>> Error: A fatal exception has occurred. Program will exit >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>> Hi, >>>>>>> >>>>>>> please review the following enhancement: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>> >>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC, but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>>> >>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>>> >>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>>> >>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance.
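The over-read strategy in the RFR above (compare wide words, read past the logical end, then shift out the garbage bits of the last word) can be modelled in portable C++ as a rough sketch. This is not the SPARC MacroAssembler code: `array_equals_sketch` and `load_be64` are hypothetical names, the two arrays are assumed to already have equal lengths, and each buffer is assumed to be padded to a multiple of 8 bytes so the final 8-byte load stays in bounds (the real intrinsic relies on the heap object layout for that instead). Bytes are assembled big-endian, as on SPARC, so the garbage bytes of the last word end up in the low-order bits and a right shift discards them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assemble 8 bytes into a big-endian 64-bit word (SPARC byte order), so the
// lowest-indexed array element occupies the most significant bits.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Hypothetical model of the 8-byte comparison loop.
// Assumptions: both arrays have `len` valid bytes, and both buffers are padded
// to a multiple of 8 bytes so reading the whole last word is safe.
bool array_equals_sketch(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t i = 0;
  // Main loop: compare full 8-byte words.
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) {
      return false;
    }
  }
  size_t rem = len - i;  // 0..7 valid bytes remain
  if (rem == 0) {
    return true;
  }
  // Read past the end, then shift out the (8 - rem) garbage bytes.
  // rem is 1..7, so the shift is 8..56 bits and never reaches 64.
  unsigned garbage_bits = static_cast<unsigned>((8 - rem) * 8);
  return (load_be64(a + i) >> garbage_bits) == (load_be64(b + i) >> garbage_bits);
}
```

With 10 valid bytes and different padding, the first word is compared in the loop and the second word is compared after shifting out its 6 padding bytes, so differing garbage past the end does not affect the result.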
>>>>>>> >>>>>>> I evaluated the following three versions of the patch. >>>>>>> >>>>>>> -- Basic -- >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>> >>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>> >>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>> Version "small" tries to improve this. >>>>>>> >>>>>>> -- Prefetching -- >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>> >>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. 
>>>>>>> >>>>>>> -- Small -- >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>> >>>>>>> The numbers can be found here: >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>> >>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>>>>> From 
edward.nevill at gmail.com Mon Apr 25 14:47:03 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 25 Apr 2016 15:47:03 +0100 Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <571DEBEC.3050504@redhat.com> References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com> <1461172110.2941.63.camel@mylittlepony.linaroharston> <1461575373.10032.23.camel@mint> <571DEBEC.3050504@redhat.com> Message-ID: <1461595623.10032.37.camel@mint> On Mon, 2016-04-25 at 11:05 +0100, Andrew Haley wrote: > On 04/25/2016 10:09 AM, Edward Nevill wrote: > > And the results are here > > > > http://people.linaro.org/~edward.nevill/block_zero/zva1.pdf > > Bigger numbers are worse, right? > Right, original normalised to 100%, Regards, Ed. From martin.doerr at sap.com Mon Apr 25 16:06:54 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 25 Apr 2016 16:06:54 +0000 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <571E20BD.3030907@redhat.com> References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> Message-ID: <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> Hi Roland, seems like this issue is related to what I have sent out today: RFR(S): 8154836: VM crash due to "Base pointers must match" I also had to change the AddP case of final graph reshaping. In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode. Maybe I have to remove that part of my change, or at least adapt it. We should make sure that the changes don't get pushed on the same day. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin Sent: Montag, 25. 
April 2016 15:51 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode On 04/21/2016 10:23 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8154826/webrev.00/ > > The aarch64 port implicitly transforms: > (AddP base (AddP base address (LShiftL index con)) offset) > into: > (AddP base (AddP base offset) (LShiftL index con)) > in the ad file to embed the shift (and possibly an i2l conversion) into > the addressing mode of a memory operation. Exposing that transformation > in the ideal graph allows: > > - (AddP base offset) to be scheduled (for instance outside a loop) > - multiple identical (AddP base offset) to be commoned > - (LShiftL index con) to be cloned during matching so that each memory > access has its own Further testing revealed some problems with the previous change, so here is a new webrev: http://cr.openjdk.java.net/~roland/8154826/webrev.01/ Memory access instructions only accept a shift that matches the size of the data being accessed (i.e. a shift of 2 for a 4-byte load). Address expressions produced by C2 do not always follow that constraint. As expected, it's very rare that they don't, and it seems to occur only with Unsafe. I added a predicate that skips the transformation of the address subtree at the end of the optimization passes and prevents matching of the address subtree if the constraint is not met for any one use. As this is uncommon, that seems good enough. Roland.
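The constraint Roland describes can be illustrated with a small C++ sketch. The function names are hypothetical and this is not C2's actual predicate; it only encodes the rule stated in the mail: a shift folded into the addressing mode must equal log2 of the access size (2 for a 4-byte load), and a single non-conforming use is enough to leave the address subtree untransformed.

```cpp
#include <cassert>
#include <vector>

// log2 of a power-of-two access size in bytes (1, 2, 4, 8, 16).
static int log2_of_size(int bytes) {
  int shift = 0;
  while ((1 << shift) < bytes) {
    shift++;
  }
  return shift;
}

// Hypothetical check: a memory access of `access_bytes` bytes can absorb
// (LShiftL index con) into its addressing mode only if con == log2(size).
bool shift_matches_access(int con, int access_bytes) {
  return con == log2_of_size(access_bytes);
}

// Hypothetical: the shift is folded only if every memory use of the address
// satisfies the constraint; one mismatching use (e.g. from Unsafe) vetoes it.
bool can_fold_shift_for_all_uses(int con, const std::vector<int>& use_access_bytes) {
  for (int bytes : use_access_bytes) {
    if (!shift_matches_access(con, bytes)) {
      return false;
    }
  }
  return true;
}
```

Checking all uses rather than a single one matters because the shifted index is shared: if one use is, say, an 8-byte access while the shift is 2, the expression has to stay in a form that use can still match.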
From christian.thalinger at oracle.com Mon Apr 25 19:29:13 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 25 Apr 2016 09:29:13 -1000 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <571E25AA.6090303@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com> Message-ID: > On Apr 25, 2016, at 4:11 AM, Tobias Hartmann wrote: > > Hi Mikael, > > On 25.04.2016 09:59, Mikael Gerdin wrote: >> Hi Tobias >> >> On 2016-04-25 08:38, Tobias Hartmann wrote: >>> Hi, >>> >>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). >> >> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned. >> >> With compressed oops enabled, the mark word + compressed class + length >> add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned. >> If compressed klass pointers are disabled, the array header grows to 20 bytes and I don't know how the contents are laid out in that case. > > Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. See arrayOop.hpp: // Returns the offset of the first element.
static int base_offset_in_bytes(BasicType type) { return header_size(type) * HeapWordSize; } > > Best regards, > Tobias > >> >> /Mikael >> >>> >>> If there are no objections, I would like to push the basic version: >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>> >>> If unaligned performance turns out to be a problem, we can still improve the intrinsic. >>> >>> Thanks, >>> Tobias >>> >>> On 20.04.2016 15:31, Tobias Hartmann wrote: >>>> Hi John, >>>> >>>> On 20.04.2016 03:46, John Rose wrote: >>>>> So I started looking at your code and my inner SPARC junkie took over. >>>>> >>>>> This is what happened: >>>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>>> >>>> Thanks a lot for having a look! >>>> >>>>> Perhaps there are some ideas that might be helpful: >>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr. >>>> >>>> Right, this simplifies the code a bit: >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>>> >>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >>>> >>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>>> >>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>>> >>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >>>> >>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>>> >>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. 
I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>>> >>>> What do you think? >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>> [3] Runtime alignment checks: >>>> >>>> bind(Lunaligned); >>>> Label next; >>>> xor3(ary1, ary2, tmp); >>>> and3(tmp, 7, tmp); >>>> br_null_short(tmp, Assembler::pn, next); >>>> STOP("One array is unaligned!"); >>>> should_not_reach_here(); >>>> bind(next); >>>> STOP("Both arrays are unaligned!"); >>>> >>>>> On the other hand, what you wrote is nice and simple. >>>>> >>>>> HTH >>>>> ? John >>>>> >>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>>>> But that's neither nice nor simple. >>>>> >>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>>>> >>>>>> Thanks, Vladimir! >>>>>> >>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>>> >>>>>> Okay, I'll push the basic version. >>>>>> >>>>>> For reference, here are the results on a SPARC T4: >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>>> >>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." 
>>>>>>> We do have arraycopy code for it but by default we don't use it: >>>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>>> >>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >>>>>> >>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>>>> >>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>>> Error: Could not create the Java Virtual Machine. >>>>>> Error: A fatal exception has occurred. Program will exit >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> please review the following enhancement: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>>> >>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>>>> >>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>>>> >>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). 
>>>>>>>> >>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>>>> >>>>>>>> I evaluated the following three versions of the patch. >>>>>>>> >>>>>>>> -- Basic -- >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>> >>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>>> >>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>> Version "small" tries to improve this. >>>>>>>> >>>>>>>> -- Prefetching -- >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. 
This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>> >>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>>>>>> >>>>>>>> -- Small -- >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>>> >>>>>>>> The numbers can be found here: >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>>> >>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>>>> >>>>>>>> What do you think? >>>>>>>> >>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png From vladimir.kozlov at oracle.com Mon Apr 25 22:40:04 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 25 Apr 2016 15:40:04 -0700 Subject: CR for RFR 8154896 In-Reply-To: References: Message-ID: <571E9CC4.2010107@oracle.com> Looks good. I will start testing. Thanks, Vladimir On 4/23/16 7:14 PM, Berg, Michael C wrote: > Hi Folks, > > I would like to contribute a bug fix for SKX/EVEX code gen. There is a > guarantee of isBit(imm8) for jccb which can sometimes fail when upper > bank register marshaling is required for instructions without EVEX or > conditionally EVEX support on SKX. This patch addresses the minimal set > of changes that can trigger this issue.
> > This code was tested as follows (see JBS entry below): > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/ > > Thanks, > > Michael > From tobias.hartmann at oracle.com Tue Apr 26 11:25:02 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 26 Apr 2016 13:25:02 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com> Message-ID: <571F500E.5020904@oracle.com> Thanks, Chris! On 25.04.2016 21:29, Christian Thalinger wrote: > >> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann wrote: >> >> Hi Mikael, >> >> On 25.04.2016 09:59, Mikael Gerdin wrote: >>> Hi Tobias >>> >>> On 2016-04-25 08:38, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). >>> >>> I suspect that if you disable compressed klass pointers (or compressed oops), array contents are not always 8-byte aligned. >>> >>> With compressed oops enabled, the mark word + compressed class + length >>> add up to 16 bytes and thus the array contents are guaranteed to be 8-byte aligned. >>> If compressed klass pointers are disabled, the array header grows to 20 bytes and I don't know how the contents are laid out in that case. >> >> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. > > See arrayOop.hpp: > > // Returns the offset of the first element.
> static int base_offset_in_bytes(BasicType type) { > return header_size(type) * HeapWordSize; > } I verified that Mikael's claim is right. With UseCompressedClassPointers, the header size is 16 bytes: mark word (8) + compressed klass (4) + length (4). Without UseCompressedClassPointers, the header size is 20 bytes: mark word (8) + klass (8) + length (4). However, the header size is always aligned up to HeapWordSize: static int header_size_in_bytes() { size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize); and therefore without UseCompressedClassPointers the header size is actually 24 bytes. Hence, on 64-bit systems, the first array element is always 8-byte aligned. Since we no longer support 32-bit SPARC, I wonder if it's okay to remove the alignment check completely. This would simplify the code a lot (we would no longer need the "array_equals_loop" method) and improve performance: http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/ Thanks, Tobias >> >> Best regards, >> Tobias >> >>> >>> /Mikael >>> >>>> >>>> If there are no objections, I would like to push the basic version: >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>> >>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic. >>>> >>>> Thanks, >>>> Tobias >>>> >>>> On 20.04.2016 15:31, Tobias Hartmann wrote: >>>>> Hi John, >>>>> >>>>> On 20.04.2016 03:46, John Rose wrote: >>>>>> So I started looking at your code and my inner SPARC junkie took over. >>>>>> >>>>>> This is what happened: >>>>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>>>> >>>>> Thanks a lot for having a look! >>>>> >>>>>> Perhaps there are some ideas that might be helpful: >>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr.
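The header-size arithmetic in Tobias's analysis can be checked with a few lines of C++. This is a hypothetical model, not HotSpot code: `array_base_offset`, `align_up` and `kHeapWordSize` are illustrative names (`align_up` does for power-of-two alignments what `align_size_up` is used for above), and it assumes a 64-bit VM with an 8-byte HeapWordSize.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kHeapWordSize = 8;  // assumption: 64-bit VM

// Round x up to a multiple of `alignment` (which must be a power of two).
constexpr size_t align_up(size_t x, size_t alignment) {
  return (x + alignment - 1) & ~(alignment - 1);
}

// Hypothetical model of the array header described in the mail:
// mark word (8) + klass pointer (4 if compressed, else 8) + length (4),
// rounded up to HeapWordSize.
constexpr size_t array_base_offset(bool compressed_class_pointers) {
  return align_up(8 + (compressed_class_pointers ? 4 : 8) + 4, kHeapWordSize);
}

static_assert(array_base_offset(true) == 16, "8 + 4 + 4 = 16, already aligned");
static_assert(array_base_offset(false) == 24, "8 + 8 + 4 = 20, aligned up to 24");
```

Assuming the default 8-byte object alignment, objects start on 8-byte boundaries, so a base offset that is a multiple of 8 puts element 0 on an 8-byte boundary in both configurations.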
>>>>> >>>>> Right, this simplifies the code a bit: >>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>>>> >>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >>>>> >>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>>>> >>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>>>> >>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >>>>> >>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>>>> >>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>>>> >>>>> What do you think? >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>> [3] Runtime alignment checks: >>>>> >>>>> bind(Lunaligned); >>>>> Label next; >>>>> xor3(ary1, ary2, tmp); >>>>> and3(tmp, 7, tmp); >>>>> br_null_short(tmp, Assembler::pn, next); >>>>> STOP("One array is unaligned!"); >>>>> should_not_reach_here(); >>>>> bind(next); >>>>> STOP("Both arrays are unaligned!"); >>>>> >>>>>> On the other hand, what you wrote is nice and simple. >>>>>> >>>>>> HTH >>>>>> ? 
John >>>>>> >>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>>>>> But that's neither nice nor simple. >>>>>> >>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>>>>> >>>>>>> Thanks, Vladimir! >>>>>>> >>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>>>> >>>>>>> Okay, I'll push the basic version. >>>>>>> >>>>>>> For reference, here are the results on a SPARC T4: >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>>>> >>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>>>>>>> We do have arraycopy code for it but by default we don't use it: >>>>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>>>> >>>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >>>>>>> >>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>>>>> >>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>> Error: A fatal exception has occurred. 
Program will exit >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> please review the following enhancement: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>>>> >>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>>>>> >>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>>>>> >>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>>>>> >>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>>>>> >>>>>>>>> I evaluated the following three versions of the patch. >>>>>>>>> >>>>>>>>> -- Basic -- >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>>> The basic version implements the algorithm described above. 
Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>> >>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>>>> >>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>> Version "small" tries to improve this. >>>>>>>>> >>>>>>>>> -- Prefetching -- >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>> >>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>>>>>>> >>>>>>>>> -- Small -- >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. 
Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>>>> >>>>>>>>> The numbers can be found here: >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>>>> >>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>>>>> >>>>>>>>> What do you think? >>>>>>>>> >>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Tobias >>>>>>>>> >>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png > From zoltan.majo at oracle.com Tue Apr 26 11:56:06 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Tue, 26 Apr 2016 13:56:06 +0200 Subject: [9] RFR (XS): 8153340: Incorrect
lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <57195621.7050307@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> Message-ID: <571F5756.7020007@oracle.com> Hi Vladimir, thank you for the feedback! Please see comments below. On 04/22/2016 12:37 AM, Vladimir Kozlov wrote: > Hi, Zoltan > > I think we should change code in prefetch_allocation() instead: > > Node *cache_adr = new AddPNode(old_eden_top, old_eden_top, > _igvn.MakeConX(step_size + distance)); The problem is that AllocatePrefetchStyle must align the first prefetched address to 8 bytes, otherwise the emitted stxa instruction could cause a SIGBUS. But the alignment does not have to be at a step_size boundary; 8-byte alignment is sufficient. > > This way we allow AllocatePrefetchDistance == 0 in all > AllocatePrefetchStyle cases - it is consistent. Unfortunately, it is not easy to have the same limit for AllocatePrefetchDistance on all platforms. Due to the 8-byte alignment performed by compiled code, the lower limit of 8 for AllocatePrefetchDistance is needed on SPARC; the lower limit of 0 works fine on all other platforms. I've started looking at the consistency of flags controlling allocation prefetch in general, as we have other issues open related to them (e.g., JDK-8151622). We're touching related code now, so I thought, maybe it makes sense to fix all remaining issues at once. The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance): 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we should use AllocatePrefetchStyle = 3. 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms. 3.
Enforce that AllocatePrefetchStepSize is a multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements). 4. Determine the number of lines to prefetch in the same way for all prefetch styles: lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/ Testing: - JPRT (incl. TestOptionsWithRanges) - local testing on a SPARC machine. Thank you! Best regards, Zoltan > > Thanks, > Vladimir > > On 4/21/16 4:30 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8153340. >> >> https://bugs.openjdk.java.net/browse/JDK-8153340 >> >> >> Problem: The VM crashes if AllocatePrefetchStyle==3 and >> AllocatePrefetchDistance==0. The crash happens due to the way the >> address for the first prefetch instruction is calculated [1]: >> >> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= >> ~(AllocatePrefetchStepSize - 1) which can zero some of the bits of >> cache_adr. That results in accesses *before* the newly allocated object. >> >> >> Solution: Set the lower limit of AllocatePrefetchDistance to >> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >> Unquarantine test. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >> >> Testing: >> - JPRT (incl. TestOptionsWithRanges.java) >> - local testing on a SPARC machine. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >> [1] >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 From rwestrel at redhat.com Tue Apr 26 14:29:50 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 26 Apr 2016 16:29:50 +0200 Subject: SuperWord::unrolling_analysis() question Message-ID: Why does SuperWord::unrolling_analysis() use: int max_vector = Matcher::max_vector_size(T_INT); instead of: int max_vector = Matcher::max_vector_size(T_BYTE); ?
For a loop like this: static void test_byte(byte[] src, byte[] dst) { for (int i = 0; i < src.length; i++) { dst[i] = src[i]; } } It limits the size of the vectors that are used below what the hardware can support. Roland. From andreas.woess at oracle.com Fri Apr 22 14:36:07 2016 From: andreas.woess at oracle.com (Andreas Woess) Date: Fri, 22 Apr 2016 16:36:07 +0200 Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable array fields Message-ID: <571A36D7.2030701@oracle.com> Please review: http://cr.openjdk.java.net/~aw/8154765/webrev/ https://bugs.openjdk.java.net/browse/JDK-8154765 This change adds an optional stableDimensions parameter to ConstantReflectionProvider.readStableFieldValue that allows the number of stable array dimensions to be specified at a finer granularity than by inferring it from the type of the field. Thanks, Andreas From andreas.woess at oracle.com Tue Apr 26 15:42:17 2016 From: andreas.woess at oracle.com (Andreas Woess) Date: Tue, 26 Apr 2016 17:42:17 +0200 Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable array fields Message-ID: <571F8C59.8050707@oracle.com> Please review: http://cr.openjdk.java.net/~aw/8154765/webrev/ https://bugs.openjdk.java.net/browse/JDK-8154765 This change adds an optional stableDimensions parameter to ConstantReflectionProvider.readStableFieldValue that allows the number of stable array dimensions to be specified at a finer granularity than by inferring it from the type of the field. Thanks, Andreas From christian.thalinger at oracle.com Tue Apr 26 17:11:09 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 26 Apr 2016 07:11:09 -1000 Subject: CR for RFR 8154896 In-Reply-To: References: Message-ID: > On Apr 23, 2016, at 4:14 PM, Berg, Michael C wrote: > > Hi Folks, > > I would like to contribute a bug fix for SKX/EVEX code gen.
The bug fix is the change in src/cpu/x86/vm/assembler_x86.cpp, correct? > There is a guarantee of is8bit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX. This patch addresses the minimal set of changes which can have this issue. > > This code was tested as follows (see jbs entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896 > > webrev: > http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/ > > Thanks, > Michael From vladimir.kozlov at oracle.com Tue Apr 26 17:56:04 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 26 Apr 2016 10:56:04 -0700 Subject: SuperWord::unrolling_analysis() question In-Reply-To: References: Message-ID: <3e4d91ad-3391-8c73-0be6-ebeddb811613@oracle.com> The question is for Michael. I would say it is a typo, but let Michael answer. Thanks, Vladimir On 4/26/16 7:29 AM, Roland Westrelin wrote: > > Why does SuperWord::unrolling_analysis() use: > > int max_vector = Matcher::max_vector_size(T_INT); > > instead of: > > int max_vector = Matcher::max_vector_size(T_BYTE); > > ? > > For a loop like this: > > static void test_byte(byte[] src, byte[] dst) { > for (int i = 0; i < src.length; i++) { > dst[i] = src[i]; > } > } > > > It limits the size of the vectors that are used below what the hardware > can support. > > Roland. > From michael.c.berg at intel.com Tue Apr 26 18:15:58 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 26 Apr 2016 18:15:58 +0000 Subject: CR for RFR 8154896 In-Reply-To: References: Message-ID: No, the actual bug fix is in the jccb to jcc changes. The assembler change is a correction for compressed displacement.
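To see why switching jccb to jcc avoids the failing guarantee, a minimal C++ model of the short-branch reach check may help. It encodes the standard x86 sizes (short Jcc is 0x70+cc with a rel8, 2 bytes; near Jcc is 0x0F 0x80+cc with a rel32, 6 bytes). The function names are hypothetical and this is not the HotSpot assembler code; the assumption is simply that extra instructions emitted between a jccb and its label (such as the upper-bank register marshaling mentioned above) grow the distance until it no longer fits in a signed byte.

```cpp
#include <cassert>
#include <cstdint>

// Signed 8-bit range check, in the spirit of the is8bit() guarantee.
static bool is8bit(int64_t x) { return -0x80 <= x && x < 0x80; }

// x86 encodings: short Jcc = 0x70+cc, rel8 (2 bytes);
//                near Jcc  = 0x0F 0x80+cc, rel32 (6 bytes).
constexpr int kShortJccSize = 2;
constexpr int kNearJccSize = 6;

// Branch displacements are relative to the end of the branch instruction.
int64_t jcc_disp(int64_t branch_pc, int64_t target_pc, int insn_size) {
  return target_pc - (branch_pc + insn_size);
}

// True if a short (jccb-style) branch at branch_pc can reach target_pc.
bool fits_short_jcc(int64_t branch_pc, int64_t target_pc) {
  return is8bit(jcc_disp(branch_pc, target_pc, kShortJccSize));
}

// Branch size needed: the short form when it fits, otherwise the near form.
int jcc_size(int64_t branch_pc, int64_t target_pc) {
  return fits_short_jcc(branch_pc, target_pc) ? kShortJccSize : kNearJccSize;
}
```

A forward target 129 bytes after the branch start still fits (displacement 127), but one byte more does not, which is exactly the kind of margin that a few extra marshaling instructions can consume.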
-Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Tuesday, April 26, 2016 10:11 AM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8154896 On Apr 23, 2016, at 4:14 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute a bug fix for SKX/EVEX code gen. The bug fix is the change in src/cpu/x86/vm/assembler_x86.cpp, correct? There is a guarantee of is8bit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX. This patch addresses the minimal set of changes which can have this issue. This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896 webrev: http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/ Thanks, Michael From vivek.r.deshpande at intel.com Tue Apr 26 18:22:43 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 26 Apr 2016 18:22:43 +0000 Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> <571AD5BC.10309@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A893B5@ORSMSX106.amr.corp.intel.com> Hi Vladimir, I have updated the webrev with all suggested changes. The webrev is at this location: http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.01/ Regards Vivek -----Original Message----- From: Berg, Michael C Sent: Saturday, April 23, 2016 7:24 PM To: Vladimir Kozlov; Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 More info: Setvectmask and restorevectmask are used for auto code generation.
The method attached here is used entirely for stub code assembly and is intended only for stub code usage and checked via assertions as such. Regards, Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Friday, April 22, 2016 7:30 PM To: Vladimir Kozlov ; Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 Vladimir, it is not related to 8153998. It seems his patch is not sync'd against the head of the tree. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 22, 2016 6:54 PM To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net Cc: Viswanathan, Sandhya ; Berg, Michael C Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 Hi Vivek, How is it related to the following macro instructions from JDK-8153998? + // special instructions for EVEX + void setvectmask(Register dst, Register src); + void restorevectmask(); Can you reuse them? Or add variants which you can use. I see a difference in the code: kmovql vs kmovdl. _programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking I don't like next code in assembler instructions: + if (zeroing) attributes.set_is_clear_context(); + if (!no_reg_mask) { + attributes.set_embedded_opmask_register_specifier(mask); + if (zeroing) attributes.set_is_clear_context(); + } zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them. _embedded_opmask_register_specifier is not used (only set). Don't add values which are not used. Thanks, Vladimir On 4/22/16 6:10 PM, Deshpande, Vivek R wrote: > Hi all > > I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic.
> > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154975 > > webrev: > > http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/ > > Thanks and regards, > > Vivek > From michael.c.berg at intel.com Tue Apr 26 18:27:42 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 26 Apr 2016 18:27:42 +0000 Subject: SuperWord::unrolling_analysis() question In-Reply-To: References: Message-ID: Roland, The answer could be conditional if we had machines with enough byte or short components to make vectors with. I chose INT as it is the current consistent minimum configuration for complete vector mapping. The best answer would be to create some code which mines the common type used in the current loop's expressions, but I think we would be stuck with two passes over the code, the first to bind the common type, the second for finding the optimal sub vector mapping. Or possibly moving the question to the machine layer as a query, where compiler writers choose the minimum consistent configuration based on current info on the machine we compile on. -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Roland Westrelin Sent: Tuesday, April 26, 2016 7:30 AM To: hotspot-compiler-dev at openjdk.java.net Subject: SuperWord::unrolling_analysis() question Why does SuperWord::unrolling_analysis() use: int max_vector = Matcher::max_vector_size(T_INT); instead of: int max_vector = Matcher::max_vector_size(T_BYTE); ? For a loop like this: static void test_byte(byte[] src, byte[] dst) { for (int i = 0; i < src.length; i++) { dst[i] = src[i]; } } It limits the size of the vectors that are used below what the hardware can support. Roland.
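Roland's observation can be put in numbers with a simplified model: if each lane of a SuperWord pack comes from one unrolled iteration, then capping the unroll factor at the T_INT lane count also caps how many bytes a byte-copy loop can pack. This is a sketch under that assumption, not the SuperWord code; `lanes` and `widest_pack_bytes` are hypothetical helpers, and the 32-byte register width is only an illustrative AVX2-sized value.

```cpp
#include <cassert>

// Lanes of a given element size that fit in one vector register.
constexpr int lanes(int vector_bytes, int elem_bytes) {
  return vector_bytes / elem_bytes;
}

// Simplified model: each lane of a pack comes from one unrolled iteration,
// so with unroll factor 'unroll' a pack holds at most 'unroll' elements,
// and never more than one register's worth.
constexpr int widest_pack_bytes(int vector_bytes, int elem_bytes, int unroll) {
  return (unroll < lanes(vector_bytes, elem_bytes) ? unroll
                                                   : lanes(vector_bytes, elem_bytes)) *
         elem_bytes;
}

// Illustration with an assumed 32-byte (AVX2-sized) vector register.
static_assert(lanes(32, 4) == 8, "8 int lanes");
static_assert(lanes(32, 1) == 32, "32 byte lanes");
// Unroll capped by the T_INT lane count: a byte copy packs only 8 bytes.
static_assert(widest_pack_bytes(32, 1, lanes(32, 4)) == 8, "T_INT cap");
// Unroll capped by the T_BYTE lane count: the full register is used.
static_assert(widest_pack_bytes(32, 1, lanes(32, 1)) == 32, "T_BYTE cap");
```

Under this model the int-based cap only costs something for loops whose element type is narrower than int, which fits Michael's remark that the better answer depends on mining the element types actually used in the loop.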
From christian.thalinger at oracle.com Tue Apr 26 19:27:06 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 26 Apr 2016 09:27:06 -1000 Subject: CR for RFR 8154896 In-Reply-To: References: Message-ID: > On Apr 26, 2016, at 8:15 AM, Berg, Michael C wrote: > > No, the actual bug fix is in the jccb to jcc changes. > The assembler change is a correction for compressed displacement. A "correction"? :-) Anyway, looks good. > > -Michael > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Tuesday, April 26, 2016 10:11 AM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8154896 > > > On Apr 23, 2016, at 4:14 PM, Berg, Michael C > wrote: > > Hi Folks, > > I would like to contribute a bug fix for SKX/EVEX code gen. > > The bug fix is the change in src/cpu/x86/vm/assembler_x86.cpp, correct? > > > There is a guarantee of is8bit(imm8) for jccb which can sometimes fail when upper bank register marshaling is required for instructions without EVEX or conditionally EVEX support on SKX. This patch addresses the minimal set of changes which can have this issue. > > This code was tested as follows (see jbs entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154896 > > webrev: > http://cr.openjdk.java.net/~mcberg/8154896/webrev.01/ > > Thanks, > Michael From christian.thalinger at oracle.com Tue Apr 26 20:33:38 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 26 Apr 2016 10:33:38 -1000 Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable array fields In-Reply-To: <571A36D7.2030701@oracle.com> References: <571A36D7.2030701@oracle.com> Message-ID: Looks good.
> On Apr 22, 2016, at 4:36 AM, Andreas Woess wrote: > > Please review: > http://cr.openjdk.java.net/~aw/8154765/webrev/ > https://bugs.openjdk.java.net/browse/JDK-8154765 > > This change adds an optional stableDimensions parameter to ConstantReflectionProvider.readStableFieldValue that allows the number of stable array dimensions to be specified at a finer granularity than by inferring it from the type of the field. > > Thanks, > Andreas > From vladimir.kozlov at oracle.com Tue Apr 26 22:39:14 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 26 Apr 2016 15:39:14 -0700 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <571F5756.7020007@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> Message-ID: <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> On 4/26/16 4:56 AM, Zoltán Majó wrote: > Hi Vladimir, > > > thank you for the feedback! Please see comments below. > > On 04/22/2016 12:37 AM, Vladimir Kozlov wrote: >> Hi, Zoltan >> >> I think we should change code in prefetch_allocation() instead: >> >> Node *cache_adr = new AddPNode(old_eden_top, old_eden_top, >> _igvn.MakeConX(step_size + distance)); > > The problem is that AllocatePrefetchStyle must align the first prefetched address to 8 bytes, otherwise the emitted stxa > instruction could cause a SIGBUS. But the alignment does not have to be at a step_size boundary; 8-byte alignment is > sufficient. Actually it has to be step_size (cache line size) aligned - the BIS instruction is triggered when the address of stxa is the beginning of a cache line for AllocatePrefetchStyle == 3. If it is just 8-byte aligned, it will be a simple store.
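The three ways of computing the first prefetch address that come up in this exchange can be compared with plain address arithmetic. The sketch below is a model under stated assumptions, not the macro.cpp code: the step is assumed to be a power of two (64 is only an illustrative cache line size), the word-aligned variant is modeled on the quoted webrev line that masks with ~(wordSize - 1), and the "suggested" variant assumes the existing round-down mask is kept after adding step_size + distance. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// step is assumed to be a power of two (a cache line size, e.g. 64).
bool is_aligned(uintptr_t addr, uintptr_t step) { return (addr & (step - 1)) == 0; }

// Original computation as described in the bug report: round
// top + distance down to a step boundary. With distance == 0 this can
// land below top, i.e. before the newly allocated object.
uintptr_t first_prefetch_round_down(uintptr_t top, uintptr_t distance, uintptr_t step) {
  return (top + distance) & ~(step - 1);
}

// Word-aligned variant (mask with ~(wordSize - 1)): 8-byte aligned, which
// satisfies stxa, but generally not cache-line aligned.
uintptr_t first_prefetch_word_aligned(uintptr_t top, uintptr_t distance) {
  const uintptr_t word_size = 8;
  return (top + distance) & ~(word_size - 1);
}

// Suggested variant: add step + distance, then keep the round-down mask
// (assumed). The result is step-aligned and strictly above top + distance.
uintptr_t first_prefetch_suggested(uintptr_t top, uintptr_t distance, uintptr_t step) {
  return (top + step + distance) & ~(step - 1);
}
```

With top = 0x1010, distance 0 and a 64-byte step, the original formula yields 0x1000 (before the object), the word-aligned one yields an address that is not at the start of a cache line, and the suggested one yields 0x1040, which is both cache-line aligned and past the allocation top.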
We should require AllocatePrefetchStepSize to be 8-byte aligned in vm_version_sparc.cpp instead of: + // BIS instructions require 8-byte aligned addresses + Node* mask = _igvn.MakeConX(~(intptr_t)(wordSize - 1)); > >> >> This way we allow AllocatePrefetchDistance == 0 in all AllocatePrefetchStyle cases - it is consistent. > > Unfortunately, it is not easy to have the same limit for AllocatePrefetchDistance on all platforms. Due to the 8-byte > alignment performed by compiled code, the lower limit of 8 for AllocatePrefetchDistance is needed on SPARC; the lower > limit of 0 works fine on all other platforms. > > I've started looking at the consistency of flags controlling allocation prefetch in general, as we have other issues > open related to them (e.g., JDK-8151622). We're touching related code now, so I thought, maybe it makes sense to fix all > remaining issues at once. Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it should be used only if UseTLAB is true. Please take a look. And I think Abstract_VM_Version::_reserve_for_allocation_prefetch should be set for all styles on all platforms to avoid accessing beyond the heap. The prefetch instruction docs say that they do not trap, but we should be careful. > > The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance): > > 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As > far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we > should use AllocatePrefetchStyle = 3. Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select AllocatePrefetchStyle = 3. But we should allow AllocatePrefetchStyle = 3 if normal prefetch instructions are used (or on other platforms). I think we should update the comment in globals.hpp to say "generate one prefetch per cache line" without mentioning BIS.
But I agree that if BIS is not available we should not use BIS (AllocatePrefetchInstr = 1). > > 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms. It could be useful on other platforms since it does one access per cache line. > > 3. Enforce that AllocatePrefetchStepSize is a multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements). That is correct since stxa requires at least 8-byte alignment (as stx does). > > 4. Determine the number of lines to prefetch in the same way for all prefetch styles: > lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines Agree. > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/ vm_version_sparc.cpp: AllocatePrefetchInstr = 0 should be set for all styles (not only 1) when BIS is not available. Thanks, Vladimir > > Testing: > - JPRT (incl. TestOptionsWithRanges) > - local testing on a SPARC machine. > > Thank you! > > Best regards, > > > Zoltan > > >> >> Thanks, >> Vladimir >> >> On 4/21/16 4:30 AM, Zoltán Majó wrote: >>> Hi, >>> >>> >>> please review the patch for 8153340. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153340 >>> >>> >>> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way >>> the address for the first prefetch instruction is calculated [1]: >>> >>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1) which can zero some of >>> the bits of cache_adr. That results in accesses *before* the newly allocated object. >>> >>> >>> Solution: Set the lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >>> Unquarantine test. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >>> >>> Testing: >>> - JPRT (incl. TestOptionsWithRanges.java) >>> - local testing on a SPARC machine. >>> >>> Thank you!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> >>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 > From vladimir.kozlov at oracle.com Wed Apr 27 01:03:53 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 26 Apr 2016 18:03:53 -0700 Subject: RF(XS): 8154939: 8153998 broke vectorization on aarch64 In-Reply-To: <571E1F95.9040309@redhat.com> References: <571A1B48.8040304@redhat.com> <571E1F95.9040309@redhat.com> Message-ID: <57200FF9.5000100@oracle.com> The fix is general and helps other platforms. I am waiting for the results of testing (this weekend and yesterday it was interrupted by quarterly maintenance). I will push it when I get results. Thanks, Vladimir On 4/25/16 6:45 AM, Roland Westrelin wrote: > Thanks, Vladimir and Michael, for reviewing this. > > FWIW, I'll investigate whether it makes sense for aarch64 to use unroll > analysis. If it does, then that fix isn't required. It might be nice to > have anyway. What do you think? > > Roland. > From vladimir.kozlov at oracle.com Wed Apr 27 01:47:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 26 Apr 2016 18:47:12 -0700 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: References: Message-ID: <57201A20.7000504@oracle.com> Thank you, Martin On 4/25/16 4:04 AM, Doerr, Martin wrote: > Hi all, > > we have already seen such an assertion in final_graph_reshaping. > > We found 2 AddP nodes in a row. The ideal graph looked like this (simplified): > N0 ConP > N1 ConN > N2 AddP(Base = N0, Address = N0) > N3 AddP(Base = N0, Address = N2) > > Final graph reshaping visited N2 before N3 and changed the graph: > N0 ConP > N1 ConN > N4 DecodeN > N2 AddP(Base = N4, Address = N4) > N3 AddP(Base = N0, Address = N2) > > Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected. > > I made a change to reconnect N3's Base input to N4, too.
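The invariant that the assertion protects, and why reconnecting N3's Base fixes it, can be reproduced on a toy graph. The sketch below uses a hypothetical `Node` struct rather than HotSpot's Node/AddPNode classes, and models "Base pointers must match" as: an AddP whose Address input is another AddP must share that AddP's Base. `rewire_base` is a hypothetical helper that mirrors the described fix by updating chained AddP users along with the node itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR node (not HotSpot's Node/AddPNode classes).
struct Node {
  const char* name;
  bool is_addp;
  Node* base;     // models AddPNode::Base
  Node* address;  // models AddPNode::Address
};

// Models the invariant behind "Base pointers must match": an AddP whose
// Address input is another AddP must share that AddP's Base.
bool bases_match(const Node& addp) {
  return !addp.address->is_addp || addp.address->base == addp.base;
}

// Replace old_base by decoded in 'addp' and, as the proposed fix does,
// also reconnect the Base of AddP users that chain off 'addp'.
void rewire_base(Node& addp, Node* old_base, Node* decoded,
                 const std::vector<Node*>& users) {
  if (addp.base == old_base) addp.base = decoded;
  if (addp.address == old_base) addp.address = decoded;
  for (Node* u : users) {
    if (u->is_addp && u->address == &addp && u->base == old_base) {
      u->base = decoded;
    }
  }
}
```

Building N0 (ConP), N2 = AddP(N0, N0) and N3 = AddP(N0, N2), then rewiring N2 to a DecodeN N4 without passing N3 as a user, leaves N3 with Base N0 while its Address (N2) has Base N4, which is the mismatch the assertion reports; passing N3 as a user keeps both Bases at N4.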
> > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ Fix looks good. > > In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row. > I wouldn't expect that to happen. Not sure if this assertion is desired. We should not have a 3rd AddP, but I agree with the assert. You should add an additional check to the assert: || out_j->in(AddPNode::Base) != addp > > I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive. > Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode. > (I believe X86 is the only platform which can match the decoding in the operand in this case.) It is not related to the fix and should be done separately, or not at all. There is a comment above the code which explains why it could be beneficial on SPARC too: 2845 // On sparc loading 32-bits constant and decoding it have less 2846 // instructions (4) then load 64-bits constant (7). Thanks, Vladimir > > Please review. I will also need a sponsor, please. > > Best regards, > Martin > > From vladimir.kozlov at oracle.com Wed Apr 27 05:02:55 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 26 Apr 2016 22:02:55 -0700 Subject: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A893B5@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8658C@ORSMSX106.amr.corp.intel.com> <571AD5BC.10309@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A893B5@ORSMSX106.amr.corp.intel.com> Message-ID: <572047FF.8000903@oracle.com> Looks good to me. I will start testing. Thanks, Vladimir On 4/26/16 11:22 AM, Deshpande, Vivek R wrote: > Hi Vladimir, > > I have updated the webrev with all suggested changes.
> The webrev is at this location: > http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.01/ > > Regards > Vivek > > -----Original Message----- > From: Berg, Michael C > Sent: Saturday, April 23, 2016 7:24 PM > To: Vladimir Kozlov; Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 > > More info: > > Setvectmask and restorevectmask are used for auto code generation. > > The method attached here is used entirely for stub code assembly and is intended only for stub code usage and checked via assertions as such. > > Regards, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C > Sent: Friday, April 22, 2016 7:30 PM > To: Vladimir Kozlov ; Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 > > Vladimir, it is not related to 8153998. > It seems his patch is not sync'd against the head of the tree. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, April 22, 2016 6:54 PM > To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net > Cc: Viswanathan, Sandhya ; Berg, Michael C > Subject: Re: RFR (S): 8154975: Update for vectorizedMismatch with AVX512 > > Hi Vivek, > > How is it related to the following macro instructions from JDK-8153998? > > + // special instructions for EVEX > + void setvectmask(Register dst, Register src); > + void restorevectmask(); > > Can you reuse them? Or add variants which you can use. I see a difference in the code: kmovql vs kmovdl.
> > _programmed_mask_reg/clear_programmed_mask_reg/set_programmed_mask_reg should be named _vector_masking/clear_vector_masking/set_vector_masking > > > I don't like next code in assembler instructions: > + if (zeroing) attributes.set_is_clear_context(); > > + if (!no_reg_mask) { > + attributes.set_embedded_opmask_register_specifier(mask); > + if (zeroing) attributes.set_is_clear_context(); > + } > > zeroing is false and mask is not NULL in your code. I would prefer to have separate instructions when you need them. > _embedded_opmask_register_specifier is not used (only set). Don't add values which are not used. > > Thanks, > Vladimir > > On 4/22/16 6:10 PM, Deshpande, Vivek R wrote: >> Hi all >> >> I would like to contribute a patch with AVX512 support for the vectorizedMismatch intrinsic. >> >> Could you please review and sponsor this patch. >> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8154975 >> >> webrev: >> >> http://cr.openjdk.java.net/~vdeshpande/8154975/webrev.00/ >> >> Thanks and regards, >> >> Vivek >> From vivek.r.deshpande at intel.com Wed Apr 27 06:53:21 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 27 Apr 2016 06:53:21 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <74581C88-CC26-441E-933B-73954C56F077@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> <5717D7E8.5000108@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A84884@ORSMSX106.amr.corp.intel.com> <74581C88-CC26-441E-933B-73954C56F077@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A89A1C@ORSMSX106.amr.corp.intel.com> Hi Christian I have updated the webrev and link for the same is here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.03/ I am using mathfunc() to call the masm version of 
call_VM_leaf_base() and not the InterpreterMacroAssembler version. Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Thursday, April 21, 2016 2:35 PM To: Deshpande, Vivek R Cc: Nils Eliasson ; hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics On Apr 20, 2016, at 2:13 PM, Deshpande, Vivek R > wrote: Hi The correct URL for the updated webrev is this. http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ +void MacroAssembler::mathfunc(address runtime_entry) { I don't like the name of this method. Mainly because it's only aligning the stack (shouldn't that happen somewhere else?) and doing this 0x20 stack frame thing which I still don't understand. Right, this is the one I was thinking about: void MacroAssembler::call_VM_leaf_base(address entry_point, int num_args) { Sorry for the spam. Regards, Vivek From: Deshpande, Vivek R Sent: Wednesday, April 20, 2016 5:10 PM To: Deshpande, Vivek R; 'Nils Eliasson'; 'hotspot-compiler-dev at openjdk.java.net' Cc: 'Vladimir Kozlov'; 'Volker Simonis'; 'Christian Thalinger'; Viswanathan, Sandhya Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Sent out the wrong link by mistake. updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.02/ Regards Vivek From: Deshpande, Vivek R Sent: Wednesday, April 20, 2016 5:07 PM To: 'Nils Eliasson'; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Nils I have updated the webrev with all the suggestions. updated webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Thanks for your comments and review.
@Vladimir, I have taken care of all the comments. Would you please review and sponsor the patch. Thanks and regards, Vivek From: Nils Eliasson [mailto:nils.eliasson at oracle.com] Sent: Wednesday, April 20, 2016 12:27 PM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir Kozlov; Volker Simonis; Christian Thalinger; Viswanathan, Sandhya Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In vmSymbols.cpp together with the other flag checks. Regards, Nils On 2016-04-20 02:44, Deshpande, Vivek R wrote: Hi Nils Yes, you are right, the function accesses the command line flag DisableIntrinsic and the changes are static. Could you point me to the right location for the function? Also I have updated the webrev with the rest of the comments here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Regards, Vivek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson Sent: Tuesday, April 19, 2016 5:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doesn't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch.
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek From vladimir.kozlov at oracle.com Wed Apr 27 07:48:24 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Apr 2016 00:48:24 -0700 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <571F500E.5020904@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com> <571F500E.5020904@oracle.com> Message-ID: <499d9f6b-378f-e350-eb72-c35930360825@oracle.com> I think next should be greaterEqual (branch from loop when limit+8 == 0) to avoid reading 8 bytes beyond array. + // Bail out if we reached the end (but still do the comparison) + br(Assembler::positive, false, Assembler::pn, Lremaining); An other trick you can is use xorcc instead of cmp. Then you need only one srlx and compare it with zero (may be use cmp_zero_and_br()). Thanks, Vladimir On 4/26/16 4:25 AM, Tobias Hartmann wrote: > Thanks, Chris! > > On 25.04.2016 21:29, Christian Thalinger wrote: >> >>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann > wrote: >>> >>> Hi Mikael, >>> >>> On 25.04.2016 09:59, Mikael Gerdin wrote: >>>> Hi Tobias >>>> >>>> On 2016-04-25 08:38, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). >>>> >>>> I think that if you disable compressed klass pointers (or compressed oops) then I suspect that array contents are not always 8 byte aligned.
>>>> >>>> With compressed oops enabled the mark word + compressed class + alength >>>> add up to 16 bytes and thus the array contents are guarateed to be 8 byte aligned. >>>> If compressed klass pointers are enabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case. >>> >>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. >> >> See arrayOop.hpp: >> >> // Returns the offset of the first element. >> static int base_offset_in_bytes(BasicType type) { >> return header_size(type) * HeapWordSize; >> } > > I verified that Mikael's claim is right: > With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4). > Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4). > > However, the header size is always aligned to HeapWordSize: > > static int header_size_in_bytes() { > size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize); > > and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned. > > Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance: > http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/ > > Thanks, > Tobias > >>> >>> Best regards, >>> Tobias >>> >>>> >>>> /Mikael >>>> >>>>> >>>>> If there are no objections, I would like to push the basic version: >>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>> >>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic. 
>>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> On 20.04.2016 15:31, Tobias Hartmann wrote: >>>>>> Hi John, >>>>>> >>>>>> On 20.04.2016 03:46, John Rose wrote: >>>>>>> So I started looking at your code and my inner SPARC junkie took over. >>>>>>> >>>>>>> This is what happened: >>>>>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>>>>> >>>>>> Thanks a lot for having a look! >>>>>> >>>>>>> Perhaps there are some ideas that might be helpful: >>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr. >>>>>> >>>>>> Right, this simplifies the code a bit: >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>>>>> >>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >>>>>> >>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>>>>> >>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>>>>> >>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >>>>>> >>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>>>>> >>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>>>>> >>>>>> What do you think? 
>>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>> [3] Runtime alignment checks: >>>>>> >>>>>> bind(Lunaligned); >>>>>> Label next; >>>>>> xor3(ary1, ary2, tmp); >>>>>> and3(tmp, 7, tmp); >>>>>> br_null_short(tmp, Assembler::pn, next); >>>>>> STOP("One array is unaligned!"); >>>>>> should_not_reach_here(); >>>>>> bind(next); >>>>>> STOP("Both arrays are unaligned!"); >>>>>> >>>>>>> On the other hand, what you wrote is nice and simple. >>>>>>> >>>>>>> HTH >>>>>>> -- John >>>>>>> >>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>>>>>> But that's neither nice nor simple. >>>>>>> >>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>>>>>> >>>>>>>> Thanks, Vladimir! >>>>>>>> >>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>>>>> >>>>>>>> Okay, I'll push the basic version. >>>>>>>> >>>>>>>> For reference, here are the results on a SPARC T4: >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>>>>> >>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>>> We do have arraycopy code for it but by default we don't use it: >>>>>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>>>>> >>>>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >>>>>>>> >>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>>>>>> >>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>>> Error: A fatal exception has occurred. Program will exit >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> please review the following enhancement: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>>>>> >>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>>>>>> >>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>>>>>> >>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). 
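For illustration, here is a portable C++ sketch of the comparison scheme described above: wide loads, one final load that reads over the end of the arrays, and a shift that drops the garbage bits. It is not the MacroAssembler::array_equals_loop code from the webrev. The names load_be64 and array_equals_wide are hypothetical. The sketch assumes big-endian word order, as on SPARC, and buffers padded to a multiple of 8 bytes so that the over-read stays inside the allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble 8 bytes into a 64-bit word in big-endian order (as a SPARC ldx
// would), so the first byte in memory lands in the most significant byte.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Compare the first 'len' bytes of 'a' and 'b' using 8-byte loads.
// Assumption (hypothetical precondition): both buffers are padded to a
// multiple of 8 bytes, standing in for 8-byte object alignment, so the last
// load may read past 'len' without leaving the allocation.
bool array_equals_wide(const uint8_t* a, const uint8_t* b, size_t len) {
  assert(a != nullptr && b != nullptr);
  size_t i = 0;
  // Main loop: full 8-byte words that lie entirely inside the data.
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) {
      return false;
    }
  }
  size_t remaining = len - i;  // 0..7 valid bytes left
  if (remaining == 0) {
    return true;
  }
  // Read one more word from each array (over the end), then shift the
  // garbage bytes, which are the low-order bytes on big-endian, out.
  unsigned garbage_bits = static_cast<unsigned>((8 - remaining) * 8);  // 8..56
  return (load_be64(a + i) >> garbage_bits) == (load_be64(b + i) >> garbage_bits);
}
```

For a length of 11, the loop compares bytes 0..7. The tail load then covers bytes 8..15, and shifting right by 40 bits keeps only bytes 8..10, so differing padding bytes never affect the result.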
I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>>>>>> >>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>>>>>> >>>>>>>>>> I evaluated the following three versions of the patch. >>>>>>>>>> >>>>>>>>>> -- Basic -- >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>> >>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>>>>> >>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>> Version "small" tries to improve this. >>>>>>>>>> >>>>>>>>>> -- Prefetching -- >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. 
This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>> >>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>>>>>>>> >>>>>>>>>> -- Small -- >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>>>>> >>>>>>>>>> The numbers can be found here: >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>>>>> >>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>>>>>> >>>>>>>>>> What do you think? >>>>>>>>>> >>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Tobias >>>>>>>>>> >>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >> From rwestrel at redhat.com Wed Apr 27 08:00:11 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 27 Apr 2016 10:00:11 +0200 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> Message-ID: <5720718B.6020102@redhat.com> Hi Martin, > seems like this issue is related to what I have sent out today: > RFR(S): 8154836: VM crash due to "Base pointers must match" > I also had to change the AddP case of final graph reshaping. > > In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode. 
> > Maybe I have to remove that part of my change, or at least adapt it. > We should make sure that the changes don't get pushed on the same day. Thanks for the heads up. It looks like your change will get in before mine. I'll send an updated webrev once it's integrated. Roland. From stefan.karlsson at oracle.com Wed Apr 27 08:19:38 2016 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 27 Apr 2016 10:19:38 +0200 Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose Message-ID: <5720761A.4010004@oracle.com> Hi all, Please review this patch to silence the DirectiveParser_test internal VM test. http://cr.openjdk.java.net/~stefank/8155206/webrev.01 https://bugs.openjdk.java.net/browse/JDK-8155206 Before the patch, we got the following output when running with -XX:+ExecuteInternalVMTests: ... Running test: Test_log_file_startup_truncation Running test: Test_invalid_log_file Running test: DirectivesParser_test Internal error on line 1 byte 2: Directive missing required match. Got EOS. At '}'. {} Internal error on line 1 byte 3: Directive missing required match. At '}]'. [{}] Internal error on line 1 byte 3: Directive missing required match. At '},{}]'. [{},{}] Internal error on line 1 byte 2: Directive missing required match. At '},{}'. {},{} Syntax error on line 2 byte 3: DirectivesParser can only start with an array containing directive objects, or one single directive. At '['. [ { match: "foo/bar.*", inline : "+java/util.*", PrintAssembly: true, BreakAtExecute: true, } ] ] Value error on line 4 byte 20: The key 'PrintInlining' does not allow an array of values. At '['. PrintInlining: [ true, false ], } ] Warning: +LogCompilation must be set to enable compilation logging from directives Warning: +LogCompilation must be set to enable compilation logging from directives Value error on line 7 byte 9: Method pattern error: Missing leading inline type (+/-) At '"foo",'. 
"foo", "bar", ] } } ] Value error on line 8 byte 9: Method pattern error: Missing leading inline type (+/-) At '"bar",'. "bar", ] } } ] Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key. At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'. [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}] Value error on line 5 byte 12: Key of type match needs a value of type string At 'true,'. match: true, inline: true, enable: true, c1: { preset: true, } } ] Running test: Test_TempNewSymbol Running test: VMStructs_test ... With the patch the output is much less noisy: ... Running test: Test_log_file_startup_truncation Running test: Test_invalid_log_file Running test: DirectivesParser_test Warning: +LogCompilation must be set to enable compilation logging from directives Warning: +LogCompilation must be set to enable compilation logging from directives Running test: Test_TempNewSymbol Running test: VMStructs_test ... We might want to get rid of the Warning messages, but I think the proposed patch is a good first step. You can turn on the old output with -XX:+VerboseInternalVMTests. Tested with -XX:+ExecuteInternalVMTests :) Thanks, StefanK From zoltan.majo at oracle.com Wed Apr 27 08:31:07 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 27 Apr 2016 10:31:07 +0200 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: <57201A20.7000504@oracle.com> References: <57201A20.7000504@oracle.com> Message-ID: <572078CB.9010201@oracle.com> Hi Martin, thank you for fixing this problem! The fix looks good to me as well, I have only one minor comment (please see inline). On 04/27/2016 03:47 AM, Vladimir Kozlov wrote: > [...] > >> >> I made an additional change: I think the graph transformation doesn't >> make sense if decoding is expensive. >> Therefore, I skip it on non-X86 platforms when we're running in heap >> based compressed oops mode. >> (I believe X86 is the only platform which can match the decoding in >> the operand in this case.) 
> > It is not related to the fix and should be done separately. Or don't > do at all. > There is comment above the code which explains why it could beneficial > on SPARC too: > > 2845 // On sparc loading 32-bits constant and decoding it have less > 2846 // instructions (4) then load 64-bits constant (7). If you decide to fix this, the condition 2856 if ((op == Op_ConN && Universe::narrow_oop_shift() != 0) || 2857 (op == Op_ConNKlass && Universe::narrow_klass_shift() != 0)) { seems better than 2856 if ((op == Op_ConN && Universe::narrow_oop_base() != NULL) || 2857 (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) { for two reasons:- on non-x86 platforms matching is guarded by 'narrow.*shift' methods - if the heap is between 4GB and 32GB, 'narrow.*base' is NULL and 'narrow.*shift' != 0 therefore the instructions are not emitted. Please correct me if I'm wrong. > [...] >> Please review. I will also need a sponsor, please. I'd be happy to sponsor the change. Thank you! Best regards, Zoltan >> >> Best regards, >> Martin >> >> From martin.doerr at sap.com Wed Apr 27 08:40:39 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 27 Apr 2016 08:40:39 +0000 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: <57201A20.7000504@oracle.com> References: <57201A20.7000504@oracle.com> Message-ID: <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap> Hi Vladimir and Zoltan, thanks for reviewing. I have made the requested changes: - Removed the code which skips the transformation on non-X86 platforms in heap-based compressed oops mode. I think we can discuss that independently. - Improved the assertion as suggested by Vladimir. New webrev is here: http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/ Zoltan, you have already offered to sponsor this change. Thanks. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Mittwoch, 27. 
April 2016 03:47 To: Doerr, Martin ; Zoltán Majó ; hotspot-compiler-dev at openjdk.java.net compiler Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match" Thank you, Martin On 4/25/16 4:04 AM, Doerr, Martin wrote: > Hi all, > > we have already seen such an assertion in final_graph_reshaping. > > We found 2 AddP nodes in a row. The ideal graph looked like this (simplified): > N0 ConP > N1 ConN > N2 AddP(Base = N0, Address = N0) > N3 AddP(Base = N0, Address = N2) > > Final graph reshaping visited N2 before N3 first and changed the graph: > N0 ConP > N1 ConN > N4 DecodeN > N2 AddP(Base = N4, Address = N4) > N3 AddP(Base = N0, Address = N2) > > Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected. > > I made a change to reconnect N3's Base input to N4, too. > > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ Fix looks good. > > In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row. > I wouldn't expect that to happen. Not sure if this assertion is desired. We should not have 3rd AddP but I agree with assert. You should add additional check to the assert: || out_j->in(AddPNode::Base) != addp > > I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive. > Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode. > (I believe X86 is the only platform which can match the decoding in the operand in this case.) It is not related to the fix and should be done separately. Or don't do at all. There is comment above the code which explains why it could beneficial on SPARC too: 2845 // On sparc loading 32-bits constant and decoding it have less 2846 // instructions (4) then load 64-bits constant (7). Thanks, Vladimir > > Please review. I will also need a sponsor, please.
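The rewiring described in Martin's report can be replayed on a toy graph. This is a hypothetical model, not C2's Node API and not the code in the webrev. The Node struct and replace_addp_base are illustrative, and the ConN input of the DecodeN is left out. Once N2's ConP base is replaced by the DecodeN N4, the chained AddP N3, whose Address is N2, is reconnected as well, so both AddPs end up with the same Base.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-in for C2 ideal-graph nodes, with
// just enough structure to replay the AddP chain from this bug report.
struct Node {
  Node* base = nullptr;     // AddP Base input
  Node* address = nullptr;  // AddP Address input
  std::vector<Node*> outs;  // users of this node
};

// Model of the described fix: when the ConP base of 'addp' is replaced by a
// DecodeN node, any AddP user chained through 'addp' that still uses the old
// base is reconnected to the DecodeN as well, so the bases along the chain
// keep matching.
void replace_addp_base(Node* addp, Node* old_base, Node* decoded) {
  assert(addp->base == old_base);
  addp->base = decoded;
  if (addp->address == old_base) {
    addp->address = decoded;
  }
  for (Node* user : addp->outs) {
    if (user->address == addp && user->base == old_base) {
      user->base = decoded;
    }
  }
}
```

Without the loop over `addp->outs`, N3 would keep N0 as its Base while its Address (N2) is based on N4. That is the mismatch the assertion reported.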
> > Best regards, > Martin > > From zoltan.majo at oracle.com Wed Apr 27 08:44:07 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 27 Apr 2016 10:44:07 +0200 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap> References: <57201A20.7000504@oracle.com> <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap> Message-ID: <57207BD7.9070701@oracle.com> Hi Martin, On 04/27/2016 10:40 AM, Doerr, Martin wrote: > Hi Vladimir and Zoltan, > > thanks for reviewing. > > I have made the requested changes: > - Removed the code which skips the transformation on non-X86 platforms in heap-based compressed oops mode. I think we can discuss that independently. > - Improved the assertion as suggested by Vladimir. > > New webrev is here: > http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/ looks good to me! > Zoltan, you have already offered to sponsor this change. Thanks. Sure. I can push the change once Vladimir has also agreed to it. Best regards, Zoltan > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Mittwoch, 27. April 2016 03:47 > To: Doerr, Martin ; Zoltán Majó ; hotspot-compiler-dev at openjdk.java.net compiler > Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match" > > Thank you, Martin > > On 4/25/16 4:04 AM, Doerr, Martin wrote: >> Hi all, >> >> we have already seen such an assertion in final_graph_reshaping. >> >> We found 2 AddP nodes in a row.
The ideal graph looked like this (simplified): >> N0 ConP >> N1 ConN >> N2 AddP(Base = N0, Address = N0) >> N3 AddP(Base = N0, Address = N2) >> >> Final graph reshaping visited N2 before N3 first and changed the graph: >> N0 ConP >> N1 ConN >> N4 DecodeN >> N2 AddP(Base = N4, Address = N4) >> N3 AddP(Base = N0, Address = N2) >> >> Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected. >> >> I made a change to reconnect N3's Base input to N4, too. >> >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ > Fix looks good. > >> In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row. >> I wouldn't expect that to happen. Not sure if this assertion is desired. > We should not have 3rd AddP but I agree with assert. You should add additional check to the assert: > > || out_j->in(AddPNode::Base) != addp > >> I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive. >> Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode. >> (I believe X86 is the only platform which can match the decoding in the operand in this case.) > It is not related to the fix and should be done separately. Or don't do at all. > There is comment above the code which explains why it could beneficial on SPARC too: > > 2845 // On sparc loading 32-bits constant and decoding it have less > 2846 // instructions (4) then load 64-bits constant (7). > > Thanks, > Vladimir > >> Please review. I will also need a sponsor, please. 
>> >> Best regards, >> Martin >> >> From tobias.hartmann at oracle.com Wed Apr 27 08:45:42 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 27 Apr 2016 10:45:42 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <499d9f6b-378f-e350-eb72-c35930360825@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com> <571F500E.5020904@oracle.com> <499d9f6b-378f-e350-eb72-c35930360825@oracle.com> Message-ID: <57207C36.3030101@oracle.com> Hi Vladimir, On 27.04.2016 09:48, Vladimir Kozlov wrote: > I think next should be greaterEqual (branch from loop when limit+8 == 0) to avoid reading 8 bytes beyond array. > > + // Bail out if we reached the end (but still do the comparison) > + br(Assembler::positive, false, Assembler::pn, Lremaining); According to the SPARC manual [1], bpos branches if "not N", i.e. if limit is >= 0. I verified this with a small program (attached) using inline function templates [2]: $ cc -O main.c code.il $ ./a.out 1: 1 0: 1 -1: 0 That means the bpos branch is taken for 0 and 1 but not for -1. So the code should be correct. > An other trick you can is use xorcc instead of cmp. > Then you need only one srlx and compare it with zero (may be use cmp_zero_and_br()). Yes, John already suggested this but as I wrote in another email, it does not improve but slightly degrade performance. Thanks, Tobias [1] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf [2] https://docs.oracle.com/cd/E26502_01/html/E28387/gmabr.html > > Thanks, > Vladimir > > On 4/26/16 4:25 AM, Tobias Hartmann wrote: >> Thanks, Chris! 
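The bpos semantics Tobias cites can be modelled in a few lines. This is a hypothetical sketch of the condition-code rule, not generated code, and the function names are illustrative. N is the sign bit of the result: bit 31 for %icc and bit 63 for %xcc. This model uses the 64-bit view. bpos is taken when N is clear, so a result of exactly 0 takes the branch, which matches the 1 / 0 / -1 output of the test program quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the SPARC N condition-code flag: the sign bit of the
// result (64-bit view here; %icc would use bit 31 of a 32-bit result).
bool n_flag(int64_t result) {
  return (static_cast<uint64_t>(result) >> 63) != 0;
}

// "bpos" (branch on positive) is taken when N == 0, i.e. for any result
// >= 0, so a loop counter that reaches exactly 0 also takes the branch.
bool bpos_taken(int64_t result) {
  return !n_flag(result);
}
```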
>> >> On 25.04.2016 21:29, Christian Thalinger wrote: >>> >>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann > wrote: >>>> >>>> Hi Mikael, >>>> >>>> On 25.04.2016 09:59, Mikael Gerdin wrote: >>>>> Hi Tobias >>>>> >>>>> On 2016-04-25 08:38, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). >>>>> >>>>> I think that if you disable compressed klass pointers (or compressed oops) then I suspect that array contents are not always 8 byte aligned. >>>>> >>>>> With compressed oops enabled the mark word + compressed class + alength >>>>> add up to 16 bytes and thus the array contents are guarateed to be 8 byte aligned. >>>>> If compressed klass pointers are enabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case. >>>> >>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. >>> >>> See arrayOop.hpp: >>> >>> // Returns the offset of the first element. >>> static int base_offset_in_bytes(BasicType type) { >>> return header_size(type) * HeapWordSize; >>> } >> >> I verified that Mikael's claim is right: >> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4). >> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4). >> >> However, the header size is always aligned to HeapWordSize: >> >> static int header_size_in_bytes() { >> size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize); >> >> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned. 
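The header-size arithmetic above can be checked with a small model. Only align_size_up and HeapWordSize come from the quoted arrayOop.hpp code. align_up and array_header_size_in_bytes are hypothetical stand-ins, and the layout is the simplified 64-bit one described in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Round 'size' up to a multiple of 'alignment' (a power of two), in the
// spirit of the align_size_up() call in the quoted arrayOop.hpp code.
constexpr size_t align_up(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Simplified 64-bit array header model from this thread: mark word (8),
// klass pointer (4 compressed / 8 uncompressed), then the int length field.
size_t array_header_size_in_bytes(bool compressed_class_pointers) {
  const size_t heap_word_size = 8;  // HeapWordSize on 64-bit platforms
  const size_t mark_size = 8;
  const size_t klass_size = compressed_class_pointers ? 4 : 8;
  const size_t length_offset = mark_size + klass_size;
  const size_t header = align_up(length_offset + sizeof(int), heap_word_size);
  assert(header % heap_word_size == 0);
  return header;
}
```

With compressed class pointers the raw size is 8 + 4 + 4 = 16, which is already aligned. Without them it is 8 + 8 + 4 = 20, which rounds up to 24. In both cases the first element therefore starts on an 8-byte boundary.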
>> >> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance: >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/ >> >> Thanks, >> Tobias >> >>>> >>>> Best regards, >>>> Tobias >>>> >>>>> >>>>> /Mikael >>>>> >>>>>> >>>>>> If there are no objections, I would like to push the basic version: >>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>> >>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic. >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote: >>>>>>> Hi John, >>>>>>> >>>>>>> On 20.04.2016 03:46, John Rose wrote: >>>>>>>> So I started looking at your code and my inner SPARC junkie took over. >>>>>>>> >>>>>>>> This is what happened: >>>>>>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>>>>>> >>>>>>> Thanks a lot for having a look! >>>>>>> >>>>>>>> Perhaps there are some ideas that might be helpful: >>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr. >>>>>>> >>>>>>> Right, this simplifies the code a bit: >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>>>>>> >>>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >>>>>>> >>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>>>>>> >>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>>>>>> >>>>>>> Unfortunately, this leads to a regression as well. 
See page "webrev.01" of [1]. >>>>>>> >>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>>>>>> >>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>> [3] Runtime alignment checks: >>>>>>> >>>>>>> bind(Lunaligned); >>>>>>> Label next; >>>>>>> xor3(ary1, ary2, tmp); >>>>>>> and3(tmp, 7, tmp); >>>>>>> br_null_short(tmp, Assembler::pn, next); >>>>>>> STOP("One array is unaligned!"); >>>>>>> should_not_reach_here(); >>>>>>> bind(next); >>>>>>> STOP("Both arrays are unaligned!"); >>>>>>> >>>>>>>> On the other hand, what you wrote is nice and simple. >>>>>>>> >>>>>>>> HTH >>>>>>>> -- John >>>>>>>> >>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>>>>>>> But that's neither nice nor simple. >>>>>>>> >>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>>>>>>> >>>>>>>>> Thanks, Vladimir! >>>>>>>>> >>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>>>>>> >>>>>>>>> Okay, I'll push the basic version.
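The `xor3`/`and3` pair in runtime alignment check [3] relies on a simple identity: `(ary1 ^ ary2) & 7` is zero exactly when both addresses have the same offset modulo 8. A hedged C sketch with hypothetical helper names, using plain integer addresses instead of registers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors xor3(ary1, ary2, tmp); and3(tmp, 7, tmp): the result is zero
 * exactly when both addresses have the same offset modulo 8. */
static bool same_offset_mod8(uintptr_t ary1, uintptr_t ary2) {
    return ((ary1 ^ ary2) & 7) == 0;
}

/* On the Lunaligned path at least one array is misaligned. A zero result
 * therefore means both are misaligned by the same amount; a non-zero
 * result means their offsets modulo 8 differ. */
static const char *classify_unaligned(uintptr_t ary1, uintptr_t ary2) {
    return same_offset_mod8(ary1, ary2) ? "Both arrays are unaligned!"
                                        : "One array is unaligned!";
}
```

Note that the "One array is unaligned!" case also covers two arrays misaligned by different amounts, which is the situation a skewed 64-bit loop would have to handle.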
>>>>>>>>> >>>>>>>>> For reference, here are the results on a SPARC T4: >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>>>>>> >>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>>>>>>>>> We do have arraycopy code for it but by default we don't use it: >>>>>>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>>>>>> >>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >>>>>>>>> >>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>>>>>>> >>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>>>> Error: A fatal exception has occurred. Program will exit >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Tobias >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> please review the following enhancement: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>>>>>> >>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). 
>>>>>>>>>>> >>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>>>>>>> >>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>>>>>>> >>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>>>>>>> >>>>>>>>>>> I evaluated the following three versions of the patch. >>>>>>>>>>> >>>>>>>>>>> -- Basic -- >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>>> >>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. 
I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>>>>>> >>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>>> Version "small" tries to improve this. >>>>>>>>>>> >>>>>>>>>>> -- Prefetching -- >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>>> >>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>>>>>>>>> >>>>>>>>>>> -- Small -- >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>>>>>> >>>>>>>>>>> The numbers can be found here: >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>>>>>> >>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. 
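The "read over the end and shift out the garbage bits" idea behind array_equals_loop() can be illustrated in portable C. This is only a sketch of the technique, not the SPARC intrinsic. It assumes big-endian 8-byte loads (as with `ldx` on SPARC), so the bytes past the end land in the low-order bits that a right shift discards. It also assumes the caller's buffers are readable up to the next multiple of 8 bytes, as a stand-in for whatever keeps the over-read safe inside the VM. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load 8 bytes as a big-endian 64-bit word, like a SPARC ldx. */
static uint64_t load_be64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Compare len bytes of a and b, 8 bytes at a time. Both buffers must be
 * readable up to the next multiple of 8. The last, partial word is
 * compared after shifting out the bytes that lie beyond len. */
static bool array_equals_words(const unsigned char *a, const unsigned char *b,
                               size_t len) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        if (load_be64(a + i) != load_be64(b + i)) {
            return false;
        }
    }
    size_t remaining = len - i;                      /* 0..7 bytes left */
    if (remaining == 0) {
        return true;
    }
    unsigned shift = (unsigned)(8 - remaining) * 8;  /* garbage bits, at most 56 */
    return (load_be64(a + i) >> shift) == (load_be64(b + i) >> shift);
}
```

For example, two 16-byte buffers that agree in their first 11 bytes compare equal for `len == 11` even if the padding bytes after them differ.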
>>>>>>>>>>> >>>>>>>>>>> What do you think? >>>>>>>>>>> >>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Tobias >>>>>>>>>>> >>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>> -------------- next part -------------- A non-text attachment was scrubbed... Name: code.c Type: text/x-csrc Size: 127 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: main.c Type: text/x-csrc Size: 143 bytes Desc: not available URL: From martin.doerr at sap.com Wed Apr 27 08:46:46 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 27 Apr 2016 08:46:46 +0000 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: <572078CB.9010201@oracle.com> References: <57201A20.7000504@oracle.com> <572078CB.9010201@oracle.com> Message-ID: <3b9e4a9d845a4b83859c61cd6459ca24@DEWDFE13DE14.global.corp.sap> Hi Zoltan, I will comment on skipping the transformation on non-x86 platforms in 8154826 and include Roland, who is changing the final graph reshaping of the AddP nodes. Best regards, Martin -----Original Message----- From: Zoltán Majó [mailto:zoltan.majo at oracle.com] Sent: Mittwoch, 27. April 2016 10:31 To: Vladimir Kozlov ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net compiler Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match" Hi Martin, thank you for fixing this problem! The fix looks good to me as well, I have only one minor comment (please see inline). On 04/27/2016 03:47 AM, Vladimir Kozlov wrote: > [...] > >> >> I made an additional change: I think the graph transformation doesn't >> make sense if decoding is expensive. >> Therefore, I skip it on non-X86 platforms when we're running in heap >> based compressed oops mode. >> (I believe X86 is the only platform which can match the decoding in >> the operand in this case.) > > It is not related to the fix and should be done separately. Or don't > do at all. > There is a comment above the code which explains why it could be beneficial > on SPARC too: > > 2845 // On sparc loading 32-bits constant and decoding it have less > 2846 // instructions (4) then load 64-bits constant (7).
If you decide to fix this, the condition 2856 if ((op == Op_ConN && Universe::narrow_oop_shift() != 0) || 2857 (op == Op_ConNKlass && Universe::narrow_klass_shift() != 0)) { seems better than 2856 if ((op == Op_ConN && Universe::narrow_oop_base() != NULL) || 2857 (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) { for two reasons: (1) on non-x86 platforms, matching is guarded by the 'narrow.*shift' methods; (2) if the heap is between 4GB and 32GB, 'narrow.*base' is NULL and 'narrow.*shift' != 0, therefore the instructions are not emitted. Please correct me if I'm wrong. > [...] >> Please review. I will also need a sponsor, please. I'd be happy to sponsor the change. Thank you! Best regards, Zoltan >> >> Best regards, >> Martin >> >> From martin.doerr at sap.com Wed Apr 27 09:11:28 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 27 Apr 2016 09:11:28 +0000 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <5720718B.6020102@redhat.com> References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> Message-ID: Hi Roland, I have removed the piece of my change which would interfere with your change. But I'd like to explain the intention of skipping the transformation on non-x86 platforms in heap-based compressed oops mode. (In simpler compressed oops modes decoding is very cheap, so the transformation is probably good.) Without transformation we have: LoadConP + Storage access With transformation we have: LoadConN + DecodeN heap-based + Storage access I believe X86 is the only platform which can match the DecodeN heap-based into the Storage access. I guess other platforms should prefer the untransformed version: PPC can load the ConP from the constant pool. Decoding takes a lot of instructions, because the heap base needs to get loaded.
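Zoltán's second reason is easiest to see from the decode formula `oop = base + (narrow << shift)`. The C sketch below is a simplified model with hypothetical names, not HotSpot code. It omits the null special case that real heap-based decoding must handle, and it assumes the default 8-byte object alignment (shift 3). In zero-based mode (heap between 4GB and 32GB) the base is 0 but the shift is 3, so a guard on the base does not fire while a guard on the shift does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified compressed-oop decode: oop = base + (narrow << shift).
 * Null handling for heap-based mode is deliberately omitted. */
static uint64_t decode_narrow(uint32_t narrow, uint64_t base, unsigned shift) {
    return base + ((uint64_t)narrow << shift);
}

/* The two candidate guards from the thread, reduced to their inputs
 * (hypothetical helpers; the real code queries Universe::narrow_oop_*()). */
static bool guard_on_base(uint64_t base)    { return base != 0; }
static bool guard_on_shift(unsigned shift)  { return shift != 0; }
```

Unscaled mode (base 0, shift 0) decodes with no extra work, zero-based mode needs only a shift, and heap-based mode needs a shift plus an add, which is the case where, per Martin's note, only x86 can fold the decode into the memory operand.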
I didn't take a closer look at SPARC, but I thought it would use the constant pool as well. Not sure if the following comment is still correct: // On sparc loading 32-bits constant and decoding it have less // instructions (4) then load 64-bits constant (7). Therefore, I had proposed the following code to skip: + // Matching decode heap based into an operand only works on X86. + #if !defined(X86) + if ((op == Op_ConN && Universe::narrow_oop_base() != NULL) || + (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) { + break; + } + #endif + Would this be good for aarch64 as well? Would you like to include code which skips the transformation in your change or should this better be discussed independently? Best regards, Martin -----Original Message----- From: Roland Westrelin [mailto:rwestrel at redhat.com] Sent: Mittwoch, 27. April 2016 10:00 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode Hi Martin, > seems like this issue is related to what I have sent out today: > RFR(S): 8154836: VM crash due to "Base pointers must match" > I also had to change the AddP case of final graph reshaping. > > In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode. > > Maybe I have to remove that part of my change, or at least adapt it. > We should make sure that the changes don't get pushed on the same day. Thanks for the heads up. It looks like your change will get in before mine. I'll send an updated webrev once it's integrated. Roland. From nassim.halli at gmail.com Wed Apr 27 09:12:07 2016 From: nassim.halli at gmail.com (Nassim Halli) Date: Wed, 27 Apr 2016 11:12:07 +0200 Subject: Are Java native methods inlined ? Message-ID: Hi guys, Just a simple question : are Java native methods inlined ? 
By "native methods" I mean the interface between the Java caller and the targeted native JNI function in the dl (for which the JIT-compiled code can be observed using *+PrintNativeNMethods)*. I ask this question since I don't observe inlining with Java 8 and I heard it was. Although I understand that it could not necessarily lead to great improvements it's just for a clarification. Thanks a lot ! Nassim -------------- next part -------------- An HTML attachment was scrubbed... URL: From rahul.v.raghavan at oracle.com Wed Apr 27 09:45:18 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 27 Apr 2016 02:45:18 -0700 (PDT) Subject: RFR: 8153655: Make intrinsics flags diagnostic and update intrinsics tests to enable diagnostic options. Message-ID: Hi, Please review the following patch for JDK-8153655. Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 Webrev: http://cr.openjdk.java.net/~rraghavan/8153655/webrev.00/ Notes: 1. This 8153655/webrev.00 re-includes earlier backed out, same JDK-8145348 changes (https://bugs.openjdk.java.net/browse/JDK-8145348 - Make intrinsics flags diagnostic) and also additional fixes in failing intrinsic tests. 2. Checked all the usages of changed intrinsic flags in tests and found JDK-8153655 type test failure issue (after initial JDK-8145348 fix) is present only for following tests - a. UseAESIntrinsics test (compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java) b. UseSHA* tests (at compiler/intrinsics/sha/cli/) 3. Summary of 8153655/webrev.00 changes. - Includes earlier backed out, same JDK-8145348 changes: src/share/vm/c1/c1_globals.hpp src/share/vm/opto/c2_globals.hpp src/share/vm/runtime/globals.hpp test/compiler/intrinsics/muladd/TestMulAdd.java test/compiler/runtime/6859338/Test6859338.java - 'test/compiler/cpuflags/AESIntrinsicsBase.java' Options were passed in wrong order. Changes done so that 'UnlockDiagnosticVMOptions' option precedes the diagnostic flags. 
- 'test/compiler/intrinsics/sha/cli/*' - (UseSHA* tests) 'UnlockDiagnosticVMOptions' option was not getting passed. Added support to precede intrinsic flag usages with explicit 'UnlockDiagnosticVMOptions'. 4. No issues found with testing done using product builds with proposed changes (hotspot/test/compiler/cpuflags/*, hotspot/test/compiler/intrinsics/*, hotspot/test/compiler/runtime/6859338/Test6859338.java) Complete pre-integration testing using product builds is in progress. Thanks, Rahul From aph at redhat.com Wed Apr 27 10:07:28 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 27 Apr 2016 11:07:28 +0100 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> Message-ID: <57208F60.5000607@redhat.com> On 27/04/16 10:11, Doerr, Martin wrote: > Would this be good for aarch64 as well? On AArch64, LoadConP is mov reg, #x movk reg, #y shl #16 movk reg, #z shl 32 (3 cycles latency) LoadConN + DecodeN heap-based mov reg, #x movk reg, #y shl #16 add reg, heapbase, reg shl #3 (4 cycles latency) Andrew. From aph at redhat.com Wed Apr 27 10:33:19 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 27 Apr 2016 11:33:19 +0100 Subject: Are Java native methods inlined ? In-Reply-To: References: Message-ID: <5720956F.1060808@redhat.com> On 27/04/16 10:12, Nassim Halli wrote: > Just a simple question : are Java native methods inlined ? Not exactly. > By "native methods" I mean the interface between the Java caller and > the targeted native JNI function in the dl (for which the > JIT-compiled code can be observed using *+PrintNativeNMethods)*. > > I ask this question since I don't observe inlining with Java 8 and I > heard it was. Although I understand that it could not necessarily > lead to great improvements it's just for a clarification. 
The glue code between Java and native code is not inlined into its caller but it is generated by the JIT compiler. Like every optimization in HotSpot, this is only done for hot code. If you run something like Netbeans you'll see a lot of native nmethods generated. Andrew. From roland.schatz at oracle.com Wed Apr 27 13:31:18 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Wed, 27 Apr 2016 15:31:18 +0200 Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable array fields In-Reply-To: <571A36D7.2030701@oracle.com> References: <571A36D7.2030701@oracle.com> Message-ID: <5720BF26.2050400@oracle.com> Please don't integrate this yet. We're looking for a more general solution, it's possible this won't be needed in the final version. Thanks, Roland On 04/22/2016 04:36 PM, Andreas Woess wrote: > Please review: > http://cr.openjdk.java.net/~aw/8154765/webrev/ > https://bugs.openjdk.java.net/browse/JDK-8154765 > > This change adds an optional stableDimensions parameter to > ConstantReflectionProvider.readStableFieldValue that allows the number > of stable array dimensions to be specified at a finer granularity than > inferring it from the type of the field. > > Thanks, > Andreas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zoltan.majo at oracle.com Wed Apr 27 14:20:08 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 27 Apr 2016 16:20:08 +0200 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> Message-ID: <5720CA98.10605@oracle.com> Hi Vladimir, thank you for the feedback! On 04/27/2016 12:39 AM, Vladimir Kozlov wrote: > On 4/26/16 4:56 AM, Zoltán Majó
wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! Please see comments below. >> >> On 4/22/16 12:37 AM, Vladimir Kozlov wrote: >>> Hi, Zoltan >>> >>> I think we should change code in prefetch_allocation() instead: >>> >>> Node *cache_adr = new AddPNode(old_eden_top, old_eden_top, >>> _igvn.MakeConX(step_size + distance)); >> >> The problem is that AllocatePrefetchStyle must align the first >> prefetched address to 8 bytes, otherwise the emitted stxa >> instruction could cause a SIGBUS. But the alignment does not have to >> be at step_size boundary, 8-byte alignment is >> sufficient. > > Actually it has to be step_size (cache line size) aligned - BIS > instruction is triggered when the address of stxa is the beginning of > cache line for AllocatePrefetchStyle == 3. If it is just 8 bytes > aligned it will be a simple store. Thank you for clarifying. > We should require AllocatePrefetchStepSize to be 8 bytes aligned in > vm_version_sparc.cpp instead of: > + // BIS instructions require 8-byte aligned addresses > + Node* mask = _igvn.MakeConX(~(intptr_t)(wordSize - 1)); I agree. But I'd prefer we do that in the AllocatePrefetchStepSizeConstraintFunc() constraint function (in commandLineFlagConstraintsCompiler.cpp). The reason is that since JEP-245 the preferred place to validate command-line arguments is in constraint functions. Most compiler-related checks were moved there with JDK-8078554 and JDK-8146478. I'd also like to set the minimum value for AllocatePrefetchDistance to AllocatePrefetchStepSize, otherwise we can access the heap before newly allocated objects/arrays. > > >> >>> >>> This way we allow AllocatePrefetchDistance == 0 in all >>> AllocatePrefetchStyle cases - it is consistent. >> >> Unfortunately, it is not easy to have the same limit for >> AllocatePrefetchDistance on all platforms.
Due to the 8-byte >> alignment performed by compiled code, the lower limit of 8 for >> AllocatePrefetchDistance is needed on SPARC; the lower >> limit of 0 works fine on all other platforms. >> >> I've started looking at the consistency of flags controlling >> allocation prefetch in general, as we have other issues >> open related to them (e.g., JDK-8151622). We're touching related code >> now, so I thought, maybe it makes sense to fix all >> remaining issues at once. > > Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it > should be used only if UseTLAB is true. Please, look. It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true. http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844 Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, I've started an RBT run with all hotspot on all platforms using AllocatePrefetchStyle=2. So far no problems have shown up. > And I think Abstract_VM_Version::_reserve_for_allocation_prefetch > should be set for all styles on all platforms to avoid accessing > beyond heap. Prefetch instructions doc say that they does not trap but > we should be careful. I agree. That means we initialize _reserve_for_allocation_prefetch in a platform-independent way. So I think it would make sense to move that field to ThreadLocalAllocBuffer, as TLAB is the only user of that field and we don't support platform-independent initialization of Abstract_VM_Version. I did that in the updated webrev. >> >> The updated webrev does the following (in addition to fixing the >> original problem with AllocatePrefetchDistance): >> >> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 >> (i.e., BIS instructions are used for prefetching). As >> far as I understand, AllocatePrefetchStyle = 3 was added to support >> prefetching with BIS, so if BIS is enabled, we >> should use AllocatePrefetchStyle = 3. 
> > Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select > AllocatePrefetchStyle = 3. OK. > But we should allow AllocatePrefetchStyle = 3 if normal prefetch > instructions (or other platforms) are used. OK, I've removed that restriction. > I think we should update the comment in globals.hpp to say "generate one > prefetch per cache line" without saying BIS. OK. > > But I agree if BIS is not available we should not use BIS > AllocatePrefetchInstr = 1. OK. > >> >> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on >> non-SPARC platforms. > > It could be useful on other platforms since it does one access per > cache line. OK, let's keep it available then. > > >> >> 3. Enforce that AllocatePrefetchStepSize is a multiple of 8 if >> AllocatePrefetchStyle is 3 (due to alignment requirements). > > That is correct since stxa requires at least 8-byte alignment (as stx). OK. > >> >> 4. Determine the number of lines to prefetch in the same way for all >> prefetch styles: >> lines = (prefetch instance allocation) ? >> AllocateInstancePrefetchLines : AllocatePrefetchLines > > Agree. OK. > >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/ > > vm_version_sparc.cpp > AllocatePrefetchInstr = 0 should be set for all styles (not only 1) > when BIS is not available. OK. I think it is sufficient to do 52 if (!has_blk_init()) { 53 if (AllocatePrefetchInstr == 1) { 54 warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable"); 55 FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0); 56 } I hope I'm not missing anything here. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/ Testing: - JPRT (incl. TestOptionsWithRanges); - local testing on SPARC; - all hotspot tests with AllocatePrefetchStyle=2 on all platforms. Thank you! Best regards, Zoltan > > Thanks, > Vladimir > >> >> Testing: >> - JPRT (incl. TestOptionsWithRanges) >> - local testing on a SPARC machine. >> >> Thank you!
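The crash and the proposed lower bound can be seen directly in the address arithmetic from the original 8153340 report: the distance is added to old_eden_top and the result is masked with `~(AllocatePrefetchStepSize - 1)`. The C sketch below models that arithmetic; it is not macro.cpp. The function names are hypothetical, `step_size` is assumed to be a power of two, and the validity check simply combines the constraints discussed in this thread (step size a multiple of 8 for `stxa`, distance of at least one step).

```c
#include <stdbool.h>
#include <stdint.h>

/* First prefetch address for AllocatePrefetchStyle == 3 as described in
 * the report: (old_eden_top + distance) rounded down to a step_size
 * boundary. Assumes step_size is a power of two. */
static uintptr_t first_prefetch_address(uintptr_t old_eden_top,
                                        uintptr_t distance,
                                        uintptr_t step_size) {
    return (old_eden_top + distance) & ~(step_size - 1);
}

/* Hypothetical flag check combining the constraints from the thread:
 * a non-zero power-of-two step that is a multiple of 8, and a distance
 * of at least one step so rounding down never lands before old_eden_top. */
static bool style3_flags_valid(uintptr_t distance, uintptr_t step_size) {
    bool power_of_two = step_size != 0 && (step_size & (step_size - 1)) == 0;
    return power_of_two && step_size % 8 == 0 && distance >= step_size;
}
```

With distance 0 and a 64-byte step, an object starting at 0x1030 gets its first prefetch at 0x1000, i.e. before the object; with distance 64 the first prefetch lands at 0x1040. In general, if distance >= step_size, rounding down loses at most step_size - 1 bytes, so the address stays above old_eden_top.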
>> >> Best regards, >> >> >> Zoltan >> >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/21/16 4:30 AM, Zoltán Majó wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8153340. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153340 >>>> >>>> >>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and >>>> AllocatePrefetchDistance==0. The crash happens due to the way >>>> the address for the first prefetch instruction is calculated [1]: >>>> >>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= >>>> ~(AllocatePrefetchStepSize - 1) which can zero some of >>>> the bits of cache_adr. That results in accesses *before* the newly >>>> allocated object. >>>> >>>> >>>> Solution: Set lower limit of AllocatePrefetchDistance to >>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >>>> Unquarantine test. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >>>> >>>> Testing: >>>> - JPRT (incl. TestOptionsWithRanges.java) >>>> - local testing on a SPARC machine. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >>>> [1] >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 >> From zoltan.majo at oracle.com Wed Apr 27 14:23:06 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 27 Apr 2016 16:23:06 +0200 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <5720CA98.10605@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> <5720CA98.10605@oracle.com> Message-ID: <5720CB4A.2030904@oracle.com> On 04/27/2016 04:20 PM, Zoltán Majó wrote: > [...] > > OK.
I think it is sufficient to do > > 52 if (!has_blk_init()) { > 53 if (AllocatePrefetchInstr == 1) { > 54 warning("BIS instructions required for AllocatePrefetchInstr 1 > unavailable"); > 55 FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0); > 56 } I hope I'm not missing anything here. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/ Testing: - JPRT > (incl. TestOptionsWithRanges); - local testing on SPARC; - all hotspot > tests with AllocatePrefetchStyle=2 on all platforms. Thank you! Best > regards, Zoltan Sorry for the missing newlines above -- my mail client is playing tricks sometimes. > >> >> Thanks, >> Vladimir >> >>> >>> Testing: >>> - JPRT (incl. TestOptionsWithRanges) >>> - local testing on a SPARC machine. >>> >>> Thank you! >>> >>> Best regards, >>> >>> >>> Zoltan >>> >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/21/16 4:30 AM, Zoltán Majó wrote: >>>>> Hi, >>>>> >>>>> >>>>> please review the patch for 8153340. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8153340 >>>>> >>>>> >>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and >>>>> AllocatePrefetchDistance==0. The crash happens due to the way >>>>> the address for the first prefetch instruction is calculated [1]: >>>>> >>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= >>>>> ~(AllocatePrefetchStepSize - 1) which can zero some of >>>>> the bits of cache_adr. That results in accesses *before* the newly >>>>> allocated object. >>>>> >>>>> >>>>> Solution: Set lower limit of AllocatePrefetchDistance to >>>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >>>>> Unquarantine test. >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >>>>> >>>>> Testing: >>>>> - JPRT (incl. TestOptionsWithRanges.java) >>>>> - local testing on a SPARC machine. >>>>> >>>>> Thank you!
>>>>> >>>>> Best regards, >>>>> >>>>> >>>>> Zoltan >>>>> >>>>> [1] >>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 >>> > From vladimir.kozlov at oracle.com Wed Apr 27 14:54:14 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Apr 2016 07:54:14 -0700 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap> References: <57201A20.7000504@oracle.com> <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap> Message-ID: This looks good. Thanks, Vladimir On 4/27/16 1:40 AM, Doerr, Martin wrote: > Hi Vladimir and Zoltan, > > thanks for reviewing. > > I have made the requested changes: > - Removed the code which skips the transformation on non-X86 platforms in heap-based compressed oops mode. I think we can discuss that independently. > - Improved the assertion as suggested by Vladimir. > > New webrev is here: > http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/ > > Zoltan, you have already offered to sponsor this change. Thanks. > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, 27 April 2016 03:47 > To: Doerr, Martin ; Zoltán Majó ; hotspot-compiler-dev at openjdk.java.net compiler > Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match" > > Thank you, Martin > > On 4/25/16 4:04 AM, Doerr, Martin wrote: >> Hi all, >> >> we have already seen such an assertion in final_graph_reshaping. >> >> We found 2 AddP nodes in a row.
The ideal graph looked like this (simplified): >> N0 ConP >> N1 ConN >> N2 AddP(Base = N0, Address = N0) >> N3 AddP(Base = N0, Address = N2) >> >> Final graph reshaping visited N2 before N3 first and changed the graph: >> N0 ConP >> N1 ConN >> N4 DecodeN >> N2 AddP(Base = N4, Address = N4) >> N3 AddP(Base = N0, Address = N2) >> >> Afterwards, final graph reshaping visited N3 and ran into the assertion. The Base of N3 is unexpected. >> >> I made a change to reconnect N3's Base input to N4, too. >> >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ > > Fix looks good. > >> >> In addition to fixing this problem, I added an assertion to check if there are more than 3 AddP nodes in a row. >> I wouldn't expect that to happen. Not sure if this assertion is desired. > > We should not have 3rd AddP but I agree with assert. You should add additional check to the assert: > > || out_j->in(AddPNode::Base) != addp > >> >> I made an additional change: I think the graph transformation doesn't make sense if decoding is expensive. >> Therefore, I skip it on non-X86 platforms when we're running in heap based compressed oops mode. >> (I believe X86 is the only platform which can match the decoding in the operand in this case.) > > It is not related to the fix and should be done separately. Or don't do at all. > There is comment above the code which explains why it could beneficial on SPARC too: > > 2845 // On sparc loading 32-bits constant and decoding it have less > 2846 // instructions (4) then load 64-bits constant (7). > > Thanks, > Vladimir > >> >> Please review. I will also need a sponsor, please. 
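Martin's N0..N4 example can be replayed on a toy graph. The sketch below is a hypothetical, heavily simplified stand-in for C2's Node and AddPNode (none of these names are HotSpot API); it only shows how rewiring N2 alone breaks the "Base pointers must match" property for N3, and how also reconnecting N3's Base restores it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for C2 nodes (not HotSpot's Node).
struct Node {
  std::string op;           // "ConP", "ConN", "DecodeN" or "AddP"
  Node* in = nullptr;       // DecodeN input
  Node* base = nullptr;     // AddP Base input
  Node* address = nullptr;  // AddP Address input
  explicit Node(std::string o) : op(std::move(o)) {}
};

// Property behind the assertion: an AddP whose Address is another AddP must
// use the same Base as that inner AddP.
bool base_pointers_match(const Node* addp) {
  const Node* inner = addp->address;
  if (inner != nullptr && inner->op == "AddP") return addp->base == inner->base;
  return true;
}

// Models the reshaping step: replace `conp` with `decode` in `addp`. With
// `fix_users`, also rewire AddPs that use `addp` as their Address and still
// have `conp` as Base (the idea of the proposed fix).
void replace_constant(Node* addp, Node* conp, Node* decode,
                      const std::vector<Node*>& users, bool fix_users) {
  if (addp->base == conp) addp->base = decode;
  if (addp->address == conp) addp->address = decode;
  if (!fix_users) return;
  for (Node* u : users) {
    if (u->op == "AddP" && u->address == addp && u->base == conp) u->base = decode;
  }
}
```

Running it on N0 ConP, N1 ConN, N2 AddP(N0, N0), N3 AddP(N0, N2) reproduces the failing state without the fix and a consistent graph with it.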
>> >> Best regards, >> Martin >> >> From vladimir.kozlov at oracle.com Wed Apr 27 14:56:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Apr 2016 07:56:20 -0700 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <57207C36.3030101@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com> <571F500E.5020904@oracle.com> <499d9f6b-378f-e350-eb72-c35930360825@oracle.com> <57207C36.3030101@oracle.com> Message-ID: <18792245-5f8c-1f09-3359-4c89c33d17bd@oracle.com> Thanks you for verifying positive condition. Changes are good. Thanks, Vladimir On 4/27/16 1:45 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 27.04.2016 09:48, Vladimir Kozlov wrote: >> I think next should be greaterEqual (branch from loop when limit+8 == 0) to avoid reading 8 bytes beyond array. >> >> + // Bail out if we reached the end (but still do the comparison) >> + br(Assembler::positive, false, Assembler::pn, Lremaining); > > According to the SPARC manual [1], bpos branches if "not N", i.e. if limit is >= 0. I verified this with a small program (attached) using inline function templates [2]: > > $ cc -O main.c code.il > $ ./a.out > 1: 1 > 0: 1 > -1: 0 > > That means the bpos branch is taken for 0 and 1 but not for -1. So the code should be correct. > >> An other trick you can is use xorcc instead of cmp. >> Then you need only one srlx and compare it with zero (may be use cmp_zero_and_br()). > > Yes, John already suggested this but as I wrote in another email, it does not improve but slightly degrade performance. 
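Tobias's check of the bpos condition can be mirrored with a small model of the SPARC integer condition codes. The struct and function names are hypothetical; the branch conditions follow the SPARC V9 definitions (bpos: not N; bge: not (N xor V); bg: not (Z or (N xor V))), and the overflow flag is passed in explicitly because it depends on the instruction that set the codes.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of SPARC integer condition codes for a 64-bit result.
struct CondCodes {
  bool n;  // negative
  bool z;  // zero
  bool v;  // overflow (instruction-dependent, so it is supplied by the caller)
};

CondCodes flags_for(int64_t result, bool overflow = false) {
  return CondCodes{result < 0, result == 0, overflow};
}

// SPARC V9 branch conditions on these codes.
bool bpos_taken(const CondCodes& cc) { return !cc.n; }                    // N == 0
bool bge_taken(const CondCodes& cc) { return cc.n == cc.v; }              // (N xor V) == 0
bool bg_taken(const CondCodes& cc) { return !(cc.z || (cc.n != cc.v)); }  // (Z or (N xor V)) == 0
```

With V clear, bpos and bge agree, which matches the 1/1/0 output posted for 1, 0 and -1; bg would additionally reject a result of 0.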
> > Thanks, > Tobias > > [1] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf > [2] https://docs.oracle.com/cd/E26502_01/html/E28387/gmabr.html > >> >> Thanks, >> Vladimir >> >> On 4/26/16 4:25 AM, Tobias Hartmann wrote: >>> Thanks, Chris! >>> >>> On 25.04.2016 21:29, Christian Thalinger wrote: >>>> >>>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann > wrote: >>>>> >>>>> Hi Mikael, >>>>> >>>>> On 25.04.2016 09:59, Mikael Gerdin wrote: >>>>>> Hi Tobias >>>>>> >>>>>> On 2016-04-25 08:38, Tobias Hartmann wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). >>>>>> >>>>>> I think that if you disable compressed klass pointers (or compressed oops) then I suspect that array contents are not always 8 byte aligned. >>>>>> >>>>>> With compressed oops enabled the mark word + compressed class + alength >>>>>> add up to 16 bytes and thus the array contents are guarateed to be 8 byte aligned. >>>>>> If compressed klass pointers are enabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case. >>>>> >>>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. >>>> >>>> See arrayOop.hpp: >>>> >>>> // Returns the offset of the first element. >>>> static int base_offset_in_bytes(BasicType type) { >>>> return header_size(type) * HeapWordSize; >>>> } >>> >>> I verified that Mikael's claim is right: >>> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4). >>> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4). 
>>> >>> However, the header size is always aligned to HeapWordSize: >>> >>> static int header_size_in_bytes() { >>> size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize); >>> >>> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned. >>> >>> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance: >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/ >>> >>> Thanks, >>> Tobias >>> >>>>> >>>>> Best regards, >>>>> Tobias >>>>> >>>>>> >>>>>> /Mikael >>>>>> >>>>>>> >>>>>>> If there are no objections, I would like to push the basic version: >>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>> >>>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic. >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote: >>>>>>>> Hi John, >>>>>>>> >>>>>>>> On 20.04.2016 03:46, John Rose wrote: >>>>>>>>> So I started looking at your code and my inner SPARC junkie took over. >>>>>>>>> >>>>>>>>> This is what happened: >>>>>>>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>>>>>>> >>>>>>>> Thanks a lot for having a look! >>>>>>>> >>>>>>>>> Perhaps there are some ideas that might be helpful: >>>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr. >>>>>>>> >>>>>>>> Right, this simplifies the code a bit: >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>>>>>>> >>>>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? 
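The array header arithmetic discussed above (16 bytes with compressed class pointers, 20 bytes rounded up to 24 without) can be checked with a short sketch. It assumes a 64-bit VM with an 8-byte HeapWordSize and mark word and a 4-byte length field, as described in the thread; align_up is a standalone re-implementation in the spirit of align_size_up, and array_base_offset is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to a multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Assumptions: 64-bit VM, 8-byte heap words and mark word, 4-byte length.
const std::size_t kHeapWordSize = 8;
const std::size_t kMarkWordSize = 8;
const std::size_t kLengthSize = 4;

// Offset of the first array element: mark word, 4- or 8-byte klass pointer,
// 4-byte length, rounded up to HeapWordSize.
std::size_t array_base_offset(bool compressed_class_pointers) {
  std::size_t klass_size = compressed_class_pointers ? 4 : 8;
  std::size_t length_end = kMarkWordSize + klass_size + kLengthSize;
  return align_up(length_end, kHeapWordSize);
}
```

Both offsets are multiples of 8, so if the array object itself starts on an 8-byte boundary (an assumption about object alignment), the first element is 8-byte aligned in either mode.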
>>>>>>>> >>>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>>>>>>> >>>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>>>>>>> >>>>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >>>>>>>> >>>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>>>>>>> >>>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>>>>>>> >>>>>>>> What do you think? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>> [3] Runtime alignment checks: >>>>>>>> >>>>>>>> bind(Lunaligned); >>>>>>>> Label next; >>>>>>>> xor3(ary1, ary2, tmp); >>>>>>>> and3(tmp, 7, tmp); >>>>>>>> br_null_short(tmp, Assembler::pn, next); >>>>>>>> STOP("One array is unaligned!"); >>>>>>>> should_not_reach_here(); >>>>>>>> bind(next); >>>>>>>> STOP("Both arrays are unaligned!"); >>>>>>>> >>>>>>>>> On the other hand, what you wrote is nice and simple. >>>>>>>>> >>>>>>>>> HTH >>>>>>>>> -- John >>>>>>>>> >>>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs.
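The runtime check in [3] relies on a small bit trick: (ary1 ^ ary2) & 7 is zero exactly when both addresses have the same offset from an 8-byte boundary. A minimal sketch of that classification, with hypothetical names; the mapping of the "same skew" case to John's "both-odd" case is our reading, not something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

enum class Alignment { BothAligned, SameSkew, DifferentSkew };

// Classifies two array addresses with respect to 8-byte alignment.
// (a ^ b) & 7 == 0 means the low three bits agree, i.e. both arrays sit at
// the same offset from an 8-byte boundary.
Alignment classify(uintptr_t a, uintptr_t b) {
  if (((a | b) & 7) == 0) return Alignment::BothAligned;
  if (((a ^ b) & 7) == 0) return Alignment::SameSkew;
  return Alignment::DifferentSkew;
}
```

In the same-skew case a short common prologue could bring both pointers to an 8-byte boundary before a 64-bit loop; the different-skew case would need the shifting ("skewed") loop that Tobias preferred to avoid.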
>>>>>>>>> But that's neither nice nor simple. >>>>>>>>> >>>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>>>>>>>> >>>>>>>>>> Thanks, Vladimir! >>>>>>>>>> >>>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>>>>>>> >>>>>>>>>> Okay, I'll push the basic version. >>>>>>>>>> >>>>>>>>>> For reference, here are the results on a SPARC T4: >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>>>>>>> >>>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>>>>>>>>>> We do have arraycopy code for it but by default we don't use it: >>>>>>>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>>>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>>>>>>> >>>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >>>>>>>>>> >>>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>>>>>>>> >>>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>>>>> Error: A fatal exception has occurred. 
Program will exit >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Tobias >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> please review the following enhancement: >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>>>>>>> >>>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>>>>>>>> >>>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>>>>>>>>>> >>>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>>>>>>>> >>>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>>>>>>>> >>>>>>>>>>>> I evaluated the following three versions of the patch. >>>>>>>>>>>> >>>>>>>>>>>> -- Basic -- >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>>>>>> The basic version implements the algorithm described above. 
Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>>>> >>>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>>>>>>> >>>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>>>> Version "small" tries to improve this. >>>>>>>>>>>> >>>>>>>>>>>> -- Prefetching -- >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>>>> >>>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. 
>>>>>>>>>>>> >>>>>>>>>>>> -- Small -- >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>>>>>>> >>>>>>>>>>>> The numbers can be found here: >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>>>>>>> >>>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>>>>>>>> >>>>>>>>>>>> What do you think? >>>>>>>>>>>> >>>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Tobias >>>>>>>>>>>> >>>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>>>> 
http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>> From vladimir.kozlov at oracle.com Wed Apr 27 15:32:55 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Apr 2016 08:32:55 -0700 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <57208F60.5000607@redhat.com> References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> <57208F60.5000607@redhat.com> Message-ID: Hi Andrew, Does size of immediate value have affect on latency on aarch64? For ConP it is 64-bit constant and for ConN it is 32-bit. Also, as Martin pointed, such constants are loaded from constant table now on SPARC and PPC. What about aarch64? Thanks, Vladimir On 4/27/16 3:07 AM, Andrew Haley wrote: > On 27/04/16 10:11, Doerr, Martin wrote: >> Would this be good for aarch64 as well? > > On AArch64, LoadConP is > > mov reg, #x > movk reg, #y shl #16 > movk reg, #z shl 32 > > (3 cycles latency) > > LoadConN + DecodeN heap-based > > mov reg, #x > movk reg, #y shl #16 > add reg, heapbase, reg shl #3 > > (4 cycles latency) > > Andrew. > From aph at redhat.com Wed Apr 27 15:37:44 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 27 Apr 2016 16:37:44 +0100 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> <57208F60.5000607@redhat.com> Message-ID: <5720DCC8.7060009@redhat.com> Hi, On 04/27/2016 04:32 PM, Vladimir Kozlov wrote: > Does size of immediate value have affect on latency on aarch64? > For ConP it is 64-bit constant and for ConN it is 32-bit. 
Immediate values are always 16 bits. To load more than 16 bits you have to use multiple instructions. An AArch64 address is 48 bits wide, so we need three 1-cycle instructions. > Also, as Martin pointed, such constants are loaded from constant > table now on SPARC and PPC. What about aarch64? No, never. It's too slow: 5 cycles latency, more if you miss L1 cache. Andrew. From rwestrel at redhat.com Wed Apr 27 15:53:04 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 27 Apr 2016 17:53:04 +0200 Subject: SuperWord::unrolling_analysis() question In-Reply-To: References: Message-ID: <5720E060.2050308@redhat.com> Hi Michael, Thanks for the answer. > The answer could be conditional if we had a machines with enough byte > or short components to make vectors with, I chose INT as it is the > current consistent minimum configuration for complete vector mapping. > The best answer would be to create some code which mines the common > type used in the current loops expressions, but I think we would be > stuck with two passes over the code, the first to bind the common > type, the second for finding the optimal sub vector mapping. Or > possibly moving the question to the machine layer as a query, where > compiler writers choose the minimum consistent configuration based on > current info on the machine we compile on. Would two passes like sketched here: http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/ would do the job? Roland. From rwestrel at redhat.com Wed Apr 27 15:57:17 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 27 Apr 2016 17:57:17 +0200 Subject: RFR(XS): 8155015: Aarch64: bad assert in spill generation code In-Reply-To: <571E1AE8.4020000@oracle.com> References: <571E190E.5050600@redhat.com> <571E1AE8.4020000@oracle.com> Message-ID: <5720E15D.3050006@redhat.com> Thanks for the review, Tobias! Roland. 
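Andrew's instruction counts can be checked with plain integer arithmetic. This is a hedged sketch, assuming a 48-bit virtual address (as stated in the thread) and a narrow-oop shift of 3 (8-byte object alignment); the function names are hypothetical, and it only models what the MOVZ/MOVK sequence (written as `mov`/`movk` in Andrew's listing) and the heap-based decode compute, not HotSpot's MacroAssembler.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Splits a 48-bit address into the three 16-bit immediates of a
// MOVZ (bits 0-15), MOVK lsl #16 (bits 16-31), MOVK lsl #32 (bits 32-47) sequence.
std::array<uint16_t, 3> split_48bit(uint64_t addr) {
  assert(addr < (uint64_t{1} << 48));
  std::array<uint16_t, 3> imm = {{static_cast<uint16_t>(addr & 0xffff),
                                  static_cast<uint16_t>((addr >> 16) & 0xffff),
                                  static_cast<uint16_t>((addr >> 32) & 0xffff)}};
  return imm;
}

// Emulates executing that three-instruction sequence on one register.
uint64_t run_movz_movk(const std::array<uint16_t, 3>& imm) {
  uint64_t reg = imm[0];  // movz: zero the register, insert bits 0-15
  reg = (reg & ~(uint64_t{0xffff} << 16)) | (uint64_t{imm[1]} << 16);  // movk, lsl #16
  reg = (reg & ~(uint64_t{0xffff} << 32)) | (uint64_t{imm[2]} << 32);  // movk, lsl #32
  return reg;
}

// Heap-based narrow-oop decode, as in "add reg, heapbase, reg shl #3".
uint64_t decode_narrow(uint64_t heapbase, uint32_t narrow) {
  return heapbase + (uint64_t{narrow} << 3);
}
```

A 32-bit narrow constant needs only two 16-bit immediates, but the extra dependent add for the decode is what brings the ConN + DecodeN path to four cycles versus three for ConP in Andrew's comparison.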
On 04/25/2016 03:26 PM, Tobias Hartmann wrote: > Hi Roland, > > On 25.04.2016 15:18, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8155015/webrev.00/ >> >> I hit that broken assert when doing some testing. > > That looks good to me! > > Best regards, > Tobias > >> >> Roland. >> From tobias.hartmann at oracle.com Wed Apr 27 16:08:35 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 27 Apr 2016 18:08:35 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <18792245-5f8c-1f09-3359-4c89c33d17bd@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> <571784C7.6020304@oracle.com> <571DBB4B.2070801@oracle.com> <571DCE55.6070506@oracle.com> <571E25AA.6090303@oracle.com> <571F500E.5020904@oracle.com> <499d9f6b-378f-e350-eb72-c35930360825@oracle.com> <57207C36.3030101@oracle.com> <18792245-5f8c-1f09-3359-4c89c33d17bd@oracle.com> Message-ID: <5720E403.6000400@oracle.com> Thanks, Vladimir! Best regards, Tobias On 27.04.2016 16:56, Vladimir Kozlov wrote: > Thanks you for verifying positive condition. > Changes are good. > > Thanks, > Vladimir > > On 4/27/16 1:45 AM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 27.04.2016 09:48, Vladimir Kozlov wrote: >>> I think next should be greaterEqual (branch from loop when limit+8 == 0) to avoid reading 8 bytes beyond array. >>> >>> + // Bail out if we reached the end (but still do the comparison) >>> + br(Assembler::positive, false, Assembler::pn, Lremaining); >> >> According to the SPARC manual [1], bpos branches if "not N", i.e. if limit is >= 0. I verified this with a small program (attached) using inline function templates [2]: >> >> $ cc -O main.c code.il >> $ ./a.out >> 1: 1 >> 0: 1 >> -1: 0 >> >> That means the bpos branch is taken for 0 and 1 but not for -1. So the code should be correct. >> >>> An other trick you can is use xorcc instead of cmp. 
>>> Then you need only one srlx and compare it with zero (may be use cmp_zero_and_br()). >> >> Yes, John already suggested this but as I wrote in another email, it does not improve but slightly degrade performance. >> >> Thanks, >> Tobias >> >> [1] http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf >> [2] https://docs.oracle.com/cd/E26502_01/html/E28387/gmabr.html >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/26/16 4:25 AM, Tobias Hartmann wrote: >>>> Thanks, Chris! >>>> >>>> On 25.04.2016 21:29, Christian Thalinger wrote: >>>>> >>>>>> On Apr 25, 2016, at 4:11 AM, Tobias Hartmann > wrote: >>>>>> >>>>>> Hi Mikael, >>>>>> >>>>>> On 25.04.2016 09:59, Mikael Gerdin wrote: >>>>>>> Hi Tobias >>>>>>> >>>>>>> On 2016-04-25 08:38, Tobias Hartmann wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I executed a complete hs-comp PIT (all hotspot tests with -Xcomp/-Xmixed) with the alignment checks mentioned below and it turned out that the arrays are always 8-byte aligned (results are attached to the bug). >>>>>>> >>>>>>> I think that if you disable compressed klass pointers (or compressed oops) then I suspect that array contents are not always 8 byte aligned. >>>>>>> >>>>>>> With compressed oops enabled the mark word + compressed class + alength >>>>>>> add up to 16 bytes and thus the array contents are guarateed to be 8 byte aligned. >>>>>>> If compressed klass pointers are enabled then the array header will become 20 bytes and I don't know how the contents are laid out in that case. >>>>>> >>>>>> Thanks for the suggestion! I've run some tests (-testset hotspot) with "-XX:-UseCompressedOops -XX:-UseCompressedClassPointers" but it seems that the arrays are still always 8-byte aligned. >>>>> >>>>> See arrayOop.hpp: >>>>> >>>>> // Returns the offset of the first element. 
>>>>> static int base_offset_in_bytes(BasicType type) { >>>>> return header_size(type) * HeapWordSize; >>>>> } >>>> >>>> I verified that Mikael's claim is right: >>>> With UseCompressedClassPointers the header size is 16 bytes: mark oop (8) + compressed klass (4) + length (4). >>>> Without UseCompressedClassPointers the header size is 20 bytes: mark oop (8) + klass (8) + length (4). >>>> >>>> However, the header size is always aligned to HeapWordSize: >>>> >>>> static int header_size_in_bytes() { >>>> size_t hs = align_size_up(length_offset_in_bytes() + sizeof(int), HeapWordSize); >>>> >>>> and therefore without UseCompressedClassPointers the header size is actually 24 bytes. On 64 bit systems, the first array element is always 8-byte aligned. >>>> >>>> Since we don't support 32-bit SPARC anymore, I wonder if it's okay to just remove the alignment check completely? This would simplify the code a lot (we don't need the "array_equals_loop" method) and improve performance: >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic.01/ >>>> >>>> Thanks, >>>> Tobias >>>> >>>>>> >>>>>> Best regards, >>>>>> Tobias >>>>>> >>>>>>> >>>>>>> /Mikael >>>>>>> >>>>>>>> >>>>>>>> If there are no objections, I would like to push the basic version: >>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>> >>>>>>>> If unaligned performance turns out to be a problem, we can still improve the intrinsic. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>> On 20.04.2016 15:31, Tobias Hartmann wrote: >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> On 20.04.2016 03:46, John Rose wrote: >>>>>>>>>> So I started looking at your code and my inner SPARC junkie took over. >>>>>>>>>> >>>>>>>>>> This is what happened: >>>>>>>>>> http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ >>>>>>>>> >>>>>>>>> Thanks a lot for having a look! 
>>>>>>>>> >>>>>>>>>> Perhaps there are some ideas that might be helpful: >>>>>>>>>> - The rampdown logic can lose a couple of instructions by using xorcc and movr. >>>>>>>>> >>>>>>>>> Right, this simplifies the code a bit: >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ >>>>>>>>> >>>>>>>>> I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? >>>>>>>>> >>>>>>>>>> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? >>>>>>>>> >>>>>>>>> I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: >>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 >>>>>>>>> >>>>>>>>> Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. >>>>>>>>> >>>>>>>>>> - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). >>>>>>>>> >>>>>>>>> Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. >>>>>>>>> >>>>>>>>> What do you think? 
>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Tobias >>>>>>>>> >>>>>>>>> [1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx >>>>>>>>> [2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>>> [3] Runtime alignment checks: >>>>>>>>> >>>>>>>>> bind(Lunaligned); >>>>>>>>> Label next; >>>>>>>>> xor3(ary1, ary2, tmp); >>>>>>>>> and3(tmp, 7, tmp); >>>>>>>>> br_null_short(tmp, Assembler::pn, next); >>>>>>>>> STOP("One array is unaligned!"); >>>>>>>>> should_not_reach_here(); >>>>>>>>> bind(next); >>>>>>>>> STOP("Both arrays are unaligned!"); >>>>>>>>> >>>>>>>>>> On the other hand, what you wrote is nice and simple. >>>>>>>>>> >>>>>>>>>> HTH >>>>>>>>>> -- John >>>>>>>>>> >>>>>>>>>> P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more >>>>>>>>>> versions of misalignment, still with vectorization, as with the arraycopy stubs. >>>>>>>>>> But that's neither nice nor simple. >>>>>>>>>> >>>>>>>>>>> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >>>>>>>>>>> >>>>>>>>>>> Thanks, Vladimir! >>>>>>>>>>> >>>>>>>>>>> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>>>>>>>>>>> Very good. Go with basic. We can do SPU special improvements later if needed. >>>>>>>>>>> >>>>>>>>>>> Okay, I'll push the basic version. >>>>>>>>>>> >>>>>>>>>>> For reference, here are the results on a SPARC T4: >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >>>>>>>>>>> >>>>>>>>>>>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
>>>>>>>>>>>> We do have arraycopy code for it but by default we don't use it: >>>>>>>>>>>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>>>>>>>>>>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>>>>>>>>>>> >>>>>>>>>>>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >>>>>>>>>>> >>>>>>>>>>> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >>>>>>>>>>> >>>>>>>>>>> java -XX:ArraycopySrcPrefetchDistance=42 -version >>>>>>>>>>> ArraycopySrcPrefetchDistance (42) must be 0 >>>>>>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>>>>>> Error: A fatal exception has occurred. Program will exit >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Tobias >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> please review the following enhancement: >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>>>>>>>>>>> >>>>>>>>>>>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>>>>>>>>>>> >>>>>>>>>>>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. 
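The "read over the end, then shift out the garbage bits" strategy can be illustrated with a portable model. This is a sketch under stated assumptions, not the MacroAssembler code: words are assembled big-endian (as on SPARC), lengths are in bytes (a char array would pass twice its element count), and because C++ forbids reading past an array, the final partial word is copied into scratch buffers whose extra bytes are filled with deliberately different junk. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assembles 8 bytes as a big-endian word: the byte at the lowest address
// becomes the most significant byte, as an 8-byte load would on SPARC.
static uint64_t load_be64(const unsigned char* p) {
  uint64_t w = 0;
  for (int i = 0; i < 8; ++i) w = (w << 8) | p[i];
  return w;
}

// Compares `len` bytes, 8 at a time; the last partial word is compared after
// shifting the bytes that lie beyond the arrays out of the XOR result.
bool array_equals_words(const unsigned char* a, const unsigned char* b, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) return false;
  }
  size_t rem = len - i;  // 0..7 trailing bytes
  if (rem == 0) return true;
  unsigned char ta[8], tb[8];
  std::memset(ta, 0xAA, sizeof ta);  // stand-in for whatever follows array a
  std::memset(tb, 0x55, sizeof tb);  // different junk after array b
  std::memcpy(ta, a + i, rem);
  std::memcpy(tb, b + i, rem);
  uint64_t diff = load_be64(ta) ^ load_be64(tb);
  diff >>= (8 - rem) * 8;  // valid bytes are the high-order ones; drop the rest
  return diff == 0;
}
```

Because the junk bytes differ between the two buffers, a correct result shows that the shift really discards everything past the end, which is the property the intrinsic depends on when it reads beyond the arrays.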
>>>>>>>>>>>>> >>>>>>>>>>>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>>>>>>>>>>> >>>>>>>>>>>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>>>>>>>>>>> >>>>>>>>>>>>> I evaluated the following three versions of the patch. >>>>>>>>>>>>> >>>>>>>>>>>>> -- Basic -- >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>>>>>>>>>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>>>>> >>>>>>>>>>>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >>>>>>>>>>>>> >>>>>>>>>>>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>>>>> Version "small" tries to improve this. 
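Two alignment rules reduce to simple address arithmetic. The width selection described above uses 8-byte steps only when both arrays are doubleword aligned, and 4-byte steps otherwise. The xor-based debug check quoted earlier in the thread tests whether both addresses share the same offset modulo 8. This is a hypothetical sketch with illustrative function names. It assumes, as the message states, that 4-byte alignment always holds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: comparison width in bytes for two array base addresses.
// 8-byte loads only if both bases are doubleword aligned; otherwise fall
// back to 4-byte loads.
int choose_compare_width(std::uintptr_t a, std::uintptr_t b) {
  // Assumption from the review: word (4-byte) alignment always holds.
  assert(((a | b) & 3) == 0);
  return ((a | b) & 7) == 0 ? 8 : 4;
}

// Hypothetical mirror of the quoted xor3/and3 check: zero when both
// addresses have the same offset modulo 8. On the "unaligned" path this
// means both arrays are misaligned by the same amount; non-zero means
// their offsets differ (for example, only one is misaligned).
bool same_offset_mod8(std::uintptr_t a, std::uintptr_t b) {
  return ((a ^ b) & 7) == 0;
}
```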
>>>>>>>>>>>>> >>>>>>>>>>>>> -- Prefetching -- >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>>>>>>>>>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>>>>> >>>>>>>>>>>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>>>>>>>>>>> >>>>>>>>>>>>> -- Small -- >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>>>>>>>>>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>>>>>>>>>>> >>>>>>>>>>>>> The numbers can be found here: >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>>>>>>>>>>> >>>>>>>>>>>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>>>>>>>>>>> >>>>>>>>>>>>> What do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Tobias >>>>>>>>>>>>> >>>>>>>>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>>>>>>>>>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>>>>>>>>>>> [3] Microbenchmark results for the "basic" implementation >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>>>>>>>>>>> [4] Microbenchmark results for the "prefetching" implementation >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>>>>>>>>>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>>> From nassim.halli at aselta.com Wed Apr 27 17:17:41 2016 From: nassim.halli at aselta.com (Nassim HALLI) Date: Wed, 27 Apr 2016 19:17:41 +0200 Subject: Are Java native methods inlined ? In-Reply-To: <5720956F.1060808@redhat.com> References: <5720956F.1060808@redhat.com> Message-ID: <5720F435.2080908@aselta.com> Hi Andrew, Thanks for the clarification. Nassim. On 27/04/2016 12:33, Andrew Haley wrote: > On 27/04/16 10:12, Nassim Halli wrote: > >> Just a simple question: are Java native methods inlined? > Not exactly. > >> By "native methods" I mean the interface between the Java caller and >> the targeted native JNI function in the dl (for which the >> JIT-compiled code can be observed using +PrintNativeNMethods). >> >> I ask this question since I don't observe inlining with Java 8 and I >> heard it was.
Although I understand that it could not necessarily >> lead to great improvements it's just for a clarification. > The glue code between Java and native code is not inlined into its > caller but it is generated by the JIT compiler. Like every > optimization in HotSpot, this is only done for hot code. If you run > something like Netbeans you'll see a lot of native nmethods generated. > > Andrew. From christian.thalinger at oracle.com Wed Apr 27 19:01:02 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 27 Apr 2016 09:01:02 -1000 Subject: RFR: 8154765: [JVMCI] Support dimensional granularity for stable array fields In-Reply-To: <5720BF26.2050400@oracle.com> References: <571A36D7.2030701@oracle.com> <5720BF26.2050400@oracle.com> Message-ID: <9869ACCB-BB3C-40E8-A5A0-585EFB347DAE@oracle.com> Ok. > On Apr 27, 2016, at 3:31 AM, Roland Schatz wrote: > > Please don't integrate this yet. We're looking for a more general solution, it's possible this won't be needed in the final version. > > Thanks, > Roland > > On 04/22/2016 04:36 PM, Andreas Woess wrote: >> Please review: >> http://cr.openjdk.java.net/~aw/8154765/webrev/ >> https://bugs.openjdk.java.net/browse/JDK-8154765 >> >> This change adds an optional stableDimensions parameter to ConstantReflectionProvider.readStableFieldValue that allows the number of stable array dimensions to be specified more fine-granular than inferring it from the type of the field. >> >> Thanks, >> Andreas >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dean.long at oracle.com Wed Apr 27 19:17:42 2016 From: dean.long at oracle.com (Dean Long) Date: Wed, 27 Apr 2016 12:17:42 -0700 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> Message-ID: <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com> On 4/27/2016 2:11 AM, Doerr, Martin wrote: > Hi Roland, > > I have removed the piece of my change which would interfere with your change. > > But I'd like to explain the intention of skipping the transformation on non-x86 platforms in heap-based compressed oops mode. > (In simpler compressed oops modes decoding is very cheap so the transformation is probably good.) > Without transformation we have: LoadConP + Storage access > With transformation we have: LoadConN + DecodeN heap-based + Storage access > > I believe X86 is the only platform which can match the DecodeN heap-based into the Storage access. > > I guess other platforms should prefer the untransformed version: > PPC can load the ConP from constant pool. Decoding takes a lot of instructions, because the heap base needs to get loaded. > > I didn't take a closer look at SPARC, but I thought it would use the constant pool as well. Not sure if the following comment is still correct: > // On sparc loading 32-bits constant and decoding it have less > // instructions (4) then load 64-bits constant (7). > > > Therefore, I had proposed the following code to skip: > + // Matching decode heap based into an operand only works on X86. > + #if !defined(X86) > + if ((op == Op_ConN && Universe::narrow_oop_base() != NULL) || > + (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) { > + break; > + } > + #endif > + > > Would this be good for aarch64 as well? 
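The cost difference Martin describes comes from the decode arithmetic. The sketch below is a rough, hypothetical model, not HotSpot code; `decode_narrow_oop` and its parameters are illustrative names. Decoding a narrow oop is a shift plus, in heap-based mode, an add of a non-zero base that must first be available in a register. A shift of 3 corresponds to the default 8-byte object alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of narrow-oop decoding (not HotSpot code).
//   unscaled:   base == 0, shift == 0
//   zero-based: base == 0, shift != 0
//   heap-based: base != 0 (the case discussed above; the base must be
//               loaded unless it can be folded into the memory operand)
std::uint64_t decode_narrow_oop(std::uint32_t narrow, std::uint64_t base,
                                unsigned shift) {
  // Keep the shifted 32-bit value well inside 64 bits.
  assert(shift < 32);
  if (narrow == 0) {
    return 0;  // a null narrow oop decodes to null in every mode
  }
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}
```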
> Would you like to include code which skips the transformation in your change or should this better be discussed independently? Martin, wouldn't your #ifdef X86 code be better as a Matcher function, similar to narrow_oop_use_complex_address()? dl > Best regards, > Martin > > > -----Original Message----- > From: Roland Westrelin [mailto:rwestrel at redhat.com] > Sent: Mittwoch, 27. April 2016 10:00 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode > > Hi Martin, > >> seems like this issue is related to what I have sent out today: >> RFR(S): 8154836: VM crash due to "Base pointers must match" >> I also had to change the AddP case of final graph reshaping. >> >> In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode. >> >> Maybe I have to remove that part of my change, or at least adapt it. >> We should make sure that the changes don't get pushed on the same day. > Thanks for the heads up. It looks like your change will get in before > mine. I'll send an updated webrev once it's integrated. > > Roland. From john.r.rose at oracle.com Wed Apr 27 21:13:15 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 27 Apr 2016 14:13:15 -0700 Subject: Are Java native methods inlined ? In-Reply-To: <5720956F.1060808@redhat.com> References: <5720956F.1060808@redhat.com> Message-ID: <4E516677-5A69-4133-9962-AF64270B0A2C@oracle.com> On Apr 27, 2016, at 3:33 AM, Andrew Haley wrote: > > The glue code between Java and native code is not inlined into its > caller but it is generated by the JIT compiler. Like every > optimization in HotSpot, this is only done for hot code. If you run > something like Netbeans you'll see a lot of native nmethods generated. In Project Panama we are experimenting with inlining wrapper logic for native calls. 
But the 20-year practice in HotSpot is (as you say) to roll a separate, non-inlined wrapper for each JNI method. The JIT per se does not create this wrapper, but rather this logic: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/61a214186dae/src/cpu/x86/vm/sharedRuntime_x86_64.cpp#l1825 - John From john.r.rose at oracle.com Wed Apr 27 21:55:09 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 27 Apr 2016 14:55:09 -0700 Subject: SuperWord::unrolling_analysis() question In-Reply-To: <5720E060.2050308@redhat.com> References: <5720E060.2050308@redhat.com> Message-ID: <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com> It is reasonable to look ahead into the loop to find the largest applicable vector size, before choosing an unroll factor. A loop which works on bytes and doubles at the same time will want to unroll only up to vector-of-double. But a loop which works only on bytes will want to unroll more. Is that what we are talking about here? - John > On Apr 27, 2016, at 8:53 AM, Roland Westrelin wrote: > > > Hi Michael, > > Thanks for the answer. > >> The answer could be conditional if we had a machines with enough byte >> or short components to make vectors with, I chose INT as it is the >> current consistent minimum configuration for complete vector mapping. >> The best answer would be to create some code which mines the common >> type used in the current loops expressions, but I think we would be >> stuck with two passes over the code, the first to bind the common >> type, the second for finding the optimal sub vector mapping. Or >> possibly moving the question to the machine layer as a query, where >> compiler writers choose the minimum consistent configuration based on >> current info on the machine we compile on. > > Would two passes like sketched here: > > http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/ > > would do the job? > > Roland.
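John's point about the widest element type can be illustrated with a toy model in which the unroll factor is the number of lanes in the widest supported vector. This is a hypothetical sketch. `max_unroll_for_loop` is an illustrative name, not the SuperWord API, and the model ignores every other unrolling heuristic.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: vector_bytes is the widest supported vector size in
// bytes; elem_sizes lists the element sizes (in bytes) used in the loop.
// The lane count of the largest element type bounds the useful unroll.
int max_unroll_for_loop(int vector_bytes, const std::vector<int>& elem_sizes) {
  assert(vector_bytes > 0);
  if (elem_sizes.empty()) {
    return 1;  // nothing to vectorize: no extra unrolling
  }
  int largest = *std::max_element(elem_sizes.begin(), elem_sizes.end());
  return std::max(1, vector_bytes / largest);
}
```

With 32-byte vectors, a loop touching both bytes and doubles gets 32 / 8 = 4, while a byte-only loop gets 32. This matches the contrast John draws.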
From vladimir.kozlov at oracle.com Wed Apr 27 21:55:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Apr 2016 14:55:16 -0700 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <5720CA98.10605@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> <5720CA98.10605@oracle.com> Message-ID: <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com> Hi Zoltan On 4/27/16 7:20 AM, Zoltán Majó wrote: > Hi Vladimir, > > > thank you for the feedback! > > On 04/27/2016 12:39 AM, Vladimir Kozlov wrote: >> On 4/26/16 4:56 AM, Zoltán Majó wrote: >>> Hi Vladimir, >>> >>> >>> thank you for the feedback! Please see comments below. >>> >>> On 04/22/2016 12:37 AM, Vladimir Kozlov wrote: >>>> Hi, Zoltan >>>> >>>> I think we should change code in prefetch_allocation() instead: >>>> >>>> Node *cache_adr = new AddPNode(old_eden_top, old_eden_top, >>>> _igvn.MakeConX(step_size + distance)); >>> >>> The problem is that AllocatePrefetchStyle must align the first prefetched address to 8 bytes, otherwise the emitted stxa >>> instruction could cause a SIGBUS. But the alignment does not have to be at step_size boundary, 8-byte alignment is >>> sufficient. >> >> Actually it has to be step_size (cache line size) aligned - BIS instruction is triggered when the address of stxa is >> the beginning of cache line for AllocatePrefetchStyle == 3. If it is just 8 bytes aligned it will be simple store. > > Thank you for clarifying. > >> We should require AllocatePrefetchStepSize to be 8 bytes aligned in vm_version_sparc.cpp instead of: >> + // BIS instructions require 8-byte aligned addresses >> + Node* mask = _igvn.MakeConX(~(intptr_t)(wordSize - 1)); > > I agree. But I'd prefer we do that in the AllocatePrefetchStepSizeConstraintFunc() constraint function in > commandLineFlagConstraintsCompiler.cpp).
The reason is that since JEP-245 the preferred place to validate > command-line arguments is in constraint functions. Most compiler-related checks were moved there with JDK-8078554 and > JDK-8146478. Okay. Usually we do platform-specific flag settings in vm_version_.cpp files, but it looks like that is starting to change. I am not a supporter of having multiple #ifdefs in shared code (in commandLineFlagConstraintsCompiler). > > I'd also like to set the minimum value for AllocatePrefetchDistance to AllocatePrefetchStepSize, otherwise we can access > the heap before newly allocated objects/arrays. Only for style 3. And, as I said, it may be better to change the code. >>> >>>> >>>> This way we allow AllocatePrefetchDistance == 0 in all AllocatePrefetchStyle cases - it is consistent. >>> >>> Unfortunately, it is not easy to have the same limit for AllocatePrefetchDistance on all platforms. Due to the 8-byte >>> alignment performed by compiled code, the lower limit of 8 for AllocatePrefetchDistance is needed on SPARC; the lower >>> limit of 0 works fine on all other platforms. >>> >>> I've started looking at the consistency of flags controlling allocation prefetch in general, as we have other issues >>> open related to them (e.g., JDK-8151622). We're touching related code now, so I thought, maybe it makes sense to fix all >>> remaining issues at once. >> >> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it should be used only if UseTLAB is true. Please, >> look. > > It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true. > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844 I got a report of it crashing on T7 (attached). Maybe it is because it used BIS by default. Thanks, Vladimir > > Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, I've started an RBT run with all hotspot on all > platforms using AllocatePrefetchStyle=2. So far no problems have shown up.
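The two rules discussed in this exchange for AllocatePrefetchStyle = 3 can be expressed as a standalone check. First, the step size must be a multiple of 8 so that stxa stays aligned. Second, the distance must be at least one step. This is a hypothetical sketch: it does not use HotSpot's actual constraint-function signatures, and the function name is illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone check (not HotSpot's constraint-function API)
// for AllocatePrefetchStyle == 3, following the rules in this thread.
// Returns an empty string when the combination is acceptable.
std::string check_style3_flags(long step_size, long distance) {
  // stxa needs 8-byte alignment, so each prefetch step must preserve it.
  if (step_size <= 0 || step_size % 8 != 0) {
    return "AllocatePrefetchStepSize must be a positive multiple of 8";
  }
  // A distance below one step can round the first prefetch address down
  // to before the newly allocated object.
  if (distance < step_size) {
    return "AllocatePrefetchDistance must be >= AllocatePrefetchStepSize";
  }
  return "";
}
```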
>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch should be set for all styles on all platforms to >> avoid accessing beyond the heap. The prefetch instruction docs say that they do not trap, but we should be careful. > > I agree. > > That means we initialize _reserve_for_allocation_prefetch in a platform-independent way. So I think it would make sense > to move that field to ThreadLocalAllocBuffer, as TLAB is the only user of that field and we don't support > platform-independent initialization of Abstract_VM_Version. I did that in the updated webrev. > >>> >>> The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance): >>> >>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As >>> far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we >>> should use AllocatePrefetchStyle = 3. >> >> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select AllocatePrefetchStyle = 3. > > OK. > >> But we should allow AllocatePrefetchStyle = 3 if normal prefetch instructions (or other platforms) are used. > > OK, I've removed that restriction. > >> I think we should update comment in globals.hpp to say "generate one prefetch per cache line" without saying BIS. > > OK. > >> >> But I agree if BIS is not available we should not use BIS AllocatePrefetchInstr = 1. > > OK. > >> >>> >>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms. >> >> It could be useful on other platforms since it does one access per cache line. > > OK, let's keep it available then. > >> >> >>> >>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements). >> >> That is correct since stxa requires at least 8 bytes alignment (as stx). > > OK. > >> >>> >>> 4.
Determine the number of lines to prefetch in the same way for all prefetch styles: >>> lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines >> >> Agree. > > OK. > >> >>> >>> Here is the updated webrev: >>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/ >> >> vm_version_sparc.cpp >> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) when BIS is not available. > > OK. I think it is sufficient to do > > if (!has_blk_init()) { > if (AllocatePrefetchInstr == 1) { > warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable"); > FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0); > } > } I hope I'm not missing anything here. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/ Testing: - JPRT (incl. TestOptionsWithRanges); - local testing on > SPARC; - all hotspot tests with AllocatePrefetchStyle=2 on all platforms. Thank you! Best regards, Zoltan > >> >> Thanks, >> Vladimir >> >>> >>> Testing: >>> - JPRT (incl. TestOptionsWithRanges) >>> - local testing on a SPARC machine. >>> >>> Thank you! >>> >>> Best regards, >>> >>> >>> Zoltan >>> >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/21/16 4:30 AM, Zoltán Majó wrote: >>>>> Hi, >>>>> >>>>> >>>>> please review the patch for 8153340. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8153340 >>>>> >>>>> >>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way >>>>> the address for the first prefetch instruction is calculated [1]: >>>>> >>>>> If distance==0, cache_adr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1) which can zero some of >>>>> the bits of cache_adr. That results in accesses *before* the newly allocated object. >>>>> >>>>> >>>>> Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >>>>> Unquarantine test.
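The crash described above follows directly from the address arithmetic. The sketch below is a minimal model, not the code in macro.cpp, and the function name is hypothetical. It assumes `step_size` is a power of two, so that `~(step_size - 1)` is a valid rounding mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the first prefetch address for
// AllocatePrefetchStyle == 3: add the prefetch distance to the old
// allocation top, then round down to a step (cache line) boundary.
std::uintptr_t first_prefetch_addr(std::uintptr_t old_top,
                                   std::uintptr_t distance,
                                   std::uintptr_t step_size) {
  // Assumption: step_size is a non-zero power of two.
  assert(step_size != 0 && (step_size & (step_size - 1)) == 0);
  std::uintptr_t cache_adr = old_top + distance;
  return cache_adr & ~(step_size - 1);
}
```

With distance >= step_size, the rounded address is at least old_top + step_size - (step_size - 1) = old_top + 1, so it can never precede the new object. With distance 0, it can round down below old_top, as in the bug report.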
>>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >>>>> >>>>> Testing: >>>>> - JPRT (incl. TestOptionsWithRanges.java) >>>>> - local testing on a SPARC machine. >>>>> >>>>> Thank you! >>>>> >>>>> Best regards, >>>>> >>>>> >>>>> Zoltan >>>>> >>>>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 >>> > -------------- next part -------------- An embedded message was scrubbed... From: Vladimir Kozlov Subject: Re: Java's use of BIS and RAW penalty Date: Tue, 15 Mar 2016 11:51:01 -0700 Size: 7622 URL: From michael.c.berg at intel.com Wed Apr 27 21:58:31 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 27 Apr 2016 21:58:31 +0000 Subject: SuperWord::unrolling_analysis() question In-Reply-To: <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com> References: <5720E060.2050308@redhat.com> <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com> Message-ID: John, it is pretty much that issue(unrolling for the available supported vector), I am testing some changes now for Roland. -----Original Message----- From: John Rose [mailto:john.r.rose at oracle.com] Sent: Wednesday, April 27, 2016 2:55 PM To: rwestrel at redhat.com Cc: Berg, Michael C ; hotspot-compiler-dev at openjdk.java.net Subject: Re: SuperWord::unrolling_analysis() question It is reasonable to look ahead into the loop to find the largest applicable vector size, before choosing an unroll factor. A loop which works on bytes and doubles at the same time will want to unroll only up to vector-of-double. But a loop which works only on bytes will want to unroll more. Is that what we are talking about here? - John > On Apr 27, 2016, at 8:53 AM, Roland Westrelin wrote: > > > Hi Michael, > > Thanks for the answer. > >> The answer could be conditional if we had a machines with enough byte >> or short components to make vectors with, I chose INT as it is the >> current consistent minimum configuration for complete vector mapping. 
>> The best answer would be to create some code which mines the common >> type used in the current loops expressions, but I think we would be >> stuck with two passes over the code, the first to bind the common >> type, the second for finding the optimal sub vector mapping. Or >> possibly moving the question to the machine layer as a query, where >> compiler writers choose the minimum consistent configuration based on >> current info on the machine we compile on. > > Would two passes like sketched here: > > http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/ > > would do the job? > > Roland. From tom.rodriguez at oracle.com Wed Apr 27 23:51:23 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 27 Apr 2016 16:51:23 -0700 Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly handle private methods in interfaces In-Reply-To: <80797340-B145-42F2-ABED-902197C9AC17@oracle.com> References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com> <80797340-B145-42F2-ABED-902197C9AC17@oracle.com> Message-ID: <12E0E6EC-C97F-4D7D-96BB-A7A4A6F8A7BD@oracle.com> One of the jtreg tests had to be fixed after these changes, so I've updated the webrev. The only changed file is http://cr.openjdk.java.net/~never/8152903/webrev/test/compiler/jvmci/compilerToVM/ResolveMethodTest.java.udiff.html The test case was actually resolving the method against the holder type instead of the receiver but this only mattered for interface types. We also require the type to be linked before answering the question so we have to force initialization. tom > On Apr 21, 2016, at 8:23 PM, Igor Veresov wrote: > > Looks good! > > igor > >> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez > wrote: >> >> http://cr.openjdk.java.net/~never/8152903/webrev >> >> JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing.
This required a semantic change that if the type is an interface no meaningful answer can be provided. I updated tests and the interface a little to reflect this. >> >> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes. A modified version of the test which found the issue also passes now. I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes. https://bugs.openjdk.java.net/browse/JDK-8154904 >> >> tom > From michael.c.berg at intel.com Thu Apr 28 02:39:41 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 28 Apr 2016 02:39:41 +0000 Subject: SuperWord::unrolling_analysis() question In-Reply-To: References: <5720E060.2050308@redhat.com> <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com> Message-ID: Roland, for superword.cpp you only need this one-line change, which I have tested and which has no negative side effects on x86. It will address the issue (oddly enough, it's where we started): Line 201 int max_vector = Matcher::max_vector_size(T_BYTE); -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Wednesday, April 27, 2016 2:59 PM To: John Rose ; rwestrel at redhat.com Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: SuperWord::unrolling_analysis() question John, it is pretty much that issue (unrolling for the available supported vector), I am testing some changes now for Roland.
-----Original Message----- From: John Rose [mailto:john.r.rose at oracle.com] Sent: Wednesday, April 27, 2016 2:55 PM To: rwestrel at redhat.com Cc: Berg, Michael C ; hotspot-compiler-dev at openjdk.java.net Subject: Re: SuperWord::unrolling_analysis() question It is reasonable to look ahead into the loop to find the largest applicable vector size, before choosing an unroll factor. A loop which works on bytes and doubles at the same time will want to unroll only up to vector-of-double. But a loop which works only on bytes will want to unroll more. Is that what we are talking about here? - John > On Apr 27, 2016, at 8:53 AM, Roland Westrelin wrote: > > > Hi Michael, > > Thanks for the answer. > >> The answer could be conditional if we had a machines with enough byte >> or short components to make vectors with, I chose INT as it is the >> current consistent minimum configuration for complete vector mapping. >> The best answer would be to create some code which mines the common >> type used in the current loops expressions, but I think we would be >> stuck with two passes over the code, the first to bind the common >> type, the second for finding the optimal sub vector mapping. Or >> possibly moving the question to the machine layer as a query, where >> compiler writers choose the minimum consistent configuration based on >> current info on the machine we compile on. > > Would two passes like sketched here: > > http://cr.openjdk.java.net/~roland/vect-unroll-analysis/webrev/ > > would do the job? > > Roland. 
From igor.veresov at oracle.com Thu Apr 28 02:43:30 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 27 Apr 2016 19:43:30 -0700 Subject: RFR 8152903: [JVMCI] CompilerToVM::resolveMethod should correctly handle private methods in interfaces In-Reply-To: <12E0E6EC-C97F-4D7D-96BB-A7A4A6F8A7BD@oracle.com> References: <8A29CA8C-5B4A-4843-A583-42688A99245D@oracle.com> <80797340-B145-42F2-ABED-902197C9AC17@oracle.com> <12E0E6EC-C97F-4D7D-96BB-A7A4A6F8A7BD@oracle.com> Message-ID: <168363BA-C08D-4FB9-8116-BB4ED6DFE366@oracle.com> Good. igor > On Apr 27, 2016, at 4:51 PM, Tom Rodriguez wrote: > > One of the jtreg tests had to be fixed after these changes, so I've updated the webrev. The only changed file is http://cr.openjdk.java.net/~never/8152903/webrev/test/compiler/jvmci/compilerToVM/ResolveMethodTest.java.udiff.html > > The test case was actually resolving the method against the holder type instead of the receiver but this only mattered for interface types. We also require the type to be linked before answering the question so we have to force initialization. > > tom > >> On Apr 21, 2016, at 8:23 PM, Igor Veresov > wrote: >> >> Looks good! >> >> igor >> >>> On Apr 21, 2016, at 6:22 PM, Tom Rodriguez > wrote: >>> >>> http://cr.openjdk.java.net/~never/8152903/webrev >>> >>> JVMCI had its own custom version of the resolution logic when it should be doing something similar to what ciMethod::resolve_invoke is doing. This required a semantic change that if the type is an interface no meaningful answer can be provided. I updated tests and the interface a little to reflect this. >>> >>> Making this change exposed a problem with -Xcomp where the resolution by the compiler was triggering compilation instead of the first real invoke. I rearranged the code a little for this to ensure that code wasn't executed for the Compiler thread. It passes the graal gate with these changes. A modified version of the test which found the issue also passes now.
I filed a bug suggesting changes to that test that would make it work better with compilers like C2 and Graal that don't handle unloaded classes. https://bugs.openjdk.java.net/browse/JDK-8154904 >>> >>> tom >> > From robbin.ehn at oracle.com Thu Apr 28 05:31:23 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 28 Apr 2016 07:31:23 +0200 Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose In-Reply-To: <5720761A.4010004@oracle.com> References: <5720761A.4010004@oracle.com> Message-ID: <9da44f04-6a24-a96a-67c5-c443a74c8df1@oracle.com> Hi Stefan, thank you for taking care of this! This looks good to me! /Robbin On 04/27/2016 10:19 AM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to silence the DirectiveParser_test internal VM > test. > > http://cr.openjdk.java.net/~stefank/8155206/webrev.01 > https://bugs.openjdk.java.net/browse/JDK-8155206 > > Before the patch, we got the following output when running with > -XX:+ExecuteInternalVMTests: > > ... > Running test: Test_log_file_startup_truncation > Running test: Test_invalid_log_file > Running test: DirectivesParser_test > Internal error on line 1 byte 2: Directive missing required match. > Got EOS. > At '}'. > {} > Internal error on line 1 byte 3: Directive missing required match. > At '}]'. > [{}] > Internal error on line 1 byte 3: Directive missing required match. > At '},{}]'. > [{},{}] > Internal error on line 1 byte 2: Directive missing required match. > At '},{}'. > {},{} > Syntax error on line 2 byte 3: DirectivesParser can only start with an > array containing directive objects, or one single directive. > At '['. > [ > { > match: "foo/bar.*", > inline : "+java/util.*", > PrintAssembly: true, > BreakAtExecute: true, > } > ] > ] > > Value error on line 4 byte 20: The key 'PrintInlining' does not allow an > array of values. > At '['.
> PrintInlining: [ > true, > false > ], > } > ] > > Warning: +LogCompilation must be set to enable compilation logging from > directives > Warning: +LogCompilation must be set to enable compilation logging from > directives > Value error on line 7 byte 9: Method pattern error: Missing leading > inline type (+/-) > At '"foo",'. > "foo", > "bar", > ] > } > } > ] > > Value error on line 8 byte 9: Method pattern error: Missing leading > inline type (+/-) > At '"bar",'. > "bar", > ] > } > } > ] > > Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key. > At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'. > [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}] > Value error on line 5 byte 12: Key of type match needs a value of type > string > At 'true,'. > match: true, > inline: true, > enable: true, > c1: { > preset: true, > } > } > ] > > Running test: Test_TempNewSymbol > Running test: VMStructs_test > ... > > With the patch the output is much less noisy: > ... > Running test: Test_log_file_startup_truncation > Running test: Test_invalid_log_file > Running test: DirectivesParser_test > Warning: +LogCompilation must be set to enable compilation logging from > directives > Warning: +LogCompilation must be set to enable compilation logging from > directives > Running test: Test_TempNewSymbol > Running test: VMStructs_test > ... > > We might want to get rid of the Warning messages, but I think the > proposed patch is a good first step. > > You can turn on the old output with -XX:+VerboseInternalVMTests. 
> > Tested with -XX:+ExecuteInternalVMTests :) > > Thanks, > StefanK From stefan.karlsson at oracle.com Thu Apr 28 05:59:04 2016 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 28 Apr 2016 07:59:04 +0200 Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose In-Reply-To: <9da44f04-6a24-a96a-67c5-c443a74c8df1@oracle.com> References: <5720761A.4010004@oracle.com> <9da44f04-6a24-a96a-67c5-c443a74c8df1@oracle.com> Message-ID: <5721A6A8.5070408@oracle.com> Thanks, Robbin. StefanK On 2016-04-28 07:31, Robbin Ehn wrote: > Hi Stefan, thank you for taking care of this! > > This looks good to me! > > /Robbin > > On 04/27/2016 10:19 AM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to silence the DirectiveParser_test internal VM >> test. >> >> http://cr.openjdk.java.net/~stefank/8155206/webrev.01 >> https://bugs.openjdk.java.net/browse/JDK-8155206 >> >> Before the patch, we got the following output when running with >> -XX:+ExecuteInternalVMTests: >> >> ... >> Running test: Test_log_file_startup_truncation >> Running test: Test_invalid_log_file >> Running test: DirectivesParser_test >> Internal error on line 1 byte 2: Directive missing required match. >> Got EOS. >> At '}'. >> {} >> Internal error on line 1 byte 3: Directive missing required match. >> At '}]'. >> [{}] >> Internal error on line 1 byte 3: Directive missing required match. >> At '},{}]'. >> [{},{}] >> Internal error on line 1 byte 2: Directive missing required match. >> At '},{}'. >> {},{} >> Syntax error on line 2 byte 3: DirectivesParser can only start with an >> array containing directive objects, or one single directive. >> At '['. >> [ >> { >> match: "foo/bar.*", >> inline : "+java/util.*", >> PrintAssembly: true, >> BreakAtExecute: true, >> } >> ] >> ] >> >> Value error on line 4 byte 20: The key 'PrintInlining' does not allow an >> array of values. >> At '['. 
>> PrintInlining: [ >> true, >> false >> ], >> } >> ] >> >> Warning: +LogCompilation must be set to enable compilation logging from >> directives >> Warning: +LogCompilation must be set to enable compilation logging from >> directives >> Value error on line 7 byte 9: Method pattern error: Missing leading >> inline type (+/-) >> At '"foo",'. >> "foo", >> "bar", >> ] >> } >> } >> ] >> >> Value error on line 8 byte 9: Method pattern error: Missing leading >> inline type (+/-) >> At '"bar",'. >> "bar", >> ] >> } >> } >> ] >> >> Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key. >> At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'. >> [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}] >> Value error on line 5 byte 12: Key of type match needs a value of type >> string >> At 'true,'. >> match: true, >> inline: true, >> enable: true, >> c1: { >> preset: true, >> } >> } >> ] >> >> Running test: Test_TempNewSymbol >> Running test: VMStructs_test >> ... >> >> With the patch the output is much less noisy: >> ... >> Running test: Test_log_file_startup_truncation >> Running test: Test_invalid_log_file >> Running test: DirectivesParser_test >> Warning: +LogCompilation must be set to enable compilation logging from >> directives >> Warning: +LogCompilation must be set to enable compilation logging from >> directives >> Running test: Test_TempNewSymbol >> Running test: VMStructs_test >> ... >> >> We might want to get rid of the Warning messages, but I think the >> proposed patch is a good first step. >> >> You can turn on the old output with -XX:+VerboseInternalVMTests. 
>> >> Tested with -XX:+ExecuteInternalVMTests :) >> >> Thanks, >> StefanK From zoltan.majo at oracle.com Thu Apr 28 08:25:17 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Thu, 28 Apr 2016 10:25:17 +0200 Subject: RFR(S): 8154836: VM crash due to "Base pointers must match" In-Reply-To: References: <57201A20.7000504@oracle.com> <1e9894d973d0490a81f4549dad411886@DEWDFE13DE14.global.corp.sap> Message-ID: <5721C8ED.8040507@oracle.com> The latest patch looks good to me as well. I'll push it today. Thank you! Best regards, Zoltan On 04/27/2016 04:54 PM, Vladimir Kozlov wrote: > This looks good. > > Thanks, > Vladimir > > On 4/27/16 1:40 AM, Doerr, Martin wrote: >> Hi Vladimir and Zoltan, >> >> thanks for reviewing. >> >> I have made the requested changes: >> - Removed the code which skips the transformation on non-X86 >> platforms in heap-based compressed oops mode. I think we can discuss >> that independently. >> - Improved the assertion as suggested by Vladimir. >> >> New webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.01/ >> >> >> Zoltan, you have already offered to sponsor this change. Thanks. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Mittwoch, 27. April 2016 03:47 >> To: Doerr, Martin ; Zolt?n Maj? >> ; hotspot-compiler-dev at openjdk.java.net >> compiler >> Subject: Re: RFR(S): 8154836: VM crash due to "Base pointers must match" >> >> Thank you, Martin >> >> On 4/25/16 4:04 AM, Doerr, Martin wrote: >>> Hi all, >>> >>> we have already seen such an assertion in final_graph_reshaping. >>> >>> We found 2 AddP nodes in a row. 
The ideal graph looked like this >>> (simplified): >>> N0 ConP >>> N1 ConN >>> N2 AddP(Base = N0, Address = N0) >>> N3 AddP(Base = N0, Address = N2) >>> >>> Final graph reshaping visited N2 before N3 first and changed the graph: >>> N0 ConP >>> N1 ConN >>> N4 DecodeN >>> N2 AddP(Base = N4, Address = N4) >>> N3 AddP(Base = N0, Address = N2) >>> >>> Afterwards, final graph reshaping visited N3 and ran into the >>> assertion. The Base of N3 is unexpected. >>> >>> I made a change to reconnect N3's Base input to N4, too. >>> >>> Webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8154836_final_graph_reshaping/webrev.00/ >>> >> >> Fix looks good. >> >>> >>> In addition to fixing this problem, I added an assertion to check if >>> there are more than 3 AddP nodes in a row. >>> I wouldn't expect that to happen. Not sure if this assertion is >>> desired. >> >> We should not have 3rd AddP but I agree with assert. You should add >> additional check to the assert: >> >> || out_j->in(AddPNode::Base) != addp >> >>> >>> I made an additional change: I think the graph transformation >>> doesn't make sense if decoding is expensive. >>> Therefore, I skip it on non-X86 platforms when we're running in heap >>> based compressed oops mode. >>> (I believe X86 is the only platform which can match the decoding in >>> the operand in this case.) >> >> It is not related to the fix and should be done separately. Or don't >> do at all. >> There is comment above the code which explains why it could >> beneficial on SPARC too: >> >> 2845 // On sparc loading 32-bits constant and decoding it have >> less >> 2846 // instructions (4) then load 64-bits constant (7). >> >> Thanks, >> Vladimir >> >>> >>> Please review. I will also need a sponsor, please. 
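The node sequence Martin describes can be modelled in a few lines. This is not C2 code: Node, Op, bases_match and reshape_addp are hypothetical stand-ins, and the ConN input of DecodeN is omitted. The model only shows why rewiring N2 alone leaves N3 with a Base that no longer matches the Base of its Address input, and why reconnecting N3's Base to the new DecodeN restores the invariant.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for C2 ideal-graph nodes.
enum class Op { ConP, DecodeN, AddP };

struct Node {
  Op op;
  Node* base;     // AddP Base input (unused for other nodes)
  Node* address;  // AddP Address input (unused for other nodes)
};

// Invariant behind the "Base pointers must match" assert: an AddP whose
// Address input is another AddP must share that AddP's Base.
bool bases_match(const Node& n) {
  if (n.op != Op::AddP || n.address == nullptr || n.address->op != Op::AddP) {
    return true;  // the check only applies to chained AddPs
  }
  return n.base == n.address->base;
}

// Replace the constant `from` by `to` in the inputs of `addp`. With
// fix_users set, AddP users that take `addp` as Address and still use
// `from` as Base are reconnected to `to` as well.
void reshape_addp(Node& addp, Node* from, Node* to,
                  Node* const users[], int n_users, bool fix_users) {
  assert(addp.op == Op::AddP);
  if (addp.base == from) addp.base = to;
  if (addp.address == from) addp.address = to;
  if (!fix_users) return;
  for (int i = 0; i < n_users; ++i) {
    Node* u = users[i];
    if (u->op == Op::AddP && u->address == &addp && u->base == from) {
      u->base = to;
    }
  }
}
```

With fix_users false, N3 keeps the ConP as Base while N2 now uses the DecodeN, so bases_match(N3) fails. This is the state in which the assert fired.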
>>> >>> Best regards, >>> Martin >>> >>> From martin.doerr at sap.com Thu Apr 28 09:40:02 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 28 Apr 2016 09:40:02 +0000 Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode In-Reply-To: <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com> References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com> Message-ID: Hi all, thanks for all your comments. we could introduce something like x86: bool Matcher::const_oop_prefer_decode() { return true; } non-x86: bool Matcher::const_oop_prefer_decode() { return Universe::narrow_oop_base() == NULL; } all platforms: bool Matcher::const_klass_prefer_decode() { return Universe::narrow_klass_base() == NULL; } (Seems like matching of DecodeNKlass as operand is not implemented on x86.) Roland, sorry that we discuss so much of my stuff in your thread for 8154826. I think it is slightly related to your change and it touches the same files. So it might be easier to sell it as add-on to your change? If you prefer to handle it separated from 8154826, please let me know. I could open a new bug for it. Disadvantage would be that someone will have to take care of closed source platforms and I will need a sponsor from Oracle. Best regards, Martin -----Original Message----- From: Dean Long [mailto:dean.long at oracle.com] Sent: Mittwoch, 27. April 2016 21:18 To: Doerr, Martin ; rwestrel at redhat.com; Zoltán Majó ; Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode On 4/27/2016 2:11 AM, Doerr, Martin wrote: > Hi Roland, > > I have removed the piece of my change which would interfere with your change.
> > But I'd like to explain the intention of skipping the transformation on non-x86 platforms in heap-based compressed oops mode. > (In simpler compressed oops modes decoding is very cheap so the transformation is probably good.) > Without transformation we have: LoadConP + Storage access > With transformation we have: LoadConN + DecodeN heap-based + Storage access > > I believe X86 is the only platform which can match the DecodeN heap-based into the Storage access. > > I guess other platforms should prefer the untransformed version: > PPC can load the ConP from constant pool. Decoding takes a lot of instructions, because the heap base needs to get loaded. > > I didn't take a closer look at SPARC, but I thought it would use the constant pool as well. Not sure if the following comment is still correct: > // On sparc loading 32-bits constant and decoding it have less > // instructions (4) then load 64-bits constant (7). > > > Therefore, I had proposed the following code to skip: > + // Matching decode heap based into an operand only works on X86. > + #if !defined(X86) > + if ((op == Op_ConN && Universe::narrow_oop_base() != NULL) || > + (op == Op_ConNKlass && Universe::narrow_klass_base() != NULL)) { > + break; > + } > + #endif > + > > Would this be good for aarch64 as well? > Would you like to include code which skips the transformation in your change or should this better be discussed independently? Martin, wouldn't your #ifdef X86 code be better as a Matcher function, similar to narrow_oop_use_complex_address()? dl > Best regards, > Martin > > > -----Original Message----- > From: Roland Westrelin [mailto:rwestrel at redhat.com] > Sent: Mittwoch, 27. 
April 2016 10:00 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode > > Hi Martin, > >> seems like this issue is related to what I have sent out today: >> RFR(S): 8154836: VM crash due to "Base pointers must match" >> I also had to change the AddP case of final graph reshaping. >> >> In one part of my change, I skip the graph transformation on non-X86 platforms when we're running in heap based compressed oops mode. >> >> Maybe I have to remove that part of my change, or at least adapt it. >> We should make sure that the changes don't get pushed on the same day. > Thanks for the heads up. It looks like your change will get in before > mine. I'll send an updated webrev once it's integrated. > > Roland. From rwestrel at redhat.com Thu Apr 28 09:50:06 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 28 Apr 2016 11:50:06 +0200 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset Message-ID: <5721DCCE.5050602@redhat.com> http://cr.openjdk.java.net/~roland/8155612/webrev.00/ When the superword optimization encounters a misaligned offset, it tries to align the loop induction variable so the memory access is aligned. So even though the aarch64 port only supports align vector memory accesses ( Matcher::misaligned_vectors_ok() return false), vector node can still get a misaligned offset which results in: # Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/cpu/aarch64/vm/assembler_aarch64.hpp:251), pid=3780, tid=3807 # guarantee(chk == -1 || chk == 0) failed: Field too big for insn Roland. 
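Why a misaligned offset can trip an encoding check: the A64 LDR/STR (immediate, unsigned offset) forms hold a 12-bit immediate scaled by the access size, so a 16-byte vector access needs an offset that is a multiple of 16. LDUR/STUR instead take a signed, unscaled 9-bit offset. The sketch below illustrates only these encoding limits and one possible fallback order. The function and enum names are hypothetical and do not come from the webrev or from assembler_aarch64.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Does `offset` fit the scaled, unsigned 12-bit immediate of
// LDR/STR (immediate, unsigned offset) for an access of `size` bytes?
bool fits_scaled_uimm12(std::int64_t offset, std::int64_t size) {
  assert(size > 0);
  if (offset < 0 || offset % size != 0) return false;  // must be size-aligned
  return offset / size < (1 << 12);
}

// Does `offset` fit the signed, unscaled 9-bit immediate of LDUR/STUR?
bool fits_unscaled_simm9(std::int64_t offset) {
  return offset >= -256 && offset <= 255;
}

enum class AddrForm { ScaledImm, UnscaledImm, RegisterOffset };

// Hypothetical selection order: the scaled form first, then the unscaled
// form, otherwise materialize the offset in a register.
AddrForm choose_form(std::int64_t offset, std::int64_t size) {
  if (fits_scaled_uimm12(offset, size)) return AddrForm::ScaledImm;
  if (fits_unscaled_simm9(offset)) return AddrForm::UnscaledImm;
  return AddrForm::RegisterOffset;
}
```

For a 16-byte access, an offset of 32 fits the scaled form, 20 fits only the unscaled form, and 1000 needs a register.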
From zoltan.majo at oracle.com Thu Apr 28 09:50:31 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 28 Apr 2016 11:50:31 +0200 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> <5720CA98.10605@oracle.com> <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com> Message-ID: <5721DCE7.1020102@oracle.com> Hi Vladimir, On 04/27/2016 11:55 PM, Vladimir Kozlov wrote: > [...] >> I agree. But I'd prefer we do that in the >> AllocatePrefetchStepSizeConstraintFunc() constraint function in >> commandLineFlagConstraintsCompiler.cpp). The reason is that since >> JEP-245 the preferred place to validate of >> command-line arguments is in constraint functions. Most >> compiler-related checks were moved there with JDK-8078554 and >> JDK-8146478. > > Okay. Usually we do platform specific flag's setting in > vm_version_.cpp files but it looks like it start changing. > I am not supporter of having multiply #ifdef in shared code (in > commandLineFlagConstraintsCompiler). JEP-245 proposed to have ranges/constraints defined for all flags. Some of these ranges/constraints are unavoidably architecture-specific. I agree with you that having lots of #ifdefs in shared code does not improve code readability. But I'm wondering what would be a good way to achieve the goals of JEP-245 without using too many #ifdefs. Maybe architecture-specific constraint files (e.g., commandLineFlagConstraintsCompiler_solaris.cpp, commandLineFlagConstraintsCompiler_x86.cpp, etc.)? At least with the current change we won't add more #ifdefs...
> >> >> I'd also like to set the minimum value for AllocatePrefetchDistance >> to AllocatePrefetchStepSize, otherwise we can access >> the heap before newly allocated objects/arrays. > > Only for style 3. And I said it may be better to change code. Yes, you've mentioned that in the first round of reviews and I missed your point in the subsequent rounds. Sorry for that. I've updated the code in macro.cpp and the constraint AllocatePrefetchDistanceConstraintFunc() as you've suggested. > [...] >>> >>> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - >>> it should be used only if UseTLAB is true. Please, >>> look. >> >> It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true. >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844 >> > > I got report from it crashing on T7 (attached). May be it is because > by default it used BIS with it. I think using BIS with AllocatePrefetchStyle=2 is indeed the cause. I've executed the following on our T7 machine with b115: * java -XX:AllocatePrefetchStyle=2 (uses AllocatePrefetchInstr=1 by default) -> crashes * java -XX:AllocatePrefetchStyle=2 -XX:AllocatePrefetchInstr=0 -> works With the newest webrev, both commands pass. Here is the newest webrev: http://cr.openjdk.java.net/~zmajo/8153340/webrev.03/ I re-did testing with JPRT, the results look OK. Thank you! Best regards, Zoltan > > Thanks, > Vladimir > >> >> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, >> I've started an RBT run with all hotspot on all >> platforms using AllocatePrefetchStyle=2. So far no problems have >> shown up. >> >>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch >>> should be set for all styles on all platforms to >>> avoid accessing beyond heap. Prefetch instructions doc say that they >>> does not trap but we should be careful. >> >> I agree. 
>> >> That means we initialize _reserve_for_allocation_prefetch in a >> platform-independent way. So I think it would make sense >> to move that field to ThreadLocalAllocBuffer, as TLAB is the only >> user of that field and we don't support >> platform-independent initialization of Abstract_VM_Version. I did >> that in the updated webrev. >> >>>> >>>> The updated webrev does the following (in addition to fixing the >>>> original problem with AllocatePrefetchDistance): >>>> >>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 >>>> (i.e., BIS instructions are used for prefetching). As >>>> far as I understand, AllocatePrefetchStyle = 3 was added to support >>>> prefetching with BIS, so if BIS is enabled, we >>>> should use AllocatePrefetchStyle = 3. >>> >>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should >>> select AllocatePrefetchStyle = 3. >> >> OK. >> >>> But we should allow AllocatePrefetchStyle = 3 if normal prefetch >>> instructions (or other platforms) are used. >> >> OK, I've removed that restriction. >> >>> I think we should update comment in globals.hpp to say "generate one >>> prefetch per cache line" without saying BIS. >> >> OK. >> >>> >>> But I agree if BIS is not available we should not use BIS >>> AllocatePrefetchInstr = 1. >> >> OK. >> >>> >>>> >>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on >>>> non-SPARC platforms. >>> >>> It could be useful on other platforms since it does one access per >>> cache line. >> >> OK, let's keep it available then. >> >>> >>> >>>> >>>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if >>>> AllocatePrefetchStyle is 3 (due to alignment requirements). >>> >>> That is correct since stxa requires at least 8 bytes alignment (as >>> stx). >> >> OK. >> >>> >>>> >>>> 3. Determine the number of lines to prefetch in the same way for >>>> all prefetch styles: >>>> lines = (prefecth instance allocation) ? 
>>>>> AllocateInstancePrefetchLines : AllocatePrefetchLines >>> >>> Agree. >> >> OK. >> >>> >>>> >>>> Here is the updated webrev: >>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/ >>> >>> vm_version_sparc.cpp >>> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) >>> when BIS is not available. >> >> OK. I think it is sufficient to do >> >> 52 if (!has_blk_init()) { >> 53 if (AllocatePrefetchInstr == 1) { >> 54 warning("BIS instructions required for AllocatePrefetchInstr 1 >> unavailable"); >> 55 FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0); >> 56 } I hope I'm not missing anything here. Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/ Testing: - JPRT >> (incl. TestOptionsWithRanges); - local testing on >> SPARC; - all hotspot tests with AllocaPrefetchStyle=2 on all >> platforms. Thank you! Best regards, Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Testing: >>>> - JPRT (incl. TestOptionsWithRanges) >>>> - local testing on a SPARC machine. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> please review the patch for 8153340. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340 >>>>>> >>>>>> >>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and >>>>>> AllocatePrefetchDistance==0. The crash happens due to the way >>>>>> the address for the first prefetch instruction is calculated [1]: >>>>>> >>>>>> If distance==0, cache_addr == old_eden_top. Then, cache_adr &= >>>>>> ~(AllocatePrefetchStepSize - 1) which can zero some of >>>>>> the bits of cache_adr. That result in accesses *before* the newly >>>>>> allocated object. >>>>>> >>>>>> >>>>>> Solution: Set lower limit of AllocatePrefetchDistance to >>>>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >>>>>> Unquarantine test.
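The arithmetic behind this problem statement can be checked in isolation. Masking with ~(AllocatePrefetchStepSize - 1) rounds an address down by at most AllocatePrefetchStepSize - 1 bytes. With a distance of 0, the first prefetch can therefore land before old_eden_top. A distance of at least one step keeps it strictly past the start of the new object. The helpers below sketch only that computation; they are hypothetical, not the code in macro.cpp, and they assume a power-of-two step size.

```cpp
#include <cassert>
#include <cstdint>

// Round `addr` down to a multiple of `step`, i.e. the masking
// `cache_adr &= ~(AllocatePrefetchStepSize - 1)` from the description.
std::uintptr_t round_down(std::uintptr_t addr, std::uintptr_t step) {
  assert(step != 0 && (step & (step - 1)) == 0 && "step must be a power of two");
  return addr & ~(step - 1);
}

// Hypothetical helper: address of the first prefetch for a new object that
// starts at `top`, given a prefetch distance and step size in bytes.
std::uintptr_t first_prefetch_addr(std::uintptr_t top, std::uintptr_t distance,
                                   std::uintptr_t step) {
  return round_down(top + distance, step);
}
```

For example, with top = 0x1008 and a 64-byte step, a distance of 0 yields 0x1000, which is before the object. A distance of 64 yields 0x1040.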
>>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - JPRT (incl. TestOptionsWithRanges.java) >>>>>> - local testing on a SPARC machine. >>>>>> >>>>>> Thank you! >>>>>> >>>>>> Best regards, >>>>>> >>>>>> >>>>>> Zoltan >>>>>> >>>>>> [1] >>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 >>>> >> From aph at redhat.com Thu Apr 28 09:58:59 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Apr 2016 10:58:59 +0100 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <5721DCCE.5050602@redhat.com> References: <5721DCCE.5050602@redhat.com> Message-ID: <5721DEE3.5010100@redhat.com> On 28/04/16 10:50, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8155612/webrev.00/ OK, thanks. Andrew. From nils.eliasson at oracle.com Thu Apr 28 12:01:19 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 28 Apr 2016 14:01:19 +0200 Subject: [9] RFR(S): 8142464: PlatformLoggerTest.java throws java.lang.RuntimeException: Logger test.logger.bar does not exist Message-ID: <5721FB8F.40207@oracle.com> Hi, Please review the fix of jdk/test/sun/util/logging/PlatformLoggerTest.java that has been failing intermittently in our nightlies. Summary: The test uses loggers that are accessible by weak refs from a LogManager. Since the test doesn't keep a strong ref to the loggers they may get collected during the test. Solution: Save loggers to a static field in the test class. Other: Also removing "@compile -XDignore.symbol.file" that is unnecessary after Jigsaw. The compile tag will force a compile even if the class already exists, wasting times during reruns. Testing: Running tests on all platforms. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8142464 webrev: http://cr.openjdk.java.net/~neliasso/8142464/webrev_jdk.01/ (JDK) Regards, Nils Eliasson From stefan.karlsson at oracle.com Thu Apr 28 12:11:06 2016 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 28 Apr 2016 14:11:06 +0200 Subject: RFR: 8155206: Internal VM test DirectiveParser_test is too verbose In-Reply-To: <5721F893.9020801@oracle.com> References: <5720761A.4010004@oracle.com> <5721F893.9020801@oracle.com> Message-ID: <5721FDDA.8060804@oracle.com> (Adding back hotspot-compiler-dev) Thanks, Nils. StefanK On 2016-04-28 13:48, Nils Eliasson wrote: > Hi Stefan, > > Looks good! > > Reviewed. > > Regards, > Nils > > On 2016-04-27 10:19, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to silence the DirectiveParser_test internal >> VM test. >> >> http://cr.openjdk.java.net/~stefank/8155206/webrev.01 >> https://bugs.openjdk.java.net/browse/JDK-8155206 >> >> Before the patch, we got the following output when running with >> -XX:+ExecuteInternalVMTests: >> >> ... >> Running test: Test_log_file_startup_truncation >> Running test: Test_invalid_log_file >> Running test: DirectivesParser_test >> Internal error on line 1 byte 2: Directive missing required match. >> Got EOS. >> At '}'. >> {} >> Internal error on line 1 byte 3: Directive missing required match. >> At '}]'. >> [{}] >> Internal error on line 1 byte 3: Directive missing required match. >> At '},{}]'. >> [{},{}] >> Internal error on line 1 byte 2: Directive missing required match. >> At '},{}'. >> {},{} >> Syntax error on line 2 byte 3: DirectivesParser can only start with >> an array containing directive objects, or one single directive. >> At '['.
>> [ >> { >> match: "foo/bar.*", >> inline : "+java/util.*", >> PrintAssembly: true, >> BreakAtExecute: true, >> } >> ] >> ] >> >> Value error on line 4 byte 20: The key 'PrintInlining' does not allow >> an array of values. >> At '['. >> PrintInlining: [ >> true, >> false >> ], >> } >> ] >> >> Warning: +LogCompilation must be set to enable compilation logging >> from directives >> Warning: +LogCompilation must be set to enable compilation logging >> from directives >> Value error on line 7 byte 9: Method pattern error: Missing leading >> inline type (+/-) >> At '"foo",'. >> "foo", >> "bar", >> ] >> } >> } >> ] >> >> Value error on line 8 byte 9: Method pattern error: Missing leading >> inline type (+/-) >> At '"bar",'. >> "bar", >> ] >> } >> } >> ] >> >> Key error on line 1 byte 7: Key 'c1' not allowed after 'c1' key. >> At 'c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}]'. >> [{c1:{c1:{c1:{c1:{c1:{c1:{c1:{}}}}}}}}] >> Value error on line 5 byte 12: Key of type match needs a value of >> type string >> At 'true,'. >> match: true, >> inline: true, >> enable: true, >> c1: { >> preset: true, >> } >> } >> ] >> >> Running test: Test_TempNewSymbol >> Running test: VMStructs_test >> ... >> >> With the patch the output is much less noisy: >> ... >> Running test: Test_log_file_startup_truncation >> Running test: Test_invalid_log_file >> Running test: DirectivesParser_test >> Warning: +LogCompilation must be set to enable compilation logging >> from directives >> Warning: +LogCompilation must be set to enable compilation logging >> from directives >> Running test: Test_TempNewSymbol >> Running test: VMStructs_test >> ... >> >> We might want to get rid of the Warning messages, but I think the >> proposed patch is a good first step. >> >> You can turn on the old output with -XX:+VerboseInternalVMTests. 
>> >> Tested with -XX:+ExecuteInternalVMTests :) >> >> Thanks, >> StefanK > From rwestrel at redhat.com Thu Apr 28 12:29:14 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 28 Apr 2016 14:29:14 +0200 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <5721DEE3.5010100@redhat.com> References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com> Message-ID: <5722021A.8000306@redhat.com> Thank you for the review, Andrew. I need a sponsor because of the test case. Roland. From zoltan.majo at oracle.com Thu Apr 28 13:04:58 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 28 Apr 2016 15:04:58 +0200 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <5722021A.8000306@redhat.com> References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com> <5722021A.8000306@redhat.com> Message-ID: <57220A7A.80400@oracle.com> Hi Roland, On 04/28/2016 02:29 PM, Roland Westrelin wrote: > Thank you for the review, Andrew. > I need a sponsor because of the test case. I can sponsor your change. Best regards, Zoltan > > Roland. From adinn at redhat.com Thu Apr 28 13:33:01 2016 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 28 Apr 2016 14:33:01 +0100 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <5722021A.8000306@redhat.com> References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com> <5722021A.8000306@redhat.com> Message-ID: <5722110D.8040601@redhat.com> On 28/04/16 13:29, Roland Westrelin wrote: > Thank you for the review, Andrew. > I need a sponsor because of the test case. If you need a second AArch64 reviewer then the patch also has my imprimatur (that's a yes :-). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From edward.nevill at gmail.com Thu Apr 28 13:49:48 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 28 Apr 2016 14:49:48 +0100 Subject: RFR: 8155617: aarch64: ClearArray does not use DC ZVA Message-ID: <1461851388.10531.27.camel@mint> Hi, Please review the following webrev http://cr.openjdk.java.net/~enevill/8155617/bzero6 https://bugs.openjdk.java.net/browse/JDK-8155617 This is the bzero3 version previously discussed on the aarch64 list with the inner DC ZVA outlined. The outlining of the DC ZVA loop made no measurable difference to performance. I have also tuned the BlockZeroingLowLimit to default to 4 x cache line size rather than always defaulting to 256. Updated performance charts here:- http://cr.openjdk.java.net/~enevill/8155617/bzero6.pdf The chart show the performance improvement on 3 different partners HW. The benchmark was the following JHM test provided by Andrew Haley http://people.linaro.org/~edward.nevill/block_zero/ArrayFill.java The charts have been normalised so that the original jdk9 hs-comp tree is shown as 100%. The figures are % of original performance so lower is better. This is done to avoid disclosing absolute performance information on partner's HW. Orig: Original jdk9 hs-comp bzero6: jdk9 hs-comp with bzero6 Orig (no prf): Original jdk9 hs-comp (-XX:AllocatePrefetchStyle=0) bzero6 (no pref): jdk9 hs-comp with bzero6 (-XX:AllocatePrefetchStyle=0) There is significant interaction between prefetch and block zeroing as discussed previously. Some partners benefit from prefetch, others do not. The proposed patch does not change the behaviour of prefetch (ie. it leaves it enabled) as I think there should be a separate tuning exercise to tune prefetch for different partners HW. OK to push? Ed. 
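Block zeroing with DC ZVA generally has three parts: ordinary stores up to a block boundary (the head), a loop that clears whole blocks, and ordinary stores for the remainder (the tail). Requests below a threshold use ordinary stores only. The sketch below is a portable model of that split, not the bzero6 patch. memset stands in for both the ordinary stores and DC ZVA, and the names plan_zeroing and zero_range_sketch are hypothetical. low_limit plays the role of BlockZeroingLowLimit, which the patch defaults to 4 x cache line size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// How a zeroing request is split (all sizes in bytes).
struct ZeroPlan {
  std::size_t head_bytes;  // plain stores up to the first block boundary
  std::size_t blocks;      // whole blocks, each one DC ZVA on AArch64
  std::size_t tail_bytes;  // plain stores after the last whole block
};

// Hypothetical planner. block_bytes is the DC ZVA block size; requests
// shorter than low_limit are cleared with plain stores only.
ZeroPlan plan_zeroing(std::uintptr_t addr, std::size_t len,
                      std::size_t block_bytes, std::size_t low_limit) {
  assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
  ZeroPlan all_plain{len, 0, 0};
  if (len < low_limit) return all_plain;
  std::size_t misalign = addr & (block_bytes - 1);
  std::size_t head = (misalign == 0) ? 0 : block_bytes - misalign;
  if (head >= len) return all_plain;
  std::size_t rest = len - head;
  return ZeroPlan{head, rest / block_bytes, rest % block_bytes};
}

// Portable stand-in: memset replaces both the plain stores and DC ZVA.
void zero_range_sketch(unsigned char* base, std::size_t len,
                       std::size_t block_bytes, std::size_t low_limit) {
  ZeroPlan p = plan_zeroing(reinterpret_cast<std::uintptr_t>(base), len,
                            block_bytes, low_limit);
  std::memset(base, 0, p.head_bytes);
  unsigned char* blk = base + p.head_bytes;
  for (std::size_t i = 0; i < p.blocks; ++i) {
    std::memset(blk + i * block_bytes, 0, block_bytes);  // one "DC ZVA"
  }
  std::memset(blk + p.blocks * block_bytes, 0, p.tail_bytes);
}
```

With a 64-byte block and a 256-byte limit, a 1000-byte request starting 8 bytes past a block boundary splits into 56 head bytes, 14 blocks, and 48 tail bytes.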
From aph at redhat.com Thu Apr 28 14:06:36 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 28 Apr 2016 15:06:36 +0100 Subject: RFR: 8155617: aarch64: ClearArray does not use DC ZVA In-Reply-To: <1461851388.10531.27.camel@mint> References: <1461851388.10531.27.camel@mint> Message-ID: <572218EC.1010201@redhat.com> On 04/28/2016 02:49 PM, Edward Nevill wrote: > The proposed patch does not change the behaviour of prefetch (ie. it leaves it enabled) as I think there should be a separate tuning exercise to tune prefetch for different partners HW. OK, but as this is a severe performance regression for Partner X I expect the prefetch-tuning patch very soon. Andrew. From dmitrij.pochepko at oracle.com Thu Apr 28 14:28:33 2016 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Thu, 28 Apr 2016 17:28:33 +0300 Subject: RFR(XS): 8155163 - JVMCI: MethodHandleAccessProvider.resolveInvokeBasicTarget implementation doesn't match javadoc Message-ID: <57221E11.4000406@oracle.com> Hi, please review fix for JDK-8155163 - JVMCI: MethodHandleAccessProvider.resolveInvokeBasicTarget implementation doesn't match javadoc MethodHandleAccessProvider.resolveInvokeBasicTarget implementation throws NPE in case 1st parameter doesn't represent MethodHandle, but according to javadoc it should return null in such case. So, a small fix should be applied to match javadoc. CR: https://bugs.openjdk.java.net/browse/JDK-8155163 Webrev: http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/ I've tested this change on linux_x64 via jprt. Thanks, Dmitrij
From rwestrel at redhat.com Thu Apr 28 14:33:15 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 28 Apr 2016 16:33:15 +0200 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <57220A7A.80400@oracle.com> References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com> <5722021A.8000306@redhat.com> <57220A7A.80400@oracle.com> Message-ID: <57221F2B.50606@redhat.com> Thanks for offering to sponsor, Zoltan and thanks for the second review, Andrew D.! Roland. From zoltan.majo at oracle.com Thu Apr 28 14:41:37 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 28 Apr 2016 16:41:37 +0200 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <57221F2B.50606@redhat.com> References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com> <5722021A.8000306@redhat.com> <57220A7A.80400@oracle.com> <57221F2B.50606@redhat.com> Message-ID: <57222121.9030501@oracle.com> Hi, On 04/28/2016 04:33 PM, Roland Westrelin wrote: > Thanks for offering to sponsor, Zoltan and thanks for the second review, > Andrew D.! @Roland: you are welcome! @Andrew: Unfortunately, I started the push job without mentioning you in the description. Once I've seen your review, it was already too late. Sorry for that. Best regards, Zoltan > > Roland. From adinn at redhat.com Thu Apr 28 14:53:20 2016 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 28 Apr 2016 15:53:20 +0100 Subject: RFR(S): 8155612: Aarch64: vector nodes need to support misaligned offset In-Reply-To: <57222121.9030501@oracle.com> References: <5721DCCE.5050602@redhat.com> <5721DEE3.5010100@redhat.com> <5722021A.8000306@redhat.com> <57220A7A.80400@oracle.com> <57221F2B.50606@redhat.com> <57222121.9030501@oracle.com> Message-ID: <572223E0.5080805@redhat.com> On 28/04/16 15:41, Zoltán Majó
wrote:
> Hi,
>
>
> On 04/28/2016 04:33 PM, Roland Westrelin wrote:
>> Thanks for offering to sponsor, Zoltan and thanks for the second review,
>> Andrew D.!
>
> @Roland: you are welcome!
>
> @Andrew: Unfortunately, I started the push job without mentioning you in
> the description. Once I've seen your review, it was already too late.
> Sorry for that.

No problem. I don't think anyone is worried about keeping a precise score :-)

regards,

Andrew Dinn
-----------

From rwestrel at redhat.com Thu Apr 28 14:55:26 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 16:55:26 +0200
Subject: SuperWord::unrolling_analysis() question
In-Reply-To:
References: <5720E060.2050308@redhat.com> <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com>
Message-ID: <5722245E.7090202@redhat.com>

Hi Michael,

> Roland, for superword.cpp you only need this one line as change,
> which I have tested and for which has no negative side effects on
> x86. It will address the issue (oddly enough, it's where we started):
>
> Line 201 int max_vector = Matcher::max_vector_size(T_BYTE);

Thanks for experimenting with this. That looks fine to me. I can make that change as part of the patch that enables superword unrolling analysis on aarch64, unless you'd like to do it as a separate change?

Roland.

From rwestrel at redhat.com Thu Apr 28 15:43:03 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 17:43:03 +0200
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode
In-Reply-To:
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com>
Message-ID: <57222F87.3040602@redhat.com>

Hi Martin,

> Roland, sorry that we discuss so much of my stuff in your thread for
> 8154826. I think it is slightly related to your change and it touches
> the same files.
> So it might be easier to sell it as add-on to your
> change?
>
> If you prefer to handle it separated from 8154826, please let me
> know. I could open a new bug for it. Disadvantage would be that
> someone will have to take care of closed source platforms and I will
> need a sponsor from Oracle.

I would say it would be cleaner to keep the 2 changes separated (a change that affects all platforms hidden in a platform specific one might be confusing to someone looking at the history of changes). But I don't really have a strong opinion either so I'll let those who would do the extra sponsoring work decide. Vladimir, what do you think?

Roland.

From tom.rodriguez at oracle.com Thu Apr 28 16:13:50 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 09:13:50 -0700
Subject: RFR 8155047: [JVMCI] findLeafConcreteSubtype should handle arrays of leaf concrete subtype
Message-ID: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>

http://cr.openjdk.java.net/~never/8155047/webrev

findLeafConcreteSubtype should use the same machinery for the elemental type when identifying leaf array types.

tom

From tom.rodriguez at oracle.com Thu Apr 28 16:26:43 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 09:26:43 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
Message-ID: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>

http://cr.openjdk.java.net/~never/8154483/webrev

This is a collection of small improvements to IGV developed while working on Graal.
I've categorized them below.

Bug fixes:
Reset folder in top component to release reference to old graphs
Fix concurrent modification exception in IGV
Fix HTML quoting in tooltips and remove useless entries from search box

UI improvements:
Fixed keybinding for open and save actions in IGV
Allow importing multiple files at once in IGV
Make node searches look through the graph history for a match

Performance related improvements:
Add info message about time spent parsing files
Relax expensive assert in IGV
Reduce overhead of hash computation for graph identity checks
Increase Integer cache size in IGV
Share properties in IGV

Binary graph format:
Add folder and graph property support to binary graphs in IGV
Handle Strings larger than buffer size properly in IGV

From rwestrel at redhat.com Thu Apr 28 16:35:30 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 28 Apr 2016 18:35:30 +0200
Subject: RFR(S): 8154943: AArch64: redundant address computation instructions with vectorization
Message-ID:

http://cr.openjdk.java.net/~roland/8154943/webrev.00/

Example generated code:

;; B106: # B106 B107 <- B105 B106 Loop: B106-B106 inner main of N612 Freq: 293550

0x000003ffa9275650: sbfiz x0, x4, #3, #32
0x000003ffa9275654: add x3, x5, x0
0x000003ffa9275658: sbfiz x7, x4, #3, #32
0x000003ffa927565c: ldr q19, [x3,#16]
0x000003ffa9275660: add x19, x5, x7
0x000003ffa9275664: ldr q20, [x19,#32] ;*daload {reexecute=0 rethrow=0 return_oop=0}
                                       ; - jnt.scimark2.LU::factor@237 (line 235)

Both sbfiz and add are strictly identical and so fully redundant.
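Why do the duplicated sbfiz/add pairs survive? As the rest of Roland's message explains, the two ConvI2L nodes share input 1683 and differ only in their types (long:minint..maxint-1 and long:0..maxint-1). The toy value-numbering table below uses hypothetical classes; it is neither C2's PhaseGVN nor the code in the webrev. It keys nodes on opcode, input and type range, so nodes with different types stay distinct until both are widened to the full int range.

```java
import java.util.HashMap;
import java.util.Map;

// Toy value-numbering table (hypothetical, not C2): two nodes are commoned
// only when opcode, input and type range all match.
final class ToyGvn {
    static final class Node {
        final int id;
        final String op;
        final int input;   // id of the single input node
        final long lo, hi; // type range, e.g. long:0..maxint-1

        Node(int id, String op, int input, long lo, long hi) {
            this.id = id;
            this.op = op;
            this.input = input;
            this.lo = lo;
            this.hi = hi;
        }

        String key() {
            return op + "(" + input + ")[" + lo + ".." + hi + "]";
        }

        // Widen the type to the full int range, dropping the narrower bounds.
        Node widened() {
            return new Node(id, op, input, Integer.MIN_VALUE, Integer.MAX_VALUE);
        }
    }

    private final Map<String, Node> table = new HashMap<>();

    // Returns an existing equivalent node, or records n and returns it.
    Node hashFind(Node n) {
        Node existing = table.putIfAbsent(n.key(), n);
        return existing != null ? existing : n;
    }
}
```

The design point is that the type is part of the node's identity: dropping information that code generation does not use lets two otherwise identical nodes collapse into one.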
The redundant instructions are caused by 2 ConvI2L nodes that are identical except for their type:

1984 ConvI2L _ 1683 [ 1987 ] debug_idx = 33701984 type = long: dump_spec = #long:minint..maxint-1:www
1678 ConvI2L _ 1683 [ 1672 ] bci = 49 debug_orig = dump_spec = #long:0..maxint-1:www debug_idx = 30401678 line = 183 type = long:

The type doesn't help code generation on arm so it could be widened and duplicated instructions could then be removed. I think the same could be done on sparc.

Roland.

From dmitrij.pochepko at oracle.com Thu Apr 28 16:39:30 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 28 Apr 2016 19:39:30 +0300
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
Message-ID: <57223CC2.6000309@oracle.com>

Hi,

please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case

MemoryAccessProvider.readUnsafeConstant method implementation throws NullPointerException in case JavaKind is null. Such behavior wasn't specified in javadoc.
So, a small fix contains logic to throw IllegalArgumentException and respective javadoc update.

CR: https://bugs.openjdk.java.net/browse/JDK-8155244
webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/

I've tested fix on linux_x64 via jprt.

Thanks,
Dmitrij

From zoltan.majo at oracle.com Thu Apr 28 17:44:48 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Thu, 28 Apr 2016 19:44:48 +0200
Subject: [9] RFR(XS): 8155653: TestVectorUnalignedOffset.java not pushed with 8155612
Message-ID: <57224C10.9060704@oracle.com>

Hi,

The test TestVectorUnalignedOffset.java was originally included (and reviewed) with JDK-8155612:
http://cr.openjdk.java.net/~roland/8155612/webrev.00/

I sponsored the changeset but, by accident, did not push the test. So I've filed the current bug to address that problem.
If there are no objections, I'll push the test separately tomorrow (with Roland as a contributor). Here is the webrev:
http://cr.openjdk.java.net/~zmajo/8155653/webrev.00/

Sorry for the noise.

Best regards,

Zoltan

From michael.c.berg at intel.com Thu Apr 28 18:09:44 2016
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 28 Apr 2016 18:09:44 +0000
Subject: SuperWord::unrolling_analysis() question
In-Reply-To: <5722245E.7090202@redhat.com>
References: <5720E060.2050308@redhat.com> <47728956-DFE2-49B0-8844-D7966ACD8B8B@oracle.com> <5722245E.7090202@redhat.com>
Message-ID:

Ok, Roland, that would be great; please do add it to your next changeset.

Regards,
Michael

-----Original Message-----
From: Roland Westrelin [mailto:rwestrel at redhat.com]
Sent: Thursday, April 28, 2016 7:55 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: SuperWord::unrolling_analysis() question

Hi Michael,

> Roland, for superword.cpp you only need this one line as change, which
> I have tested and for which has no negative side effects on x86. It
> will address the issue (oddly enough, it's where we started):
>
> Line 201 int max_vector = Matcher::max_vector_size(T_BYTE);

Thanks for experimenting with this. That looks fine to me. I can make that change as part of the patch that enables superword unrolling analysis on aarch64 unless you'd like to do it as a separate change?

Roland.

From christian.thalinger at oracle.com Thu Apr 28 19:03:34 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 28 Apr 2016 09:03:34 -1000
Subject: RFR(XS): 8155163 - JVMCI: MethodHandleAccessProvider.resolveInvokeBasicTarget implementation doesn't match javadoc
In-Reply-To: <57221E11.4000406@oracle.com>
References: <57221E11.4000406@oracle.com>
Message-ID:

Looks good.
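Both JVMCI fixes in this batch bring an implementation in line with its javadoc. 8155163 makes resolveInvokeBasicTarget return null instead of throwing an NPE when its argument is not a MethodHandle. 8155244 makes readUnsafeConstant reject a null kind with IllegalArgumentException. The sketch below shows only the shape of such checks, using plain java.lang.invoke. ContractSketch, its Kind enum and both methods are hypothetical stand-ins, not the JVMCI classes or their signatures.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;

// Hypothetical sketches of "implementation matches javadoc" checks;
// these are not the JVMCI providers.
final class ContractSketch {
    // In the spirit of 8155163: "returns null if the argument is not a
    // method handle" is honoured instead of failing with a NullPointerException.
    static MethodType invokeBasicTargetType(Object maybeHandle) {
        if (!(maybeHandle instanceof MethodHandle)) { // also false for null
            return null;
        }
        return ((MethodHandle) maybeHandle).type();
    }

    // Hypothetical stand-in for JavaKind.
    enum Kind { INT, LONG, OBJECT, VOID }

    // In the spirit of 8155244: a null, void or non-primitive kind is rejected
    // with IllegalArgumentException, as the updated javadoc would document.
    static int sizeInBytes(Kind kind) {
        if (kind == null || kind == Kind.VOID || kind == Kind.OBJECT) {
            throw new IllegalArgumentException("unsupported kind: " + kind);
        }
        return kind == Kind.INT ? 4 : 8;
    }
}
```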
> On Apr 28, 2016, at 4:28 AM, Dmitrij Pochepko wrote:
>
> Hi,
>
> please review fix for JDK-8155163 - JVMCI: MethodHandleAccessProvider.resolveInvokeBasicTarget implementation doesn't match javadoc
>
> MethodHandleAccessProvider.resolveInvokeBasicTarget implementation throws NPE in case 1st parameter doesn't represent MethodHandle, but according to javadoc it should return null in such case. So, a small fix should be applied to match javadoc.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8155163
> Webrev: http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/
>
> I've tested this change on linux_x64 via jprt.
>
> Thanks,
> Dmitrij

From vladimir.kozlov at oracle.com Thu Apr 28 19:14:46 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 12:14:46 -0700
Subject: [9] RFR(XS): 8155653: TestVectorUnalignedOffset.java not pushed with 8155612
In-Reply-To: <57224C10.9060704@oracle.com>
References: <57224C10.9060704@oracle.com>
Message-ID: <57226126.8010708@oracle.com>

Good.

Thanks,
Vladimir

On 4/28/16 10:44 AM, Zoltán Majó wrote:
> Hi,
>
>
> The test TestVectorUnalignedOffset.java was originally included (and reviewed) with JDK-8155612:
> http://cr.openjdk.java.net/~roland/8155612/webrev.00/
>
> I sponsored the changeset but, by accident, did not push the test. So I've filed the current bug to address that problem.
>
> If there are no objections, I'll push the test separately tomorrow (with Roland as a contributor). Here is the webrev:
> http://cr.openjdk.java.net/~zmajo/8155653/webrev.00/
>
> Sorry for the noise.
>
> Best regards,
>
>
> Zoltan
>

From vladimir.kozlov at oracle.com Thu Apr 28 19:16:29 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 12:16:29 -0700
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
In-Reply-To: <57223CC2.6000309@oracle.com>
References: <57223CC2.6000309@oracle.com>
Message-ID: <5722618D.7010801@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/28/16 9:39 AM, Dmitrij Pochepko wrote:
> Hi,
>
> please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
>
> MemoryAccessProvider.readUnsafeConstant method implementation throws NullPointerException in case JavaKind is null. Such behavior wasn't specified in javadoc.
> So, a small fix contains logic to throw IllegalArgumentException and respective javadoc update.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8155244
> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/
>
> I've tested fix on linux_x64 via jprt.
>
> Thanks,
> Dmitrij

From christian.thalinger at oracle.com Thu Apr 28 19:23:16 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 28 Apr 2016 09:23:16 -1000
Subject: RFR 8155047: [JVMCI] findLeafConcreteSubtype should handle arrays of leaf concrete subtype
In-Reply-To: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>
References: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com>
Message-ID: <6D4CBD60-A855-44A6-9C27-04D2012D050D@oracle.com>

> On Apr 28, 2016, at 6:13 AM, Tom Rodriguez wrote:
>
> http://cr.openjdk.java.net/~never/8155047/webrev

 public LeafType(ResolvedJavaType context) {
+     assert !context.isLeaf() : "assumption isn't required for leaf types";

This assert is confusing. The assumption is that a given type has no subtypes, which is also true for leaf types. Does this assert make sure it's not used in the wrong places?
>
> findLeafConcreteSubtype should use the same machinery for the elemental type when identifying leaf array types.
>
> tom

From christian.thalinger at oracle.com Thu Apr 28 19:25:25 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 28 Apr 2016 09:25:25 -1000
Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
In-Reply-To: <57223CC2.6000309@oracle.com>
References: <57223CC2.6000309@oracle.com>
Message-ID: <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com>

+ * @throws IllegalArgumentException if {@code kind} is {@code null}, {@code kind} is {@link JavaKind#Void} or
+ *         not {@linkplain JavaKind#isPrimitive() primitive} kind

I don't think you have to repeat {@code kind}.

> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko wrote:
>
> Hi,
>
> please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case
>
> MemoryAccessProvider.readUnsafeConstant method implementation throws NullPointerException in case JavaKind is null. Such behavior wasn't specified in javadoc.
> So, a small fix contains logic to throw IllegalArgumentException and respective javadoc update.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8155244
> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/
>
> I've tested fix on linux_x64 via jprt.
>
> Thanks,
> Dmitrij

From vladimir.kozlov at oracle.com Thu Apr 28 19:43:17 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 12:43:17 -0700
Subject: RFR(S): 8154943: AArch64: redundant address computation instructions with vectorization
In-Reply-To:
References:
Message-ID: <572267D5.8020605@oracle.com>

node.cpp change is good.
compile.cpp: I understand that you replace a "similar" node (same in(1)) with it, but it is not clear that you are also processing the users (the whole following chain) to remove similar nodes. Add a comment.

I think the check "!(k->Opcode() == Op_ConvI2L || ... " (and use 'continue' instead of 'break') should be done when you push a node on the list wq.push(u).

Thanks,
Vladimir

On 4/28/16 9:35 AM, Roland Westrelin wrote:
>
> http://cr.openjdk.java.net/~roland/8154943/webrev.00/
>
> Example generated code:
>
> ;; B106: # B106 B107 <- B105 B106 Loop: B106-B106 inner main of N612 Freq: 293550
>
> 0x000003ffa9275650: sbfiz x0, x4, #3, #32
> 0x000003ffa9275654: add x3, x5, x0
> 0x000003ffa9275658: sbfiz x7, x4, #3, #32
> 0x000003ffa927565c: ldr q19, [x3,#16]
> 0x000003ffa9275660: add x19, x5, x7
> 0x000003ffa9275664: ldr q20, [x19,#32] ;*daload {reexecute=0 rethrow=0 return_oop=0}
>                                        ; - jnt.scimark2.LU::factor@237 (line 235)
>
> both sbfiz and add are strictly identical and so fully redundant. The
> redundant instructions are caused by 2 ConvI2L nodes that are identical
> except for their type:
>
> 1984 ConvI2L _ 1683 [ 1987 ] debug_idx = 33701984 type = long: dump_spec = #long:minint..maxint-1:www
> 1678 ConvI2L _ 1683 [ 1672 ] bci = 49 debug_orig = dump_spec = #long:0..maxint-1:www debug_idx = 30401678 line = 183 type = long:
>
> The type doesn't help code generation on arm so it could be widened and
> duplicated instructions could then be removed. I think the same could be
> done on sparc.
>
> Roland.
>

From dean.long at oracle.com Thu Apr 28 20:24:25 2016
From: dean.long at oracle.com (Dean Long)
Date: Thu, 28 Apr 2016 13:24:25 -0700
Subject: RFR(S): 8154943: AArch64: redundant address computation instructions with vectorization
In-Reply-To:
References:
Message-ID:

Are there any platforms where this won't help? How about adding a Matcher function instead of a cpu-specific ifdef?
dl

On 4/28/2016 9:35 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8154943/webrev.00/
>
> Example generated code:
>
> ;; B106: # B106 B107 <- B105 B106 Loop: B106-B106 inner main of N612 Freq: 293550
>
> 0x000003ffa9275650: sbfiz x0, x4, #3, #32
> 0x000003ffa9275654: add x3, x5, x0
> 0x000003ffa9275658: sbfiz x7, x4, #3, #32
> 0x000003ffa927565c: ldr q19, [x3,#16]
> 0x000003ffa9275660: add x19, x5, x7
> 0x000003ffa9275664: ldr q20, [x19,#32] ;*daload {reexecute=0 rethrow=0 return_oop=0}
>                                        ; - jnt.scimark2.LU::factor@237 (line 235)
>
> both sbfiz and add are strictly identical and so fully redundant. The
> redundant instructions are caused by 2 ConvI2L nodes that are identical
> except for their type:
>
> 1984 ConvI2L _ 1683 [ 1987 ] debug_idx = 33701984 type = long: dump_spec = #long:minint..maxint-1:www
> 1678 ConvI2L _ 1683 [ 1672 ] bci = 49 debug_orig = dump_spec = #long:0..maxint-1:www debug_idx = 30401678 line = 183 type = long:
>
> The type doesn't help code generation on arm so it could be widened and
> duplicated instructions could then be removed. I think the same could be
> done on sparc.
>
> Roland.

From vladimir.kozlov at oracle.com Thu Apr 28 22:50:17 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 15:50:17 -0700
Subject: RFR(M): 8154826: AArch64: take better advantage of base + shifted offset addressing mode
In-Reply-To: <57222F87.3040602@redhat.com>
References: <57188E03.5070303@redhat.com> <571E20BD.3030907@redhat.com> <53e03a7d8d3249ecb5d84ef93c8f791c@DEWDFE13DE14.global.corp.sap> <5720718B.6020102@redhat.com> <6980415d-9615-ffa3-75e5-245a5bee6555@oracle.com> <57222F87.3040602@redhat.com>
Message-ID: <572293A9.5090702@oracle.com>

Do it separately, please.

Thanks,
Vladimir

On 4/28/16 8:43 AM, Roland Westrelin wrote:
>
> Hi Martin,
>
>> Roland, sorry that we discuss so much of my stuff in your thread for
>> 8154826. I think it is slightly related to your change and it touches
>> the same files.
>> So it might be easier to sell it as add-on to your
>> change?
>>
>> If you prefer to handle it separated from 8154826, please let me
>> know. I could open a new bug for it. Disadvantage would be that
>> someone will have to take care of closed source platforms and I will
>> need a sponsor from Oracle.
>
> I would say it would be cleaner to keep the 2 changes separated (a
> change that affects all platforms hidden in a platform specific one
> might be confusing to someone looking at the history of changes). But I
> don't really have a strong opinion either so I'll let those who would do
> the extra sponsoring work decide. Vladimir, what do you think?
>
> Roland.
>

From vladimir.kozlov at oracle.com Thu Apr 28 22:55:25 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 15:55:25 -0700
Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com>
Message-ID: <572294DD.40601@oracle.com>

Looks good.

In serialization/BinaryParser.java Copyright year change is wrong.

Thanks,
Vladimir

On 4/28/16 9:26 AM, Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/8154483/webrev
>
> This is a collection of small improvements to IGV developed while working on Graal.
> I've categorized them below
>
> Bug fixes:
> Reset folder in top component to release reference to old graphs
> Fix concurrent modification exception in IGV
> Fix HTML quoting in tooltips and remove useless entries from search box
>
> UI improvements:
> Fixed keybinding for open and save actions in IGV
> Allow importing multiple files at once in IGV
> Make node searches look through the graph history for a match
>
> Performance related improvements:
> Add info message about time spent parsing files
> Relax expensive assert in IGV
> Reduce overhead of hash computation for graph identity checks
> Increase Integer cache size in IGV
> Share properties in IGV
>
> Binary graph format:
> Add folder and graph property support to binary graphs in IGV
> Handle Strings larger than buffer size properly in IGV
>

From vladimir.kozlov at oracle.com Thu Apr 28 23:29:26 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 16:29:26 -0700
Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3
In-Reply-To: <5721DCE7.1020102@oracle.com>
References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> <5720CA98.10605@oracle.com> <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com> <5721DCE7.1020102@oracle.com>
Message-ID: <57229CD6.9050003@oracle.com>

On 4/28/16 2:50 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 04/27/2016 11:55 PM, Vladimir Kozlov wrote:
>> [...]
>>> I agree. But I'd prefer we do that in the AllocatePrefetchStepSizeConstraintFunc() constraint function in
>>> commandLineFlagConstraintsCompiler.cpp). The reason is that since JEP-245 the preferred place to validate of
>>> command-line arguments is in constraint functions. Most compiler-related checks were moved there with JDK-8078554 and
>>> JDK-8146478.
>>
>> Okay.
>> Usually we do platform specific flag's setting in vm_version_.cpp files but it looks like it start changing.
>> I am not supporter of having multiply #ifdef in shared code (in commandLineFlagConstraintsCompiler).
>
> JEP-245 proposed to have ranges/constraints defined for all flags. Some of these ranges/constraints are unavoidably architecture-specific.
>
> I agree with you that having lots of #ifdefs in shared code does not improve code readability. But I'm wondering what would be a good way to achieve the goals of JEP-245 without using too many
> #ifdefs. Maybe architecture-specific constraint files (e.g., commandLineFlagConstraintsCompiler_solaris.cpp, commandLineFlagConstraintsCompiler_x86.cpp, etc.)?

Yes, that will work.

>
> At least with the current change we won't add more #ifdefs...
>
>>
>>>
>>> I'd also like to set the minimum value for AllocatePrefetchDistance to AllocatePrefetchStepSize, otherwise we can access
>>> the heap before newly allocated objects/arrays.
>>
>> Only for style 3. And I said it may be better to change code.
>
> Yes, you've mentioned that in the first round of reviews and I missed your point in the subsequent rounds. Sorry for that.
>
> I've updated the code in macro.cpp and the constraint AllocatePrefetchDistanceConstraintFunc() as you've suggested.
>
>> [...]
>>>>
>>>> Agree. I think AllocatePrefetchStyle=2 is broken on all platforms - it should be used only if UseTLAB is true. Please,
>>>> look.
>>>
>>> It seems AllocatePrefetchStyle = 2 can be used only if UseTLAB is true.
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/6a17c49de974/src/share/vm/opto/macro.cpp#l1844
>>
>> I got report from it crashing on T7 (attached). May be it is because by default it used BIS with it.
>
> I think using BIS with AllocatePrefetchStyle=2 is indeed the cause.
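The crash described in the original 8153340 request comes from rounding the first prefetch address down: cache_adr starts at old_eden_top + AllocatePrefetchDistance and is then masked with ~(AllocatePrefetchStepSize - 1). The Java model below uses hypothetical names and plain long arithmetic rather than the IR that macro.cpp builds. It shows that a zero distance can land below the new object, while a distance of at least one step cannot. style3FlagsOk is only an illustrative summary of the constraints discussed in this thread, not the actual AllocatePrefetchDistanceConstraintFunc().

```java
// Arithmetic model (hypothetical, not macro.cpp) of the first prefetch
// address for AllocatePrefetchStyle=3.
final class PrefetchStyle3Sketch {
    // cache_adr = old_eden_top + distance, then rounded down to a step boundary.
    // The mask rounds correctly only when stepSize is a power of two.
    static long firstPrefetchAddress(long oldEdenTop, long distance, long stepSize) {
        long cacheAdr = oldEdenTop + distance;
        return cacheAdr & ~(stepSize - 1);
    }

    // True if the first prefetch would touch memory below the new object.
    static boolean touchesBeforeObject(long oldEdenTop, long distance, long stepSize) {
        return firstPrefetchAddress(oldEdenTop, distance, stepSize) < oldEdenTop;
    }

    // Illustrative check: step size a multiple of 8 (stxa alignment) and
    // distance no smaller than the step size.
    static boolean style3FlagsOk(long distance, long stepSize) {
        return stepSize % 8 == 0 && distance >= stepSize;
    }
}
```

For a power-of-two step s and a distance d >= s, rounding down loses at most s - 1 bytes, so the result is at least old_eden_top + 1. That is why a lower bound of AllocatePrefetchStepSize on the distance is enough for style 3.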
>
> I've executed the following on our T7 machine with b115:
> * java -XX:AllocatePrefetchStyle=2 (uses AllocatePrefetchInstr=1 by default) -> crashes
> * java -XX:AllocatePrefetchStyle=2 -XX:AllocatePrefetchInstr=0 -> works
>
> With the newest webrev, both commands pass.
>
> Here is the newest webrev:
> http://cr.openjdk.java.net/~zmajo/8153340/webrev.03/

This looks good.

Thanks,
Vladimir

>
> I re-did testing with JPRT, the results look OK.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, I've started an RBT run with all hotspot on all
>>> platforms using AllocatePrefetchStyle=2. So far no problems have shown up.
>>>
>>>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch should be set for all styles on all platforms to
>>>> avoid accessing beyond heap. Prefetch instructions doc say that they does not trap but we should be careful.
>>>
>>> I agree.
>>>
>>> That means we initialize _reserve_for_allocation_prefetch in a platform-independent way. So I think it would make sense
>>> to move that field to ThreadLocalAllocBuffer, as TLAB is the only user of that field and we don't support
>>> platform-independent initialization of Abstract_VM_Version. I did that in the updated webrev.
>>>
>>>>>
>>>>> The updated webrev does the following (in addition to fixing the original problem with AllocatePrefetchDistance):
>>>>>
>>>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 (i.e., BIS instructions are used for prefetching). As
>>>>> far as I understand, AllocatePrefetchStyle = 3 was added to support prefetching with BIS, so if BIS is enabled, we
>>>>> should use AllocatePrefetchStyle = 3.
>>>>
>>>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should select AllocatePrefetchStyle = 3.
>>>
>>> OK.
>>>
>>>> But we should allow AllocatePrefetchStyle = 3 if normal prefetch instructions (or other platforms) are used.
>>>
>>> OK, I've removed that restriction.
>>>
>>>> I think we should update comment in globals.hpp to say "generate one prefetch per cache line" without saying BIS.
>>>
>>> OK.
>>>
>>>>
>>>> But I agree if BIS is not available we should not use BIS AllocatePrefetchInstr = 1.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on non-SPARC platforms.
>>>>
>>>> It could be useful on other platforms since it does one access per cache line.
>>>
>>> OK, let's keep it available then.
>>>
>>>>
>>>>
>>>>>
>>>>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if AllocatePrefetchStyle is 3 (due to alignment requirements).
>>>>
>>>> That is correct since stxa requires at least 8 bytes alignment (as stx).
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> 4. Determine the number of lines to prefetch in the same way for all prefetch styles:
>>>>> lines = (prefetch instance allocation) ? AllocateInstancePrefetchLines : AllocatePrefetchLines
>>>>
>>>> Agree.
>>>
>>> OK.
>>>
>>>>
>>>>>
>>>>> Here is the updated webrev:
>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/
>>>>
>>>> vm_version_sparc.cpp
>>>> AllocatePrefetchInstr = 0 should be set for all styles (not only 1) when BIS is not available.
>>>
>>> OK. I think it is sufficient to do
>>>
>>>   if (!has_blk_init()) {
>>>     if (AllocatePrefetchInstr == 1) {
>>>       warning("BIS instructions required for AllocatePrefetchInstr 1 unavailable");
>>>       FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0);
>>>     }
>>>   }
>>>
>>> I hope I'm not missing anything here.
>>>
>>> Here is the updated webrev:
>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/
>>>
>>> Testing:
>>> - JPRT (incl. TestOptionsWithRanges);
>>> - local testing on SPARC;
>>> - all hotspot tests with AllocatePrefetchStyle=2 on all platforms.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Testing:
>>>>> - JPRT (incl. TestOptionsWithRanges)
>>>>> - local testing on a SPARC machine.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> please review the patch for 8153340.
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340
>>>>>>>
>>>>>>>
>>>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and AllocatePrefetchDistance==0. The crash happens due to the way
>>>>>>> the address for the first prefetch instruction is calculated [1]:
>>>>>>>
>>>>>>> If distance==0, cache_addr == old_eden_top. Then, cache_adr &= ~(AllocatePrefetchStepSize - 1) which can zero some of
>>>>>>> the bits of cache_adr. That result in accesses *before* the newly allocated object.
>>>>>>>
>>>>>>>
>>>>>>> Solution: Set lower limit of AllocatePrefetchDistance to AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3).
>>>>>>> Unquarantine test.
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/
>>>>>>>
>>>>>>> Testing:
>>>>>>> - JPRT (incl. TestOptionsWithRanges.java)
>>>>>>> - local testing on a SPARC machine.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941
>>>>>
>>>
>

From vladimir.kozlov at oracle.com Thu Apr 28 23:54:32 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Apr 2016 16:54:32 -0700
Subject: [9] RFR(S): 8142464: PlatformLoggerTest.java throws java.lang.RuntimeException: Logger test.logger.bar does not exist
In-Reply-To: <5721FB8F.40207@oracle.com>
References: <5721FB8F.40207@oracle.com>
Message-ID: <5722A2B8.7000100@oracle.com>

Good.

thanks,
Vladimir

On 4/28/16 5:01 AM, Nils Eliasson wrote:
> Hi,
>
> Please review the fix of jdk/test/sun/util/logging/PlatformLoggerTest.java that has been failing intermittently in our nightlies.
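The failure in this thread follows from how java.util.logging manages named loggers: the Logger.getLogger documentation warns that LogManager may retain only a weak reference to a named logger. A logger that nothing else references can therefore be collected, and a later lookup by name returns null. A minimal sketch of the fix pattern, holding the loggers in static fields, follows. The class name and the test.logger.foo name are illustrative, not taken from PlatformLoggerTest.

```java
import java.util.logging.LogManager;
import java.util.logging.Logger;

// Sketch of the fix pattern (illustrative names): keep strong references to
// the loggers a test relies on, since LogManager may hold them only weakly.
final class StrongLoggerRefs {
    // Static fields keep these loggers reachable as long as the class is.
    private static final Logger FOO = Logger.getLogger("test.logger.foo");
    private static final Logger BAR = Logger.getLogger("test.logger.bar");

    // Looks a logger up by name, the way a test checks that it still exists.
    static boolean isRegistered(String name) {
        return LogManager.getLogManager().getLogger(name) != null;
    }
}
```

Calling isRegistered first initializes the class, so both loggers are created and pinned before the lookup runs.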
>
> Summary:
> The test uses loggers that are accessible by weak refs from a LogManager. Since the test doesn't keep a strong ref to the loggers they may get collected during the test.
>
> Solution:
> Save loggers to a static field in the test class.
>
> Other:
> Also removing "@compile -XDignore.symbol.file" that is unnecessary after Jigsaw. The compile tag will force a compile even if the class already exists, wasting times during reruns.
>
> Testing:
> Running tests on all platforms.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8142464
> webrev: http://cr.openjdk.java.net/~neliasso/8142464/webrev_jdk.01/ (JDK)
>
> Regards,
> Nils Eliasson

From tom.rodriguez at oracle.com Thu Apr 28 23:55:49 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 28 Apr 2016 16:55:49 -0700
Subject: RFR 8155047: [JVMCI] findLeafConcreteSubtype should handle arrays of leaf concrete subtype
In-Reply-To: <6D4CBD60-A855-44A6-9C27-04D2012D050D@oracle.com>
References: <3B2FE4AA-668D-4929-9FF4-EC65A11700A8@oracle.com> <6D4CBD60-A855-44A6-9C27-04D2012D050D@oracle.com>
Message-ID: <81FC93A7-03F5-4A10-858A-9D227540D419@oracle.com>

> On Apr 28, 2016, at 12:23 PM, Christian Thalinger wrote:
>
>
>> On Apr 28, 2016, at 6:13 AM, Tom Rodriguez wrote:
>>
>> http://cr.openjdk.java.net/~never/8155047/webrev
>
> public LeafType(ResolvedJavaType context) {
> +     assert !context.isLeaf() : "assumption isn't required for leaf types";
> This assert is confusing. The assumption is that a given type has no subtypes, which is also true for leaf types. Does this assert make sure it's not used in the wrong places?

LeafType is an assumption that a dynamic type that might someday have subclasses doesn't currently have any. isLeaf() is a static guarantee that a type will never have subclasses, so we are asserting that we never emit a dynamic dependence for something that is statically true. I'm open to new wording.
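Tom's distinction (a LeafType dependency for a type that could gain subclasses later, versus isLeaf() as a static guarantee) and the original request to reuse the element-type logic for arrays can be illustrated with plain reflection. In this hypothetical sketch, "final or primitive" stands in for a static leaf. It is an analogy, not JVMCI's ResolvedJavaType implementation.

```java
import java.lang.reflect.Modifier;

// Reflection-based analogy (not JVMCI) of a static "no subtype can ever
// exist" check that handles arrays through their element type.
final class LeafSketch {
    static boolean isStaticLeaf(Class<?> c) {
        if (c.isArray()) {
            // Class.getModifiers() reports every array class as final, yet
            // Object[] has subtypes such as String[]: an array is a leaf only
            // if its element type is.
            return isStaticLeaf(c.getComponentType());
        }
        // Primitives and final classes can never gain subclasses.
        return c.isPrimitive() || Modifier.isFinal(c.getModifiers());
    }
}
```

Under this model, a LeafType-style assumption would only be recorded when isStaticLeaf returns false, which is the situation the assert in the LeafType constructor enforces.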
tom > >> >> findLeafConcreteSubtype should use the same machinery for the elemental type when identifying leaf array types. >> >> tom > From tom.rodriguez at oracle.com Thu Apr 28 23:57:02 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 28 Apr 2016 16:57:02 -0700 Subject: RFR 8154483: update IGV with improvements from Graal In-Reply-To: <572294DD.40601@oracle.com> References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com> <572294DD.40601@oracle.com> Message-ID: <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com> > On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov wrote: > > Looks good. > > In serialization/BinaryParser.java Copyright year change is wrong. Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch. I'll fix it. Actually, should I go ahead and update the copyright years for all these files? tom > > Thanks, > Vladimir > > On 4/28/16 9:26 AM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/8154483/webrev >> >> This is a collection of small improvements to IGV developed while working on Graal.
I've categorized them below >> >> Bug fixes: >> Reset folder in top component to release reference to old graphs >> Fix concurrent modification exception in IGV >> Fix HTML quoting in tooltips and remove useless entries from search box >> >> UI improvements: >> Fixed keybinding for open and save actions in IGV >> Allow importing multiple files at once in IGV >> Make node searches look through the graph history for a match >> >> Performance related improvements: >> Add info message about time spent parsing files >> Relax expensive assert in IGV >> Reduce overhead of hash computation for graph identity checks >> Increase Integer cache size in IGV >> Share properties in IGV >> >> Binary graph format: >> Add folder and graph property support to binary graphs in IGV >> Handle Strings larger than buffer size properly in IGV >> From vladimir.kozlov at oracle.com Fri Apr 29 00:03:09 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 28 Apr 2016 17:03:09 -0700 Subject: RFR: 8153655: Make intrinsics flags diagnostic and update intrinsics tests to enable diagnostic options. In-Reply-To: References: Message-ID: <5722A4BD.5060607@oracle.com> Hi Rahul, Changes looks good but you need to update changes for SHA tests because I changed them for JDK-8154495: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6a17c49de974 Thanks, Vladimir On 4/27/16 2:45 AM, Rahul Raghavan wrote: > Hi, > > Please review the following patch for JDK-8153655. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 > Webrev: http://cr.openjdk.java.net/~rraghavan/8153655/webrev.00/ > > > Notes: > > 1. This 8153655/webrev.00 re-includes earlier backed out, same JDK-8145348 changes > (https://bugs.openjdk.java.net/browse/JDK-8145348 - Make intrinsics flags diagnostic) > and also additional fixes in failing intrinsic tests. > > > 2.
Checked all the usages of changed intrinsic flags in tests and > found JDK-8153655 type test failure issue (after initial JDK-8145348 fix) is present only for following tests - > a. UseAESIntrinsics test (compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java) > b. UseSHA* tests (at compiler/intrinsics/sha/cli/) > > > 3. Summary of 8153655/webrev.00 changes. > > - Includes earlier backed out, same JDK-8145348 changes: > src/share/vm/c1/c1_globals.hpp > src/share/vm/opto/c2_globals.hpp > src/share/vm/runtime/globals.hpp > test/compiler/intrinsics/muladd/TestMulAdd.java > test/compiler/runtime/6859338/Test6859338.java > > - 'test/compiler/cpuflags/AESIntrinsicsBase.java' > Options were passed in wrong order. > Changes done so that 'UnlockDiagnosticVMOptions' option precedes the diagnostic flags. > > - 'test/compiler/intrinsics/sha/cli/*' - (UseSHA* tests) > 'UnlockDiagnosticVMOptions' option was not getting passed. > Added support to precede intrinsic flag usages with explicit 'UnlockDiagnosticVMOptions'. > > > 4. No issues found with testing done using product builds with proposed changes > (hotspot/test/compiler/cpuflags/*, hotspot/test/compiler/intrinsics/*, hotspot/test/compiler/runtime/6859338/Test6859338.java) > Complete pre-integration testing using product builds is in progress. > > > Thanks, > Rahul > From vladimir.kozlov at oracle.com Fri Apr 29 00:05:02 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 28 Apr 2016 17:05:02 -0700 Subject: RFR 8154483: update IGV with improvements from Graal In-Reply-To: <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com> References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com> <572294DD.40601@oracle.com> <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com> Message-ID: <5722A52E.6030502@oracle.com> On 4/28/16 4:57 PM, Tom Rodriguez wrote: > >> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov wrote: >> >> Looks good. >> >> In serialization/BinaryParser.java Copyright year change is wrong. 
> > Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch. I'll fix it. Actually, should I go ahead and update the copyright years for all these files? Yes, please. Thanks, Vladimir > > tom > >> >> Thanks, >> Vladimir >> >> On 4/28/16 9:26 AM, Tom Rodriguez wrote: >>> http://cr.openjdk.java.net/~never/8154483/webrev >>> >>> This is a collection of small improvements to IGV developed while working on Graal. I've categorized them below >>> >>> Bug fixes: >>> Reset folder in top component to release reference to old graphs >>> Fix concurrent modification exception in IGV >>> Fix HTML quoting in tooltips and remove useless entries from search box >>> >>> UI improvements: >>> Fixed keybinding for open and save actions in IGV >>> Allow importing multiple files at once in IGV >>> Make node searches look through the graph history for a match >>> >>> Performance related improvements: >>> Add info message about time spent parsing files >>> Relax expensive assert in IGV >>> Reduce overhead of hash computation for graph identity checks >>> Increase Integer cache size in IGV >>> Share properties in IGV >>> >>> Binary graph format: >>> Add folder and graph property support to binary graphs in IGV >>> Handle Strings larger than buffer size properly in IGV >>> > From tom.rodriguez at oracle.com Fri Apr 29 00:08:28 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 28 Apr 2016 17:08:28 -0700 Subject: RFR 8154483: update IGV with improvements from Graal In-Reply-To: <5722A52E.6030502@oracle.com> References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com> <572294DD.40601@oracle.com> <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com> <5722A52E.6030502@oracle.com> Message-ID: > On Apr 28, 2016, at 5:05 PM, Vladimir Kozlov wrote: > > On 4/28/16 4:57 PM, Tom Rodriguez wrote: >> >>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov wrote: >>> >>> Looks good.
>>> >>> In serialization/BinaryParser.java Copyright year change is wrong. >> >> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch. I'll fix it. Actually, should I go ahead and update the copyright years for all these files? > > Yes, please. They're all updated. tom > > Thanks, > Vladimir > >> >> tom >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/28/16 9:26 AM, Tom Rodriguez wrote: >>>> http://cr.openjdk.java.net/~never/8154483/webrev >>>> >>>> This is a collection of small improvements to IGV developed while working on Graal. I've categorized them below >>>> >>>> Bug fixes: >>>> Reset folder in top component to release reference to old graphs >>>> Fix concurrent modification exception in IGV >>>> Fix HTML quoting in tooltips and remove useless entries from search box >>>> >>>> UI improvements: >>>> Fixed keybinding for open and save actions in IGV >>>> Allow importing multiple files at once in IGV >>>> Make node searches look through the graph history for a match >>>> >>>> Performance related improvements: >>>> Add info message about time spent parsing files >>>> Relax expensive assert in IGV >>>> Reduce overhead of hash computation for graph identity checks >>>> Increase Integer cache size in IGV >>>> Share properties in IGV >>>> >>>> Binary graph format: >>>> Add folder and graph property support to binary graphs in IGV >>>> Handle Strings larger than buffer size properly in IGV >>>> >> From vladimir.kozlov at oracle.com Fri Apr 29 00:14:17 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 28 Apr 2016 17:14:17 -0700 Subject: RFR 8154483: update IGV with improvements from Graal In-Reply-To: References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com> <572294DD.40601@oracle.com> <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com> <5722A52E.6030502@oracle.com> Message-ID: <5722A759.80706@oracle.com> Wow, did we really got IGV at 1998? Looks good.
Thanks, Vladimir On 4/28/16 5:08 PM, Tom Rodriguez wrote: > >> On Apr 28, 2016, at 5:05 PM, Vladimir Kozlov wrote: >> >> On 4/28/16 4:57 PM, Tom Rodriguez wrote: >>> >>>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov wrote: >>>> >>>> Looks good. >>>> >>>> In serialization/BinaryParser.java Copyright year change is wrong. >>> >>> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch. I'll fix it. Actually, should I go ahead and update the copyright years for all these files? >> >> Yes, please. > > They're all updated. > > tom > >> >> Thanks, >> Vladimir >> >>> >>> tom >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/28/16 9:26 AM, Tom Rodriguez wrote: >>>>> http://cr.openjdk.java.net/~never/8154483/webrev >>>>> >>>>> This is a collection of small improvements to IGV developed while working on Graal. I've categorized them below >>>>> >>>>> Bug fixes: >>>>> Reset folder in top component to release reference to old graphs >>>>> Fix concurrent modification exception in IGV >>>>> Fix HTML quoting in tooltips and remove useless entries from search box >>>>> >>>>> UI improvements: >>>>> Fixed keybinding for open and save actions in IGV >>>>> Allow importing multiple files at once in IGV >>>>> Make node searches look through the graph history for a match >>>>> >>>>> Performance related improvements: >>>>> Add info message about time spent parsing files >>>>> Relax expensive assert in IGV >>>>> Reduce overhead of hash computation for graph identity checks >>>>> Increase Integer cache size in IGV >>>>> Share properties in IGV >>>>> >>>>> Binary graph format: >>>>> Add folder and graph property support to binary graphs in IGV >>>>> Handle Strings larger than buffer size properly in IGV >>>>> >>> > From tom.rodriguez at oracle.com Fri Apr 29 04:17:54 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 28 Apr 2016 21:17:54 -0700 Subject: RFR 8154483: update IGV with improvements from Graal
In-Reply-To: <5722A759.80706@oracle.com> References: <32FAC776-FD30-4455-9650-08CEC0444F55@oracle.com> <572294DD.40601@oracle.com> <8D1C9410-96CF-429A-989D-71FF1D415962@oracle.com> <5722A52E.6030502@oracle.com> <5722A759.80706@oracle.com> Message-ID: <5FD6EA5C-1144-46A7-B477-40B2BF3F4985@oracle.com> > On Apr 28, 2016, at 5:14 PM, Vladimir Kozlov wrote: > > Wow, did we really got IGV at 1998? I think it contains some elements that are that old but IGV was integrated into the hotspot repo in 2008. It existed for a few years before that as Thomas's thesis project, so maybe 2006? tom > > Looks good. > > Thanks, > Vladimir > > On 4/28/16 5:08 PM, Tom Rodriguez wrote: >> >>> On Apr 28, 2016, at 5:05 PM, Vladimir Kozlov wrote: >>> >>> On 4/28/16 4:57 PM, Tom Rodriguez wrote: >>>> >>>>> On Apr 28, 2016, at 3:55 PM, Vladimir Kozlov wrote: >>>>> >>>>> Looks good. >>>>> >>>>> In serialization/BinaryParser.java Copyright year change is wrong. >>>> >>>> Thanks, I think there are copyright updates in JDK9 that aren't in 8 so I accidentally overwrote it when creating this patch. I'll fix it. Actually, should I go ahead and update the copyright years for all these files? >>> >>> Yes, please. >> >> They're all updated. >> >> tom >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> tom >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/28/16 9:26 AM, Tom Rodriguez wrote: >>>>>> http://cr.openjdk.java.net/~never/8154483/webrev >>>>>> >>>>>> This is a collection of small improvements to IGV developed while working on Graal.
I've categorized them below >>>>>> >>>>>> Bug fixes: >>>>>> Reset folder in top component to release reference to old graphs >>>>>> Fix concurrent modification exception in IGV >>>>>> Fix HTML quoting in tooltips and remove useless entries from search box >>>>>> >>>>>> UI improvements: >>>>>> Fixed keybinding for open and save actions in IGV >>>>>> Allow importing multiple files at once in IGV >>>>>> Make node searches look through the graph history for a match >>>>>> >>>>>> Performance related improvements: >>>>>> Add info message about time spent parsing files >>>>>> Relax expensive assert in IGV >>>>>> Reduce overhead of hash computation for graph identity checks >>>>>> Increase Integer cache size in IGV >>>>>> Share properties in IGV >>>>>> >>>>>> Binary graph format: >>>>>> Add folder and graph property support to binary graphs in IGV >>>>>> Handle Strings larger than buffer size properly in IGV >>>>>> From zoltan.majo at oracle.com Fri Apr 29 06:37:35 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Fri, 29 Apr 2016 08:37:35 +0200 Subject: [9] RFR (XS): 8153340: Incorrect lower bound for AllocatePrefetchDistance with AllocatePrefetchStyle=3 In-Reply-To: <57229CD6.9050003@oracle.com> References: <5718B9BC.10001@oracle.com> <57195621.7050307@oracle.com> <571F5756.7020007@oracle.com> <0bb5ad96-ebd4-c9a2-d5a0-321b70b28528@oracle.com> <5720CA98.10605@oracle.com> <8136c0fd-4fe2-280f-1129-98530403e88a@oracle.com> <5721DCE7.1020102@oracle.com> <57229CD6.9050003@oracle.com> Message-ID: <5723012F.2020003@oracle.com> Hi Vladimir, On 04/29/2016 01:29 AM, Vladimir Kozlov wrote: > [...] >> JEP-245 proposed to have ranges/constraints defined for all flags. >> Some of these ranges/constraints are unavoidably architecture-specific. >> >> I agree with you that having lots of #ifdefs in shared code does not >> improve code readability.
But I'm wondering what would be a good way >> to achieve the goals of JEP-245 without using too many >> #ifdefs. Maybe architecture-specific constraint files (e.g., >> commandLineFlagConstraintsCompiler_solaris.cpp, >> commandLineFlagConstraintsCompiler_x86.cpp, etc.)? > > Yes, that will work. OK, I'll keep that in mind when addressing similar problems in the future. > [...] >> >> I think using BIS with AllocatePrefetchStyle=2 is indeed the cause. >> >> I've executed the following on our T7 machine with b115: >> * java -XX:AllocatePrefetchStyle=2 (uses AllocatePrefetchInstr=1 by >> default) -> crashes >> * java -XX:AllocatePrefetchStyle=2 -XX:AllocatePrefetchInstr=0 -> works >> >> With the newest webrev, both commands pass. >> >> Here is the newest webrev: >> http://cr.openjdk.java.net/~zmajo/8153340/webrev.03/ > > This looks good. Thank you for the review! Best regards, Zoltan > > Thanks, > Vladimir > >> >> I re-did testing with JPRT, the results look OK. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Also, AllocatePrefetchStyle = 2 seems to work fine. But to be sure, >>>> I've started an RBT run with all hotspot on all >>>> platforms using AllocatePrefetchStyle=2. So far no problems have >>>> shown up. >>>> >>>>> And I think Abstract_VM_Version::_reserve_for_allocation_prefetch >>>>> should be set for all styles on all platforms to >>>>> avoid accessing beyond heap. Prefetch instructions doc say that >>>>> they does not trap but we should be careful. >>>> >>>> I agree. >>>> >>>> That means we initialize _reserve_for_allocation_prefetch in a >>>> platform-independent way. So I think it would make sense >>>> to move that field to ThreadLocalAllocBuffer, as TLAB is the only >>>> user of that field and we don't support >>>> platform-independent initialization of Abstract_VM_Version. I did >>>> that in the updated webrev. 
>>>> >>>>>> >>>>>> The updated webrev does the following (in addition to fixing the >>>>>> original problem with AllocatePrefetchDistance): >>>>>> >>>>>> 1. Enforce AllocatePrefetchStyle = 3 if AllocatePrefetchInstr = 1 >>>>>> (i.e., BIS instructions are used for prefetching). As >>>>>> far as I understand, AllocatePrefetchStyle = 3 was added to >>>>>> support prefetching with BIS, so if BIS is enabled, we >>>>>> should use AllocatePrefetchStyle = 3. >>>>> >>>>> Correct - if BIS (AllocatePrefetchInstr = 1) is used we should >>>>> select AllocatePrefetchStyle = 3. >>>> >>>> OK. >>>> >>>>> But we should allow AllocatePrefetchStyle = 3 if normal prefetch >>>>> instructions (or other platforms) are used. >>>> >>>> OK, I've removed that restriction. >>>> >>>>> I think we should update comment in globals.hpp to say "generate >>>>> one prefetch per cache line" without saying BIS. >>>> >>>> OK. >>>> >>>>> >>>>> But I agree if BIS is not available we should not use BIS >>>>> AllocatePrefetchInstr = 1. >>>> >>>> OK. >>>> >>>>> >>>>>> >>>>>> 2. AllocatePrefetchStyle = 3 is SPARC-specific, so disallow it on >>>>>> non-SPARC platforms. >>>>> >>>>> It could be useful on other platforms since it does one access per >>>>> cache line. >>>> >>>> OK, let's keep it available then. >>>> >>>>> >>>>> >>>>>> >>>>>> 3. Enforce that AllocatePrefetchStepSize is multiple of 8 if >>>>>> AllocatePrefetchStyle is 3 (due to alignment requirements). >>>>> >>>>> That is correct since stxa requires at least 8 bytes alignment (as >>>>> stx). >>>> >>>> OK. >>>> >>>>> >>>>>> >>>>>> 3. Determine the number of lines to prefetch in the same way for >>>>>> all prefetch styles: >>>>>> lines = (prefecth instance allocation) ? >>>>>> AllocateInstancePrefetchLines : AllocatePrefetchLines >>>>> >>>>> Agree. >>>> >>>> OK. 
>>>> >>>>>> >>>>>> Here is the updated webrev: >>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.01/ >>>>> >>>>> vm_version_sparc.cpp >>>>> AllocatePrefetchInstr = 0 should be set for all styles (not only >>>>> 1) when BIS is not available. >>>> >>>> OK. I think it is sufficient to do >>>> >>>> 52 if (!has_blk_init()) { >>>> 53 if (AllocatePrefetchInstr == 1) { >>>> 54 warning("BIS instructions required for AllocatePrefetchInstr 1 >>>> unavailable"); >>>> 55 FLAG_SET_DEFAULT(AllocatePrefetchInstr, 0); >>>> 56 } I hope I'm not missing anything here. Here is the updated webrev: >>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.02/ Testing: - >>>> JPRT (incl. TestOptionsWithRanges); - local testing on >>>> SPARC; - all hotspot tests with AllocaPrefetchStyle=2 on all >>>> platforms. Thank you! Best regards, Zoltan >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Testing: >>>>>> - JPRT (incl. TestOptionsWithRanges) >>>>>> - local testing on a SPARC machine. >>>>>> >>>>>> Thank you! >>>>>> >>>>>> Best regards, >>>>>> >>>>>> >>>>>> Zoltan >>>>>> >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/21/16 4:30 AM, Zoltán Majó wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> >>>>>>>> please review the patch for 8153340. >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153340 >>>>>>>> >>>>>>>> >>>>>>>> Problem: The VM crashes if AllocatePrefetchStyle==3 and >>>>>>>> AllocatePrefetchDistance==0. The crash happens due to the way >>>>>>>> the address for the first prefetch instruction is calculated [1]: >>>>>>>> >>>>>>>> If distance==0, cache_addr == old_eden_top. Then, cache_adr &= >>>>>>>> ~(AllocatePrefetchStepSize - 1) which can zero some of >>>>>>>> the bits of cache_adr. That result in accesses *before* the >>>>>>>> newly allocated object. >>>>>>>> >>>>>>>> >>>>>>>> Solution: Set lower limit of AllocatePrefetchDistance to >>>>>>>> AllocatePrefetchStepSize (for AllocatePrefetchStyle == 3). >>>>>>>> Unquarantine test.
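The address arithmetic behind the crash in 8153340 is easier to follow with concrete numbers. The model below is a hypothetical Java sketch, not the C2 code in macro.cpp. It assumes AllocatePrefetchStepSize is a power of two, and it computes the first prefetch address as described in the report: old_eden_top plus the distance, then rounded down with the step-size mask.

```java
// Hypothetical model of the first-prefetch address computation described
// in the 8153340 report; it is not the actual C2 implementation.
final class PrefetchAddressModel {
    private PrefetchAddressModel() {}

    // cache_adr = (old_eden_top + distance) & ~(stepSize - 1).
    // stepSize is assumed to be a power of two.
    static long firstPrefetchAddress(long oldEdenTop, long distance, long stepSize) {
        if (stepSize <= 0 || (stepSize & (stepSize - 1)) != 0) {
            throw new IllegalArgumentException("stepSize must be a power of two");
        }
        long cacheAdr = oldEdenTop + distance;
        return cacheAdr & ~(stepSize - 1);   // round down to a step boundary
    }

    // True if the first prefetch would touch memory below the new object.
    static boolean prefetchesBeforeObject(long oldEdenTop, long distance, long stepSize) {
        return firstPrefetchAddress(oldEdenTop, distance, stepSize) < oldEdenTop;
    }
}
```

Take an unaligned top of 0x1008 and a 64-byte step. A distance of 0 rounds down to 0x1000, which is below the new object. A distance of 64 gives 0x1048, which rounds down to 0x1040. Rounding down discards at most stepSize - 1 bytes, so any distance of at least AllocatePrefetchStepSize keeps the first address above old_eden_top. That is the lower bound the patch enforces.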
>>>>>>>> >>>>>>>> Webrev: >>>>>>>> http://cr.openjdk.java.net/~zmajo/8153340/webrev.00/ >>>>>>>> >>>>>>>> Testing: >>>>>>>> - JPRT (incl. TestOptionsWithRanges.java) >>>>>>>> - local testing on a SPARC machine. >>>>>>>> >>>>>>>> Thank you! >>>>>>>> >>>>>>>> Best regards, >>>>>>>> >>>>>>>> >>>>>>>> Zoltan >>>>>>>> >>>>>>>> [1] >>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/f27c00e6f6bf/src/share/vm/opto/macro.cpp#l1941 >>>>>> >>>> >> From rwestrel at redhat.com Fri Apr 29 08:49:34 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 29 Apr 2016 10:49:34 +0200 Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling analysis Message-ID: <5723201E.5080909@redhat.com> http://cr.openjdk.java.net/~roland/8155717/webrev.00/ Loop unrolling analysis can help on arm for smaller data types (bytes). For other data types, it drives more inlining which can also be beneficial. This also includes the change Michael ok'ed to shared code. Roland. From aph at redhat.com Fri Apr 29 08:53:11 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Apr 2016 09:53:11 +0100 Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling analysis In-Reply-To: <5723201E.5080909@redhat.com> References: <5723201E.5080909@redhat.com> Message-ID: <572320F7.9030608@redhat.com> On 29/04/16 09:49, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8155717/webrev.00/ > > Loop unrolling analysis can help on arm for smaller data types (bytes). > For other data types, it drives more inlining which can also be > beneficial. This also includes the change Michael ok'ed to shared code. OK, cool. Will Michael sponsor this, then? Thanks, Andrew. 
From rwestrel at redhat.com Fri Apr 29 08:56:15 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 29 Apr 2016 10:56:15 +0200 Subject: RFR(S): 8154943: AArch64: redundant address computation instructions with vectorization In-Reply-To: References: Message-ID: <572321AF.3000703@redhat.com> > Are there any platforms where this won't help? Yes. x86 needs it: operand indPosIndexScale(any_RegP reg, rRegI idx, immI2 scale) %{ constraint(ALLOC_IN_RC(ptr_reg)); predicate(n->in(3)->in(1)->as_Type()->type()->is_long()->_lo >= 0); match(AddP reg (LShiftL (ConvI2L idx) scale)); From a quick look through ppc.ad, it seems it needs it to. > How about adding a Matcher function instead of a cpu-specific ifdef. Sounds like a good suggestion. Roland. From rwestrel at redhat.com Fri Apr 29 08:58:28 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 29 Apr 2016 10:58:28 +0200 Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling analysis In-Reply-To: <572320F7.9030608@redhat.com> References: <5723201E.5080909@redhat.com> <572320F7.9030608@redhat.com> Message-ID: <57232234.7020504@redhat.com> > OK, cool. Will Michael sponsor this, then? Thanks for reviewing it. I don't think Michael has that power. Roland. From nils.eliasson at oracle.com Fri Apr 29 12:31:27 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 29 Apr 2016 14:31:27 +0200 Subject: [9] RFR(S): 8142464: PlatformLoggerTest.java throws java.lang.RuntimeException: Logger test.logger.bar does not exist In-Reply-To: <5722A2B8.7000100@oracle.com> References: <5721FB8F.40207@oracle.com> <5722A2B8.7000100@oracle.com> Message-ID: <5723541F.2060708@oracle.com> Thank you Vladimir! //Nils On 2016-04-29 01:54, Vladimir Kozlov wrote: > Good. > > thanks, > Vladimir > > On 4/28/16 5:01 AM, Nils Eliasson wrote: >> Hi, >> >> Please review the fix of >> jdk/test/sun/util/logging/PlatformLoggerTest.java that has been >> failing intermittently in our nightlies.
>> >> Summary: >> The test uses loggers that are accessible by weak refs from a >> LogManager. Since the test doesn't keep a strong ref to the loggers >> they may get collected during the test. >> >> Solution: >> Save loggers to a static field in the test class. >> >> Other: >> Also removing "@compile -XDignore.symbol.file" that is unnecessary >> after Jigsaw. The compile tag will force a compile even if the class >> already exists, wasting times during reruns. >> >> Testing: >> Running tests on all platforms. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8142464 >> webrev: http://cr.openjdk.java.net/~neliasso/8142464/webrev_jdk.01/ >> (JDK) >> >> Regards, >> Nils Eliasson From zoltan.majo at oracle.com Fri Apr 29 13:37:00 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Fri, 29 Apr 2016 15:37:00 +0200 Subject: [9] RFR(XS): 8155653: TestVectorUnalignedOffset.java not pushed with 8155612 In-Reply-To: <57226126.8010708@oracle.com> References: <57224C10.9060704@oracle.com> <57226126.8010708@oracle.com> Message-ID: <5723637C.4000602@oracle.com> Thank you, Vladimir! Best regards, Zoltan On 04/28/2016 09:14 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 4/28/16 10:44 AM, Zoltán Majó wrote: >> Hi, >> >> >> The test TestVectorUnalignedOffset.java was originally included (and >> reviewed) with JDK-8155612: >> http://cr.openjdk.java.net/~roland/8155612/webrev.00/ >> >> I sponsored the changeset but, by accident, did not push the test. So >> I've filed the current bug to address that problem. >> >> If there are no objections, I'll push the test separately tomorrow >> (with Roland as a contributor). Here is the webrev: >> http://cr.openjdk.java.net/~zmajo/8155653/webrev.00/ >> >> Sorry for the noise.
>> >> Best regards, >> >> >> Zoltan >> From roland.schatz at oracle.com Fri Apr 29 13:45:32 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Fri, 29 Apr 2016 15:45:32 +0200 Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI exception stubs Message-ID: <5723657C.6020804@oracle.com> Hi, Please review this small change: webrev: http://cr.openjdk.java.net/~rschatz/JDK-8155735/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8155735 This makes it possible to use the exception stubs without knowing about the hotspot-only concept of symbol pointers. Thanks, Roland From volker.simonis at gmail.com Fri Apr 29 13:53:20 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 29 Apr 2016 15:53:20 +0200 Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on PowerPC Message-ID: Hi Aleksey, while implementing the new intrinsics for jdk.internal.misc.Unsafe on PowerPC we encountered problems with the following weakCompareAndSwap tests: compiler/unsafe/JdkInternalMiscUnsafeAccessTestInt.java compiler/unsafe/JdkInternalMiscUnsafeAccessTestLong.java The problem is that a weakComapreAndSwap may spuriously fail, but the tests assume that it will always succeed. On Power we saw for example the following kind of failures: test JdkInternalMiscUnsafeAccessTestInt.testArray(): failure java.lang.AssertionError: weakCompareAndSwapRelease int expected [true] but found [false] at org.testng.Assert.fail(Assert.java:94) at org.testng.Assert.failNotEquals(Assert.java:494) at org.testng.Assert.assertEquals(Assert.java:123) at org.testng.Assert.assertEquals(Assert.java:286) at JdkInternalMiscUnsafeAccessTestInt.testAccess(JdkInternalMiscUnsafeAccessTestInt.java:269) at JdkInternalMiscUnsafeAccessTestInt.testArray(JdkInternalMiscUnsafeAccessTestInt.java:109) From our understanding of the weakComapreAndSwap semantics it is however perfectly legal that the weakComapreAndSwap fails and the old values remain in place.
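A weak CAS may fail spuriously even when the expected value is in place. A single-threaded test therefore cannot assert on the boolean returned by one call. It can only retry and then check the value in memory. The sketch below shows that test shape. It uses the public AtomicInteger.weakCompareAndSetVolatile (JDK 9+) as a stand-in for the internal Unsafe.weakCompareAndSwapInt that the real tests call. The class name, method name, and attempt limit are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing the retry shape; not part of the JDK tests.
final class WeakCasRetry {
    private WeakCasRetry() {}

    // Retries a weak CAS, which may fail spuriously, until it succeeds or
    // maxAttempts is reached. Returns the number of attempts made; the
    // caller is expected to check the resulting value separately.
    static int weakCasWithRetry(AtomicInteger cell, int expected, int update, int maxAttempts) {
        int attempts = 0;
        boolean success;
        do {
            attempts++;
            success = cell.weakCompareAndSetVolatile(expected, update);
        } while (!success && attempts < maxAttempts);
        return attempts;
    }
}
```

A test would call weakCasWithRetry(cell, 1, 2, 10), assert that cell.get() == 2, and could log the attempt count when it exceeds 1. This keeps the test valid on platforms such as POWER or AArch64, where spurious failures can occur, and on x86, where they do not.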
We would propose to run the weakComapreAndSwap in a loop in the test until it succeeds and only check for the correct value in memory afterwards. We could also log the number of retry attempts until the weakComapreAndSwap succeeds into the .jtr file. As far as we saw, all the regression tests are single threaded. Are there any multi-threaded tests available which stress the new Unsafe intrinsics (i.e. in the way this was previously done by the Java Concurrency Torture tests which verified that only valid combinations of reads were observed). Regards, Martin & Volker From aleksey.shipilev at oracle.com Fri Apr 29 14:04:33 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 29 Apr 2016 17:04:33 +0300 Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on PowerPC In-Reply-To: References: Message-ID: <572369F1.3010306@oracle.com> Hi Volker, On 04/29/2016 04:53 PM, Volker Simonis wrote: > The problem is that a weakComapreAndSwap may spuriously fail, but the > tests assume that it will always succeed. On Power we saw for example > the following kind of failures: > > test JdkInternalMiscUnsafeAccessTestInt.testArray(): failure > java.lang.AssertionError: weakCompareAndSwapRelease int expected > [true] but found [false] > at org.testng.Assert.fail(Assert.java:94) > at org.testng.Assert.failNotEquals(Assert.java:494) > at org.testng.Assert.assertEquals(Assert.java:123) > at org.testng.Assert.assertEquals(Assert.java:286) > at JdkInternalMiscUnsafeAccessTestInt.testAccess(JdkInternalMiscUnsafeAccessTestInt.java:269) > at JdkInternalMiscUnsafeAccessTestInt.testArray(JdkInternalMiscUnsafeAccessTestInt.java:109) > > From our understanding of the weakComapreAndSwap semantics it is > however perfectly legal that the weakComapreAndSwap fails and the old > values remain in place. Ah, oops, indeed! The test is too strong, and it just happens to work on x86. AArch64 should fail the same, once intrinsics are there. 
jdk/test/java/lang/invoke/VarHandles should fail too, so it deserves a global testbug fix. Can you prototype and test a simple weakCAS/CAE test on the POWER? We can then propagate the test shape to other tests with: https://bugs.openjdk.java.net/browse/JDK-8155739 > We would propose to run the weakComapreAndSwap in a loop in the test > until it succeeds and only check for the correct value in memory > afterwards. We could also log the number of retry attempts until the > weakComapreAndSwap succeeds into the .jtr file. Yes, looping is better for the tests. > As far as we saw, all the regression tests are single threaded. Are > there any multi-threaded tests available which stress the new Unsafe > intrinsics (i.e. in the way this was previously done by the Java > Concurrency Torture tests which verified that only valid combinations > of reads were observed). Yes, we have the jcstress tests pending for VarHandles, but they are not yet committed to jcstress. Thanks, -Aleksey From rwestrel at redhat.com Fri Apr 29 14:05:26 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 29 Apr 2016 16:05:26 +0200 Subject: [aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not use DC ZVA In-Reply-To: <1461851388.10531.27.camel@mint> References: <1461851388.10531.27.camel@mint> Message-ID: Hi Ed, I can't start a debug VM since that change was pushed: # Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912 # assert(s.body[i] == 32) failed: what? I think the problem is MacroAssembler::fill_words() moves the pointer past the last word that was written (i.e. there's a hole between the last written word and the value of the base).
It passes again with this patch: diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp @@ -4754,12 +4754,12 @@ const int unroll = 8; // Number of stp instructions we'll unroll cbz(cnt, fini); - tbz(base, 3, skip); + tbz(cnt, 0, skip); str(value, Address(post(base, 8))); sub(cnt, cnt, 1); bind(skip); - andr(rscratch1, cnt, (unroll-1) * 2); + andr(rscratch1, cnt, unroll* 2 - 1); sub(cnt, cnt, rscratch1); add(base, base, rscratch1, Assembler::LSL, 3); adr(rscratch2, entry); @@ -4767,15 +4767,14 @@ br(rscratch2); bind(loop); - for (int i = -unroll; i < 0; i++) + add(base, base, unroll * 16); + sub(cnt, cnt, unroll * 2); + for (int i = -unroll; i < 0; i++) { stp(value, value, Address(base, i * 16)); + } bind(entry); - subs(cnt, cnt, unroll * 2); - add(base, base, unroll * 16); - br(Assembler::GE, loop); - - tbz(cnt, 0, fini); - str(value, Address(base, -unroll * 16)); + cbnz(cnt, loop); + bind(fini); } I've done very little testing with it but a few things felt strange in that code: - I don't understand why aligning the base at the beginning is necessary. Once the base is aligned there's no guarantee the cnt is an even number of words. so when we update the base with: add(base, base, rscratch1, Assembler::LSL, 3); base can be misaligned again. I changed it to test whether cnt is even so at least we don't have to worry about that anymore. - (unroll-1) * 2 seems like an incorrect mask. - then there's the problem of base being updated at the end of the loop that I mentioned above and that can lead to an incorrect base. - That part at the end: str(value, Address(base, -unroll * 16)); I don't understand. Anyway, as I said I've done very little testing so it's entirely possible my patch above is broken. Roland. 
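The control flow of the patched fill_words() can be checked outside the VM with a small word-level model. The following is a hypothetical Java simulation, not HotSpot code. Array indices stand for 8-byte words, and each stp is modeled as two adjacent word stores. The computed branch into the unrolled block is assumed to execute only its last partial/2 stp stores, since the instruction that computes the branch target is not shown in the diff. The unroll factor of 8 and the odd-word, partial-block, full-block structure come from the patch.

```java
// Hypothetical word-level model of the patched MacroAssembler::fill_words().
final class FillWordsModel {
    private static final int UNROLL = 8; // stp instructions per unrolled block

    private FillWordsModel() {}

    // Stores `value` into mem[base .. base + cnt) following the patched control flow.
    static void fillWords(long[] mem, int base, int cnt, long value) {
        if (cnt == 0) return;                  // cbz(cnt, fini)
        if ((cnt & 1) != 0) {                  // tbz(cnt, 0, skip): odd count
            mem[base++] = value;               // str(value, post(base, 8))
            cnt--;
        }
        int partial = cnt & (UNROLL * 2 - 1);  // even remainder, partial/2 pairs
        cnt -= partial;
        base += partial;
        // Assumed effect of branching into the block: only the last
        // partial/2 pair stores run, below the already-advanced base.
        for (int i = -partial / 2; i < 0; i++) {
            stp(mem, base + 2 * i, value);
        }
        while (cnt != 0) {                     // entry: cbnz(cnt, loop)
            base += UNROLL * 2;                // loop: advance base before storing
            cnt -= UNROLL * 2;
            for (int i = -UNROLL; i < 0; i++) {
                stp(mem, base + 2 * i, value);
            }
        }
    }

    // Models stp(value, value, Address(base, i * 16)): two consecutive word stores.
    private static void stp(long[] mem, int word, long value) {
        mem[word] = value;
        mem[word + 1] = value;
    }
}
```

Filling every count from 0 to 40 into a zeroed array and checking that exactly the target words change is a quick way to confirm that this control flow leaves no hole before or after the filled range. It does not confirm that the real stub behaves the same way.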

From martin.doerr at sap.com Fri Apr 29 14:06:19 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 29 Apr 2016 14:06:19 +0000 Subject: RFR(S): 8155729: C2: Skip transformation of LoadConP for heap-based compressed oops Message-ID: <41d17a7145394a0aa7e1ebcd890ccd73@DEWDFE13DE14.global.corp.sap> Hi, I have opened a new bug for the proposal that was discussed in thread 8154826. The summary is: C2's final graph reshaping performs the following transformation:

Original pattern: LoadConP + storage access
Transformed pattern: LoadConN + DecodeN (heap-based) + storage access

This seems to be fine for the simpler compressed oops modes. It also seems to be fine on x86, which can match the decoding into the operand of the storage access instruction. Other platforms should rather skip the transformation:

- PPC can load the ConP from the constant pool. Decoding takes a lot of instructions, because the heap base needs to be loaded.
- SPARC can use that as well.
- aarch64: LoadConN + DecodeN has a higher latency than LoadConP.

We can always skip the transformation in heap-based compressed klass mode; matching DecodeNKlass as an operand is currently not implemented. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8155729_LoadConP/webrev.00/ Please review. I will also need a sponsor, please. Best regards, Martin

From goetz.lindenmaier at sap.com Fri Apr 29 14:09:38 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 29 Apr 2016 14:09:38 +0000 Subject: RFR(XS): 8155738: C2: fix frame_complete_offset Message-ID: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap> Hi, In C2, frame_complete_offset is set wrong. It's also called during scratch_emit_size, which happens at least in the debug build. I also added the missing call on ppc. Please review this small fix.
I need a sponsor, please: http://cr.openjdk.java.net/~goetz/wr16/8155738-frameCmplt/webrev.00/ Thanks and best regards, Goetz.

From martin.doerr at sap.com Fri Apr 29 14:46:18 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 29 Apr 2016 14:46:18 +0000 Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on PowerPC In-Reply-To: <572369F1.3010306@oracle.com> References: <572369F1.3010306@oracle.com> Message-ID: <35a9d588d5eb41e5ae017938de789f3c@DEWDFE13DE14.global.corp.sap> Hi Aleksey, I have used the following test block on ppc64le. More than 2 attempts were never observed in the single-threaded test, so it would be interesting to report it somehow if that happens.

{
    int attempts = 0;
    boolean r;
    do {
        attempts++;
        r = UNSAFE.weakCompareAndSwapInt(base, offset, 1, 2);
    } while (!r && attempts < 10);
    if (attempts > 2) {
        reportMessage(attempts, "weakCompareAndSwap int 1 -> 2");
    }
    int x = UNSAFE.getInt(base, offset);
    assertEquals(x, 2, "weakCompareAndSwap int value 1 -> 2");
}

Is this what you mean by a simple weakCAS test prototype? Best regards, Martin -----Original Message----- From: Aleksey Shipilev [mailto:aleksey.shipilev at oracle.com] Sent: Friday, 29 April 2016 16:05 To: Volker Simonis ; hotspot compiler Cc: Doerr, Martin Subject: Re: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on PowerPC Hi Volker, On 04/29/2016 04:53 PM, Volker Simonis wrote: > The problem is that a weakCompareAndSwap may spuriously fail, but the > tests assume that it will always succeed.
On Power we saw for example > the following kind of failures: > > test JdkInternalMiscUnsafeAccessTestInt.testArray(): failure > java.lang.AssertionError: weakCompareAndSwapRelease int expected > [true] but found [false] > at org.testng.Assert.fail(Assert.java:94) > at org.testng.Assert.failNotEquals(Assert.java:494) > at org.testng.Assert.assertEquals(Assert.java:123) > at org.testng.Assert.assertEquals(Assert.java:286) > at JdkInternalMiscUnsafeAccessTestInt.testAccess(JdkInternalMiscUnsafeAccessTestInt.java:269) > at JdkInternalMiscUnsafeAccessTestInt.testArray(JdkInternalMiscUnsafeAccessTestInt.java:109) > > From our understanding of the weakCompareAndSwap semantics it is > however perfectly legal that the weakCompareAndSwap fails and the old > values remain in place. Ah, oops, indeed! The test is too strong, and it just happens to work on x86. AArch64 should fail the same way once the intrinsics are there. jdk/test/java/lang/invoke/VarHandles should fail too, so it deserves a global testbug fix. Can you prototype and test a simple weakCAS/CAE test on POWER? We can then propagate the test shape to other tests with: https://bugs.openjdk.java.net/browse/JDK-8155739 > We would propose to run the weakCompareAndSwap in a loop in the test > until it succeeds and only check for the correct value in memory > afterwards. We could also log the number of retry attempts until the > weakCompareAndSwap succeeds into the .jtr file. Yes, looping is better for the tests. > As far as we saw, all the regression tests are single threaded. Are > there any multi-threaded tests available which stress the new Unsafe > intrinsics (i.e. in the way this was previously done by the Java > Concurrency Torture tests which verified that only valid combinations > of reads were observed)? Yes, we have the jcstress tests pending for VarHandles, but they are not yet committed to jcstress.
Thanks, -Aleksey

From aph at redhat.com Fri Apr 29 14:49:00 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 29 Apr 2016 15:49:00 +0100 Subject: Problems with compiler/unsafe/JdkInternalMiscUnsafeAccessTest on PowerPC In-Reply-To: <35a9d588d5eb41e5ae017938de789f3c@DEWDFE13DE14.global.corp.sap> References: <572369F1.3010306@oracle.com> <35a9d588d5eb41e5ae017938de789f3c@DEWDFE13DE14.global.corp.sap> Message-ID: <5723745C.9000009@redhat.com> On 04/29/2016 03:46 PM, Doerr, Martin wrote: > I have used the following test block on ppc64le. More than 2 > attempts in the single-threaded test were never observed. So it > would be interesting to report it somehow if that happens. That would be the sort of thing that jcstress reports as INTERESTING. But it's not a failure. Andrew.

From edward.nevill at gmail.com Fri Apr 29 15:30:33 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 29 Apr 2016 16:30:33 +0100 Subject: [aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not use DC ZVA In-Reply-To: References: <1461851388.10531.27.camel@mint> Message-ID: <1461943833.12456.16.camel@mint> Hi Roland, On Fri, 2016-04-29 at 16:05 +0200, Roland Westrelin wrote: > Hi Ed, > > I can't start a debug VM since that change was pushed: > > # Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912 > # assert(s.body[i] == 32) failed: what? > > I think the problem is MacroAssembler::fill_words() moves the pointer > past the last word that was written (i.e. there's a hole between the last > written word and the value of the base). It passes again with this > patch: Sorry, I missed the post-condition that base should point to the end.
Could you please try the following patch?

diff -r e118111d4433 src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp	Fri Apr 29 14:32:19 2016 +0200
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp	Fri Apr 29 08:08:47 2016 -0700
@@ -4767,15 +4767,15 @@
   br(rscratch2);

   bind(loop);
+  add(base, base, unroll * 16);
   for (int i = -unroll; i < 0; i++)
     stp(value, value, Address(base, i * 16));
   bind(entry);
   subs(cnt, cnt, unroll * 2);
-  add(base, base, unroll * 16);
   br(Assembler::GE, loop);

   tbz(cnt, 0, fini);
-  str(value, Address(base, -unroll * 16));
+  str(value, Address(post(base, 8)));
   bind(fini);
 }

I believe the only thing wrong with the code is that it exits without the base being updated as expected.

>
> diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
> @@ -4754,12 +4754,12 @@
>    const int unroll = 8; // Number of stp instructions we'll unroll
>
>    cbz(cnt, fini);
> -  tbz(base, 3, skip);
> +  tbz(cnt, 0, skip);
>    str(value, Address(post(base, 8)));
>    sub(cnt, cnt, 1);
>    bind(skip);

The intention here is to align the base, not to ensure the count is even. An odd count is handled by the final store.

>
> -  andr(rscratch1, cnt, (unroll-1) * 2);
> +  andr(rscratch1, cnt, unroll* 2 - 1);

The mask (unroll-1) * 2 ensures rscratch1 is even. With an unroll value of 8 the mask is 0xe, which gives the (even) number of words to be stored by the jump into the loop.
>    sub(cnt, cnt, rscratch1);
>    add(base, base, rscratch1, Assembler::LSL, 3);
>    adr(rscratch2, entry);
> @@ -4767,15 +4767,14 @@
>    br(rscratch2);
>
>    bind(loop);
> -  for (int i = -unroll; i < 0; i++)
> +  add(base, base, unroll * 16);
> +  sub(cnt, cnt, unroll * 2);
> +  for (int i = -unroll; i < 0; i++) {
>      stp(value, value, Address(base, i * 16));
> +  }
>    bind(entry);
> -  subs(cnt, cnt, unroll * 2);
> -  add(base, base, unroll * 16);
> -  br(Assembler::GE, loop);

I need to do subs / b.GE here because cnt may be odd. Therefore I cannot use cbnz cnt, ..., because the count may never become zero. Moving the update of the base to the head of the loop is correct, as that ensures the base points correctly to the end on exit.

> -
> -  tbz(cnt, 0, fini);
> -  str(value, Address(base, -unroll * 16));
> +  cbnz(cnt, loop);
> +
>    bind(fini);
>  }
>
>
> I've done very little testing with it but a few things felt strange in
> that code:
>
> - I don't understand why aligning the base at the beginning is
> necessary. Once the base is aligned there's no guarantee the cnt is an
> even number of words. so when we update the base with:
>
> add(base, base, rscratch1, Assembler::LSL, 3);

Yes, but the mask (unroll-1) * 2 ensures rscratch1 is even.

>
> base can be misaligned again. I changed it to test whether cnt is even
> so at least we don't have to worry about that anymore.
>
> - (unroll-1) * 2 seems like an incorrect mask.
>
> - then there's the problem of base being updated at the end of the loop
> that I mentioned above and that can lead to an incorrect base.

Yes, sorry, I completely missed that postcondition.

>
> - That part at the end:
> str(value, Address(base, -unroll * 16));

Because I did the update of the base at the end, base points unroll * 16 bytes beyond the end. So if there is an odd word at the end, it is stored at offset -unroll * 16. But updating the base at the head of the loop ensures it is pointing to the end on exit.
Then, if there is an odd word at the end, we can use post(base, 8) to store the odd word and ensure the base is updated.

>
> I don't understand.
>
> Anyway, as I said I've done very little testing so it's entirely
> possible my patch above is broken.

Thanks for finding this, and sorry again. I'll give the above patch some thorough testing and submit it for RFR. If you could also try it, that would be appreciated. All the best, Ed.

From tom.rodriguez at oracle.com Fri Apr 29 15:27:39 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 29 Apr 2016 08:27:39 -0700 Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI exception stubs In-Reply-To: <5723657C.6020804@oracle.com> References: <5723657C.6020804@oracle.com> Message-ID: > On Apr 29, 2016, at 6:45 AM, Roland Schatz wrote: > > Hi, > > Please review this small change: > > webrev: http://cr.openjdk.java.net/~rschatz/JDK-8155735/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8155735 > > This makes it possible to use the exception stubs without knowing about the hotspot-only concept of symbol pointers. I think you need to handle the case where the symbol doesn't exist. lookup_symbol will return NULL if there's no currently loaded symbol with that name, which will lead to a SEGV later. So you either need to throw an exception here or you should use TempNewSymbol symbol = SymbolTable::new_symbol and let new_exception throw an exception if the class named by symbol doesn't exist. tom > > Thanks, > Roland

From dmitrij.pochepko at oracle.com Fri Apr 29 15:29:34 2016 From: dmitrij.pochepko at oracle.com (dmitrij pochepko) Date: Fri, 29 Apr 2016 18:29:34 +0300 Subject: RFR(XS): 8155163 - JVMCI: MethodHandleAccessProvider.resolveInvokeBasicTarget implementation doesn't match javadoc In-Reply-To: References: <57221E11.4000406@oracle.com> Message-ID: <57237DDE.5000607@oracle.com> Thank you! > Looks good.
> >> On Apr 28, 2016, at 4:28 AM, Dmitrij Pochepko >> > wrote: >> >> Hi, >> >> please review fix for JDK-8155163 - JVMCI: >> MethodHandleAccessProvider.resolveInvokeBasicTarget implementation >> doesn't match javadoc >> >> MethodHandleAccessProvider.resolveInvokeBasicTarget implementation >> throws NPE in case 1st parameter doesn't represent MethodHandle, but >> according to javadoc it should return null in such case. So, a small >> fix should be applied to match javadoc. >> >> CR: https://bugs.openjdk.java.net/browse/JDK-8155163 >> Webrev: http://cr.openjdk.java.net/~dpochepk/8155163/webrev.01/ >> >> I've tested this change on linux_x64 via jprt. >> >> Thanks, >> Dmitrij >

From rwestrel at redhat.com Fri Apr 29 15:49:12 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 29 Apr 2016 17:49:12 +0200 Subject: RFR(S): 8154943: AArch64: redundant address computation instructions with vectorization In-Reply-To: <572267D5.8020605@oracle.com> References: <572267D5.8020605@oracle.com> Message-ID: <57238278.5030607@redhat.com> > node.cpp change is good. > > compile.cpp I understand when you replace "similar" (same in(1)) node > with it but it is not clear that you also processing users (whole > following chain) to remove similar nodes. Add comment. > I think the check "!(k->Opcode() == Op_ConvI2L || ... " (and use > 'continue' instead of 'break') should be done when you push a node on > the list wq.push(u). Thanks for the review, Vladimir. What about this? http://cr.openjdk.java.net/~roland/8154943/webrev.01/ Roland.

From roland.schatz at oracle.com Fri Apr 29 15:55:26 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Fri, 29 Apr 2016 17:55:26 +0200 Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI exception stubs In-Reply-To: References: <5723657C.6020804@oracle.com> Message-ID: <572383EE.5060608@oracle.com> On 04/29/2016 05:27 PM, Tom Rodriguez wrote: > I think you need to handle the case where the symbol doesn't exist. lookup_symbol will return NULL if there's no currently loaded symbol with that name which will lead to a SEGV later. So you either need to throw an exception here or you should use > > TempNewSymbol symbol = SymbolTable::new_symbol SymbolTable::lookup should add it if it doesn't exist. > and let new_exception throw an exception if the class named by symbol doesn't exist. throw_and_post_jvmti_exception will throw a `NoClassDefFoundError` when the exception class doesn't exist. Just to be sure, I tested this using Graal. Temporarily passing a wrong class name into the stub results in: > java.lang.NoClassDefFoundError: bla/java/lang/NullPointerException > at > com.oracle.graal.replacements.test.CompiledNullPointerExceptionTest.testSnippet(CompiledNullPointerExceptionTest.java:93) > at jdk.vm.ci.hotspot.CompilerToVM.executeInstalledCode(Native > Method) > at > jdk.vm.ci.hotspot.HotSpotNmethod.executeVarargs(HotSpotNmethod.java:100) > at > com.oracle.graal.compiler.test.GraalCompilerTest.executeActual(GraalCompilerTest.java:578) > [...] - Roland

From rwestrel at redhat.com Fri Apr 29 15:56:11 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 29 Apr 2016 17:56:11 +0200 Subject: [aarch64-port-dev ] RFR: 8155617: aarch64: ClearArray does not use DC ZVA In-Reply-To: <1461943833.12456.16.camel@mint> References: <1461851388.10531.27.camel@mint> <1461943833.12456.16.camel@mint> Message-ID: <5723841B.3030607@redhat.com> Hi Ed, > Could you please try the following patch I tried your patch and the debug VM starts ok now. Roland.
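
The post-condition Roland tripped over (base must point just past the last word written) can be checked with a small Java model of the fill_words control flow as it looks with Ed's patch applied: base is advanced at the head of the loop, and a trailing odd word is stored with a post-incremented str. This is a simulation under stated assumptions, not HotSpot code. Memory is a long[] and base is a word index, so "16-byte aligned" becomes "even index" (assuming the array itself starts 16-byte aligned). The computed branch into the unrolled body is modelled from the description in the thread (the jump stores the even number of words selected by the 0xe mask), because the adr/br address arithmetic is not shown in the diff. The class and method names are illustrative only.

```java
public class FillWordsModel {
    static final int UNROLL = 8; // number of stp (store-pair) instructions in the unrolled body

    /**
     * Models fill_words: stores value into cnt consecutive words starting at word index base
     * and returns the updated base, which must point just past the last word written.
     */
    static int fillWords(long[] mem, int base, int cnt, long value) {
        if (cnt == 0) {
            return base;                               // cbz cnt, fini
        }
        if ((base & 1) != 0) {                         // tbz base, 3, skip: odd word index = not 16-byte aligned
            mem[base++] = value;                       // str value, [base], #8
            cnt--;
        }
        int head = cnt & ((UNROLL - 1) * 2);           // andr rscratch1, cnt, 0xe: always even, 0..14
        cnt -= head;
        base += head;
        // Computed branch into the unrolled body: only the last head/2 stp's run,
        // filling the words [base - head, base).
        for (int i = -head / 2; i < 0; i++) {
            mem[base + 2 * i] = value;
            mem[base + 2 * i + 1] = value;
        }
        // entry: subs cnt, cnt, unroll*2; b.ge loop
        while ((cnt -= UNROLL * 2) >= 0) {
            base += UNROLL * 2;                        // add base, base, unroll*16 (moved to the loop head)
            for (int i = -UNROLL; i < 0; i++) {
                mem[base + 2 * i] = value;             // stp value, value, [base, #i*16]
                mem[base + 2 * i + 1] = value;
            }
        }
        if ((cnt & 1) != 0) {                          // tbz cnt, 0, fini (parity survives the subtractions)
            mem[base++] = value;                       // str value, [base], #8 keeps base at the end
        }
        return base;
    }

    public static void main(String[] args) {
        for (int start = 0; start < 4; start++) {
            for (int cnt = 0; cnt <= 100; cnt++) {
                long[] mem = new long[start + cnt + 4];
                int end = fillWords(mem, start, cnt, 42L);
                if (end != start + cnt) {
                    throw new AssertionError("base not at end: start=" + start + " cnt=" + cnt);
                }
                for (int w = 0; w < mem.length; w++) {
                    boolean inside = w >= start && w < start + cnt;
                    if ((mem[w] == 42L) != inside) {
                        throw new AssertionError("word " + w + " wrong for start=" + start + " cnt=" + cnt);
                    }
                }
            }
        }
        System.out.println("ok");
    }
}
```

Running main walks every count from 0 to 100 at four start offsets and checks that each requested word is filled, nothing outside the range is touched, and the returned base points just past the last word. Under the pre-fix ordering (base advanced after the stores, odd word stored at base - unroll * 16), the same walk-through ends with base unroll * 16 bytes past the last word written, which is the hole Roland describes.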

From dmitrij.pochepko at oracle.com Fri Apr 29 16:00:52 2016 From: dmitrij.pochepko at oracle.com (dmitrij pochepko) Date: Fri, 29 Apr 2016 19:00:52 +0300 Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case In-Reply-To: <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com> References: <57223CC2.6000309@oracle.com> <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com> Message-ID: <57238534.20405@oracle.com> Hi, I've updated the javadoc according to the comment. Please take a look at the updated version: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/ Thanks, Dmitrij > + * @throws IllegalArgumentException if {@code kind} is {@code null}, > {@code kind} is {@link JavaKind#Void} or > + * not {@linkplain JavaKind#isPrimitive() primitive} kind > I don't think you have to repeat {@code kind}. > >> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko >> > wrote: >> >> Hi, >> >> please review changes for 8155244 - JVMCI: >> MemoryAccessProvider.readUnsafeConstant javadoc should be updated for >> null JavaKind case >> >> MemoryAccessProvider.readUnsafeConstant method implementation throws >> NullPointerException in case JavaKind is null. Such behavior wasn't >> specified in javadoc. >> So, a small fix contains logic to throw IllegalArgumentException and >> respective javadoc update. >> >> CR: https://bugs.openjdk.java.net/browse/JDK-8155244 >> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/ >> >> >> I've tested fix on linux_x64 via jprt. >> >> Thanks, >> Dmitrij >

From vladimir.kozlov at oracle.com Fri Apr 29 16:06:39 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Apr 2016 09:06:39 -0700 Subject: RFR(S): 8154943: AArch64: redundant address computation instructions with vectorization In-Reply-To: <57238278.5030607@redhat.com> References: <572267D5.8020605@oracle.com> <57238278.5030607@redhat.com> Message-ID: <71b64964-0682-2deb-c0d4-0102d7a8a939@oracle.com> Yes, this looks good. Thanks, Vladimir On 4/29/16 8:49 AM, Roland Westrelin wrote: >> node.cpp change is good. >> >> compile.cpp I understand when you replace "similar" (same in(1)) node >> with it but it is not clear that you also processing users (whole >> following chain) to remove similar nodes. Add comment. >> I think the check "!(k->Opcode() == Op_ConvI2L || ... " (and use >> 'continue' instead of 'break') should be done when you push a node on >> the list wq.push(u). > > Thanks for the review, Vladimir. What about this? > > http://cr.openjdk.java.net/~roland/8154943/webrev.01/ > > Roland. >

From vladimir.kozlov at oracle.com Fri Apr 29 16:09:00 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Apr 2016 09:09:00 -0700 Subject: RFR(XS): 8155717: Aarch64: enable loop superword's unrolling analysis In-Reply-To: <57232234.7020504@redhat.com> References: <5723201E.5080909@redhat.com> <572320F7.9030608@redhat.com> <57232234.7020504@redhat.com> Message-ID: <5caad142-53f6-ed36-ac4e-e006c441bdc4@oracle.com> Changes look good. I will run tests with it and will sponsor. Thanks, Vladimir On 4/29/16 1:58 AM, Roland Westrelin wrote: > >> OK, cool. Will Michael sponsor this, then? > > Thanks for reviewing it. I don't think Michael has that power. > > Roland.
>

From vladimir.kozlov at oracle.com Fri Apr 29 16:55:57 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Apr 2016 09:55:57 -0700 Subject: RFR(XS): 8155738: C2: fix frame_complete_offset In-Reply-To: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap> References: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap> Message-ID: <5b697841-5471-cee3-7136-7a334ee2ca25@oracle.com> "In the debug build the call at output.cpp:1831 build if (n->size(_regalloc) < (current_offset-instr_offset)) overwrites the proper value." So the fix is to not reset frame_complete_offset (Compile::_code_offsets._values[Frame_Complete]) when a node is emitted into the scratch buffer. Changes look good. Thanks, Vladimir On 4/29/16 7:09 AM, Lindenmaier, Goetz wrote: > Hi, > > > > In C2, frame_complete_offset is set wrong. It's also called during scratch_emit_size, > > which happens at least in the debug build. > > I also added the missing call on ppc. > > > > Please review this small fix. I please need a sponsor: > > http://cr.openjdk.java.net/~goetz/wr16/8155738-frameCmplt/webrev.00/ > > > > Thanks and best regards, > > Goetz. >

From vladimir.kozlov at oracle.com Fri Apr 29 17:05:14 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Apr 2016 10:05:14 -0700 Subject: RFR(S): 8155729: C2: Skip transformation of LoadConP for heap-based compressed oops In-Reply-To: <41d17a7145394a0aa7e1ebcd890ccd73@DEWDFE13DE14.global.corp.sap> References: <41d17a7145394a0aa7e1ebcd890ccd73@DEWDFE13DE14.global.corp.sap> Message-ID: <9f7bd5c7-1f40-5129-ca80-3d83e035c4e0@oracle.com> Looks good to me. We need corresponding changes in our closed code too. Thanks, Vladimir On 4/29/16 7:06 AM, Doerr, Martin wrote: > Hi, > > > > I have opened a new bug for the proposal which was discussed in thread 8154826.
> > The summary is: > > > > C2's final graph reshaping performs the following transformation: > > Original pattern: LoadConP + Storage access > > Transformed pattern: LoadConN + DecodeN heap-based + Storage access > > > > This seems to be fine for simpler compressed oops mode. It also seems to be fine on x86 which can match the decoding > into the operand of the Storage access instruction. > > > > Other platforms should better skip the transformation: > > -PPC can load the ConP from constant pool. Decoding takes a lot of instructions, because the heap base needs to get loaded. > > -SPARC can use that as well. > > -aarch64: LoadConN+DecodeN has a higher latency than LoadConP. > > > > We can always skip the transformation in heap-based compressed klass mode. Matching DecodeNKlass as operand is currently > not implemented. > > > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8155729_LoadConP/webrev.00/ > > > > Please review. I will also need a sponsor, please. > > > > Best regards, > > Martin > > > From christian.thalinger at oracle.com Fri Apr 29 18:51:04 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 29 Apr 2016 08:51:04 -1000 Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case In-Reply-To: <57238534.20405@oracle.com> References: <57223CC2.6000309@oracle.com> <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com> <57238534.20405@oracle.com> Message-ID: <15DCAD37-3E4D-456E-A569-CB91070B4B69@oracle.com> Looks good. > On Apr 29, 2016, at 6:00 AM, dmitrij pochepko wrote: > > Hi, > I've updated javadoc according to comment. 
> > Please take a look at updated version: > > http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/ > > Thanks, > Dmitrij > >> + * @throws IllegalArgumentException if {@code kind} is {@code null}, {@code kind} is {@link JavaKind#Void} or >> + * not {@linkplain JavaKind#isPrimitive() primitive} kind >> I don't think you have to repeat {@code kind}. >> >>> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko > wrote: >>> >>> Hi, >>> >>> please review changes for 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case >>> >>> MemoryAccessProvider.readUnsafeConstant method implementation throws NullPointerException in case JavaKind is null. Such behavior wasn't specified in javadoc. >>> So, a small fix contains logic to throw IllegalArgumentException and respective javadoc update. >>> >>> CR: https://bugs.openjdk.java.net/browse/JDK-8155244 >>> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/ >>> >>> I've tested fix on linux_x64 via jprt. >>> >>> Thanks, >>> Dmitrij >> >

From christian.thalinger at oracle.com Fri Apr 29 19:29:14 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 29 Apr 2016 09:29:14 -1000 Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI exception stubs In-Reply-To: <572383EE.5060608@oracle.com> References: <5723657C.6020804@oracle.com> <572383EE.5060608@oracle.com> Message-ID: <8F968C97-90F0-4341-9D0F-B8A5A84748B8@oracle.com> > On Apr 29, 2016, at 5:55 AM, Roland Schatz wrote: > > On 04/29/2016 05:27 PM, Tom Rodriguez wrote: >> I think you need to handle the case where the symbol doesn't exist. lookup_symbol will return NULL if there's no currently loaded symbol with that name which will lead to a SEGV later.
So you either need to throw an exception here or you should use >> >> TempNewSymbol symbol = SymbolTable::new_symbol > > SymbolTable::lookup should add it if it doesn't exist. new_symbol actually calls lookup, but I think you'd still want the TempNewSymbol. Runtime people would know better. > >> and let new_exception throw an exception if the class named by symbol doesn't exist. > > throw_and_post_jvmti_exception will throw a `NoClassDefFoundError` when the exception class doesn't exist. > > > Just to be sure, I tested this using graal. Temporarily passing in a wrong class name into the stub results in: >> java.lang.NoClassDefFoundError: bla/java/lang/NullPointerException >> at com.oracle.graal.replacements.test.CompiledNullPointerExceptionTest.testSnippet(CompiledNullPointerExceptionTest.java:93) >> at jdk.vm.ci.hotspot.CompilerToVM.executeInstalledCode(Native Method) >> at jdk.vm.ci.hotspot.HotSpotNmethod.executeVarargs(HotSpotNmethod.java:100) >> at com.oracle.graal.compiler.test.GraalCompilerTest.executeActual(GraalCompilerTest.java:578) >> [...] > > > - Roland >

From dmitrij.pochepko at oracle.com Fri Apr 29 19:49:05 2016 From: dmitrij.pochepko at oracle.com (dmitrij pochepko) Date: Fri, 29 Apr 2016 22:49:05 +0300 Subject: RFR(XS): 8155244 - JVMCI: MemoryAccessProvider.readUnsafeConstant javadoc should be updated for null JavaKind case In-Reply-To: <15DCAD37-3E4D-456E-A569-CB91070B4B69@oracle.com> References: <57223CC2.6000309@oracle.com> <1992029D-ED65-47CE-8A5A-7B7B2EFDCA75@oracle.com> <57238534.20405@oracle.com> <15DCAD37-3E4D-456E-A569-CB91070B4B69@oracle.com> Message-ID: <5723BAB1.8000904@oracle.com> Thank you! On 29.04.2016 21:51, Christian Thalinger wrote: > Looks good. > >> On Apr 29, 2016, at 6:00 AM, dmitrij pochepko >> > wrote: >> >> Hi, >> I've updated javadoc according to comment.
>> >> Please take a look at updated version: >> >> http://cr.openjdk.java.net/~dpochepk/8155244/webrev.02/ >> >> Thanks, >> Dmitrij >> >>> + * @throws IllegalArgumentException if {@code kind} is {@code >>> null}, {@code kind} is {@link JavaKind#Void} or >>> + * not {@linkplain JavaKind#isPrimitive() primitive} kind >>> I don't think you have to repeat {@code kind}. >>> >>>> On Apr 28, 2016, at 6:39 AM, Dmitrij Pochepko >>>> > >>>> wrote: >>>> >>>> Hi, >>>> >>>> please review changes for 8155244 - JVMCI: >>>> MemoryAccessProvider.readUnsafeConstant javadoc should be updated >>>> for null JavaKind case >>>> >>>> MemoryAccessProvider.readUnsafeConstant method implementation >>>> throws NullPointerException in case JavaKind is null. Such behavior >>>> wasn't specified in javadoc. >>>> So, a small fix contains logic to throw IllegalArgumentException >>>> and respective javadoc update. >>>> >>>> CR: https://bugs.openjdk.java.net/browse/JDK-8155244 >>>> webrev: http://cr.openjdk.java.net/~dpochepk/8155244/webrev.01/ >>>> >>>> >>>> I've tested fix on linux_x64 via jprt. >>>> >>>> Thanks, >>>> Dmitrij >>> >> >

From tom.rodriguez at oracle.com Fri Apr 29 19:49:22 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 29 Apr 2016 12:49:22 -0700 Subject: RFR(XS): 8155735: use strings instead of Symbol* in JVMCI exception stubs In-Reply-To: <572383EE.5060608@oracle.com> References: <5723657C.6020804@oracle.com> <572383EE.5060608@oracle.com> Message-ID: > On Apr 29, 2016, at 8:55 AM, Roland Schatz wrote: > > On 04/29/2016 05:27 PM, Tom Rodriguez wrote: >> I think you need to handle the case where the symbol doesn't exist. lookup_symbol will return NULL if there's no currently loaded symbol with that name which will lead to a SEGV later.
So you either need to throw an exception here or you should use >> >> TempNewSymbol symbol = SymbolTable::new_symbol > > SymbolTable::lookup should add it if it doesn't exist. Sorry, I was looking at the comment below lookup instead of the one above it. It's very odd that both of these exist, though, and new_symbol just forwards to lookup:

  static Symbol* new_symbol(const char* utf8_buffer, int length, TRAPS) {
  static Symbol* lookup(const char* name, int len, TRAPS);

I do think you want the TempNewSymbol though. > >> and let new_exception throw an exception if the class named by symbol doesn't exist. > > throw_and_post_jvmti_exception will throw a `NoClassDefFoundError` when the exception class doesn't exist. > > > Just to be sure, I tested this using graal. Temporarily passing in a wrong class name into the stub results in: >> java.lang.NoClassDefFoundError: bla/java/lang/NullPointerException >> at com.oracle.graal.replacements.test.CompiledNullPointerExceptionTest.testSnippet(CompiledNullPointerExceptionTest.java:93) >> at jdk.vm.ci.hotspot.CompilerToVM.executeInstalledCode(Native Method) >> at jdk.vm.ci.hotspot.HotSpotNmethod.executeVarargs(HotSpotNmethod.java:100) >> at com.oracle.graal.compiler.test.GraalCompilerTest.executeActual(GraalCompilerTest.java:578) >> [...] Thanks for checking. tom > > > - Roland >

From dean.long at oracle.com Fri Apr 29 20:51:15 2016 From: dean.long at oracle.com (Dean Long) Date: Fri, 29 Apr 2016 13:51:15 -0700 Subject: RFR(XS): 8155738: C2: fix frame_complete_offset In-Reply-To: <5b697841-5471-cee3-7136-7a334ee2ca25@oracle.com> References: <2768884fbe2647e3a2e6b82da36b4fe6@DEWDFE13DE09.global.corp.sap> <5b697841-5471-cee3-7136-7a334ee2ca25@oracle.com> Message-ID: Looks good. I will close JDK-8008415 as a duplicate! dl On 4/29/2016 9:55 AM, Vladimir Kozlov wrote: > "In the debug build the call at output.cpp:1831 build if > (n->size(_regalloc) < (current_offset-instr_offset)) overwrites the > proper value."
> > So the fix is to not reset frame_complete_offset > (Compile::_code_offsets._values[Frame_Complete]) when node is emitted > into scratch buffer. > > Changes looks good. > > Thanks, > Vladimir > > On 4/29/16 7:09 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> >> >> In C2, frame_complete_offset is set wrong. It's also called during >> scratch_emit_size, >> >> which happens at least in the debug build. >> >> I also added the missing call on ppc. >> >> >> >> Please review this small fix. I please need a sponsor: >> >> http://cr.openjdk.java.net/~goetz/wr16/8155738-frameCmplt/webrev.00/ >> >> >> >> Thanks and best regards, >> >> Goetz. >>

From tom.rodriguez at oracle.com Fri Apr 29 22:04:21 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 29 Apr 2016 15:04:21 -0700 Subject: RFR 8155771: [JVMCI] expose JVM_ACC_IS_CLONEABLE_FAST Message-ID: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com> http://cr.openjdk.java.net/~never/8155771/webrev The new flag JVM_ACC_IS_CLONEABLE_FAST controls whether an object can be cloned by simply performing an allocation and copying the fields. Previously all Cloneable objects were assumed to be cloneable in this way. Add ResolvedJavaType.isAllocationCloneable to expose this notion. Tested with Graal. tom

From aleksey.shipilev at oracle.com Fri Apr 29 22:12:29 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Sat, 30 Apr 2016 01:12:29 +0300 Subject: RFR (M) 8155739: [TESTBUG] VarHandles/Unsafe tests for weakCAS should allow spurious failures Message-ID: <5723DC4D.8090507@oracle.com> Hi, I would like to fix a simple testbug in our weakCompareAndSet VarHandles and Unsafe intrinsics tests. weakCompareAndSet is specified to allow spurious failures, but the current tests do not allow that. This blocks development and testing on non-x86 platforms.
Bug: https://bugs.openjdk.java.net/browse/JDK-8155739 Webrevs: http://cr.openjdk.java.net/~shade/8155739/webrev.hs.00/ http://cr.openjdk.java.net/~shade/8155739/webrev.jdk.00/ The tests are auto-generated, and the substantive changes are in *.template files. I also removed the obsolete generate-unsafe-tests.sh. I would like to push through hs-comp to expose this to Power and AArch64 folks early. Testing: x86_64, jdk:java/lang/invoke/VarHandle, hotspot:compiler/unsafe Thanks, -Aleksey From christian.thalinger at oracle.com Fri Apr 29 23:10:52 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 29 Apr 2016 13:10:52 -1000 Subject: RFR 8155771: [JVMCI] expose JVM_ACC_IS_CLONEABLE_FAST In-Reply-To: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com> References: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com> Message-ID: <0C76F8C6-012E-48F4-934D-4BFB9DC241A3@oracle.com> > On Apr 29, 2016, at 12:04 PM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8155771/webrev + boolean isAllocationCloneable(); That means "allocation-cloneable", not "allocation is cloneable", right? It could be interpreted both ways, but the change is good. > > The new flag JVM_ACC_IS_CLONEABLE_FAST controls whether an object can be cloned by simply performing an allocation and copying the fields. Previously all Cloneable objects were assumed to be cloneable in this way. Add ResolvedJavaType.isAllocationCloneable to expose this notion. Tested with Graal. > > tom
From vladimir.x.ivanov at oracle.com Fri Apr 29 23:11:49 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 30 Apr 2016 02:11:49 +0300 Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM) failed: cannot alias-analyze an untyped ptr Message-ID: <5723EA35.5020809@oracle.com> http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8155635 SplitIf transformation can produce untyped pointers when splitting AddP nodes for unsafe accesses through a Phi which merges non-null & null values: AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull)) The proposed fix is to enable oop pointer to raw pointer conversion for absolute addresses. I also experimented with blocking SplitIf transformation in such cases, but the transformation seems viable and considerably simplifies the graph: X-shaped control flow is untangled by eliminating redundant and the transformation sharpens types on both branches. I checked specifically how Phi merges raw & oop pointers after the split and it works fine. Testing: failing test, JPRT, RBT (hs-tier0-comp.js). Thanks! Best regards, Vladimir Ivanov PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to write a simplified test case which triggers SplitIf transformation.
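The AddP (Phi #NULL #NotNull) shape in this RFR comes from Unsafe's two-register addressing: with a non-null base, the long argument to Unsafe.getLong(Object, long) is an offset into that object; with a null base, it is an absolute address. The sketch below is a conceptual C++ model of that rule, not HotSpot code; get_long, Holder and read_merged are hypothetical names. It shows why splitting the address computation through the merge yields one path with an object-relative address and one with a raw absolute address, which is the case the proposed fix converts to a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Conceptual model (hypothetical, not HotSpot code) of Unsafe.getLong(o, off):
// a non-null base means 'off' is relative to the object,
// a null base means 'off' is itself an absolute (raw) address.
int64_t get_long(const void* base, std::intptr_t off) {
  const char* addr = (base != nullptr)
      ? static_cast<const char*>(base) + off   // oop base + field offset
      : reinterpret_cast<const char*>(off);    // NULL base: absolute address
  int64_t v;
  std::memcpy(&v, addr, sizeof v);  // byte copy avoids strict-aliasing issues
  return v;
}

// Hypothetical object layout standing in for INSTANCE's class.
struct Holder {
  int32_t pad;
  int64_t value;
};

// Mirrors the shape discussed in the thread:
//   o   = flag ? INSTANCE : null
//   off = flag ? F_OFFSET : ADDR
// On the 'true' path the address is object-relative; on the 'false' path
// the base is NULL and the address is raw.
int64_t read_merged(bool flag, const Holder* instance, const int64_t* raw) {
  const void* o = flag ? static_cast<const void*>(instance) : nullptr;
  std::intptr_t off = flag
      ? static_cast<std::intptr_t>(offsetof(Holder, value))
      : reinterpret_cast<std::intptr_t>(raw);
  return get_long(o, off);
}
```

Both paths read through the same get_long call, which is why the merged address has no single precise type until the merge is split.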
From tom.rodriguez at oracle.com Fri Apr 29 23:36:35 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 29 Apr 2016 16:36:35 -0700 Subject: RFR 8155771: [JVMCI] expose JVM_ACC_IS_CLONEABLE_FAST In-Reply-To: <0C76F8C6-012E-48F4-934D-4BFB9DC241A3@oracle.com> References: <579B4981-C0D1-4177-9DED-8BCD6C233395@oracle.com> <0C76F8C6-012E-48F4-934D-4BFB9DC241A3@oracle.com> Message-ID: <582A1B37-8F4C-4151-8E0C-D77E73236C93@oracle.com> > On Apr 29, 2016, at 4:10 PM, Christian Thalinger wrote: > > >> On Apr 29, 2016, at 12:04 PM, Tom Rodriguez > wrote: >> >> http://cr.openjdk.java.net/~never/8155771/webrev + boolean isAllocationCloneable(); > That means "allocation-cloneable", not "allocation is cloneable", right? It could be interpreted both ways but the change is good. I knew what I meant, so I didn't see alternative readings. Is there something better? canBeClonedWithAllocation :) tom > >> >> The new flag JVM_ACC_IS_CLONEABLE_FAST controls whether an object can be cloned by simply performing an allocation and copying the fields. Previously all Cloneable objects were assumed to be cloneable in this way. Add ResolvedJavaType.isAllocationCloneable to expose this notion. Tested with Graal. >> >> tom > From vladimir.kozlov at oracle.com Fri Apr 29 23:39:13 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Apr 2016 16:39:13 -0700 Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM) failed: cannot alias-analyze an untyped ptr In-Reply-To: <5723EA35.5020809@oracle.com> References: <5723EA35.5020809@oracle.com> Message-ID: <99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com> I am not comfortable with this fix. You may replace in(Base) != NULL with TOP. Also, it should not be a RAW pointer (TOP as Base) if it is created by graph transformation from a normal oop pointer. I think we should track which pointers are really RAW when creating them.
Can you explain why we have such a graph shape, where we access memory after a merge point and one merged path has NULL as the pointer to the object? There should be a NULL check after the merge, before the memory access, in such a case. Thanks, Vladimir K On 4/29/16 4:11 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8155635 > > SplitIf transformation can produce untyped pointers when slitting AddP nodes for unsafe accesses through a Phi which > merges non-null & null values: > AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull)) > > Proposed fix is to enable oop pointer to raw pointer conversion for absolute addresses. > > I also experimented with blocking SplitIf transformation is such cases, but the transformation seems viable and > considerably simplifies the graph: X-shaped control flow is untangled by eliminating redundant and the transformation > sharpens types on both branches. > > I checked specifically how Phi merges raw & oop pointers after the split and it works fine. > > Testing: failing test, JPRT, RBT (hs-tier0-comp.js). > > Thanks! > > Best regards, > Vladimir Ivanov > > PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to write a simplified test case which triggers > SplitIf transformation. From paul.sandoz at oracle.com Fri Apr 29 23:42:43 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 29 Apr 2016 16:42:43 -0700 Subject: RFR (M) 8155739: [TESTBUG] VarHandles/Unsafe tests for weakCAS should allow spurious failures In-Reply-To: <5723DC4D.8090507@oracle.com> References: <5723DC4D.8090507@oracle.com> Message-ID: <76809325-CB53-499E-885B-9ED4AB0DBBB5@oracle.com> > On 29 Apr 2016, at 15:12, Aleksey Shipilev wrote: > > Hi, > > I would like to fix a simple testbug in our weakCompareAndSet VarHandles > and Unsafe intrinsics tests. weakCompareAndSet is spec-ed to allow > spurious failures, but current tests do not allow that.
This blocks > development and testing on non-x86 platforms. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8155739 > > Webrevs: > http://cr.openjdk.java.net/~shade/8155739/webrev.hs.00/ > http://cr.openjdk.java.net/~shade/8155739/webrev.jdk.00/ > Looks good. Small tweak, if you wish to do so: #if[JdkInternalMisc] static final int WEAK_ATTEMPTS = Integer.getInteger("weakAttempts", 10); #end[JdkInternalMisc] which avoids changes to the SunMiscUnsafe* tests. Paul. > The tests are auto-generated, and the substantiative changes are in > *.template files. I also removed obsolete generate-unsafe-tests.sh. I > would like to push through hs-comp to expose this to Power and AArch64 > folks early. > > Testing: x86_64, jdk:java/lang/invoke/VarHandle, hotspot:compiler/unsafe > > Thanks, > -Aleksey > From tomasz.wojtowicz at intel.com Fri Apr 29 23:57:41 2016 From: tomasz.wojtowicz at intel.com (Wojtowicz, Tomasz) Date: Fri, 29 Apr 2016 23:57:41 +0000 Subject: RFR (M): 8154974: AVX-512 equipped inflate, has_negatives & compress intrinsics Message-ID: <3616187E21868C40AD1B36D41D29F4C1520B04B1@FMSMSX106.amr.corp.intel.com> I would like to contribute the following change: Review details Review Title: AVX-512 equipped inflate, has_negatives & compress intrinsics Review ID: #8154974 Diff: http://cr.openjdk.java.net/~vdeshpande/8154974/webrev.00/ Description: 512-bit-wide upgrades of the previous 128/256-bit-wide implementations, using mask registers for tail computations. Link: https://bugs.openjdk.java.net/browse/JDK-8154974 Author: Tomasz Wojtowicz -- Thank you, Tomek
From vladimir.x.ivanov at oracle.com Sat Apr 30 00:24:48 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 30 Apr 2016 03:24:48 +0300 Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM) failed: cannot alias-analyze an untyped ptr In-Reply-To: <99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com> References: <5723EA35.5020809@oracle.com> <99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com> Message-ID: <5723FB50.6050301@oracle.com> Thanks for the feedback, Vladimir. On 4/30/16 2:39 AM, Vladimir Kozlov wrote: > I am not comfortable with this fix. You may replace in(Base) != NULL > with TOP. Do you see any cases when it is possible? I don't see any sense in an absolute address with a valid heap base. I can check that in(Base) == in(Address) == NULL. It will fix the problem as well. > Also it should not be RAW pointer (TOP as Base) if it is created by > graph transformation from normal oop pointer. > I think we should track which pointers are really RAW when creating them. > > Can you explain why we have such graph shape where we access memory > after a merge point and on one merged path has NULL as pointer to > object. There should be NULL check after merge before memory access in > such case. It's not necessarily a normal oop pointer. Double-register addressing mode is the source of such shapes. Consider the following example: Object o = (flag ? INSTANCE : null); long off = (flag ? F_OFFSET : ADDR); UNSAFE.getLong(o, off); is translated into: LoadL mem (AddP (Phi #NULL #NonNull) off) If such AddP is split through the Phi, it turns into (AddP #NULL #NULL off) and (AddP #NonNull #NonNull off). The former is untyped and causes problems later. What I can't replicate is how X-shaped control flow eligible for SplitIf transformation is produced. In the failing case, initial null & exact type checks of an oop local (on OSR entry) merge into a redundant X-shaped block. Unsafe accesses use the local as a base later.
Best regards, Vladimir Ivanov > > On 4/29/16 4:11 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8155635 >> >> SplitIf transformation can produce untyped pointers when slitting AddP >> nodes for unsafe accesses through a Phi which >> merges non-null & null values: >> AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull)) >> >> Proposed fix is to enable oop pointer to raw pointer conversion for >> absolute addresses. >> >> I also experimented with blocking SplitIf transformation is such >> cases, but the transformation seems viable and >> considerably simplifies the graph: X-shaped control flow is untangled >> by eliminating redundant and the transformation >> sharpens types on both branches. >> >> I checked specifically how Phi merges raw & oop pointers after the >> split and it works fine. >> >> Testing: failing test, JPRT, RBT (hs-tier0-comp.js). >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to >> write a simplified test case which triggers >> SplitIf transformation. From vladimir.kozlov at oracle.com Sat Apr 30 01:30:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Apr 2016 18:30:28 -0700 Subject: [9] RFR (XS): 8155635: C2: assert(flat != TypePtr::BOTTOM) failed: cannot alias-analyze an untyped ptr In-Reply-To: <5723FB50.6050301@oracle.com> References: <5723EA35.5020809@oracle.com> <99d03b20-bdb4-5948-89aa-eced57b279af@oracle.com> <5723FB50.6050301@oracle.com> Message-ID: <57240AB4.8040406@oracle.com> On 4/29/16 5:24 PM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. > > On 4/30/16 2:39 AM, Vladimir Kozlov wrote: >> I am not comfortable with this fix. You may replace in(Base) != NULL >> with TOP. > Do you see any cases when it is possible? > I don't see any sense in an absolute address with a valid heap base. 
> > I can check that in(Base) == in(Address) == NULL. It will fix the problem as well. > >> Also it should not be RAW pointer (TOP as Base) if it is created by >> graph transformation from normal oop pointer. >> I think we should track which pointers are really RAW when creating them. >> >> Can you explain why we have such graph shape where we access memory >> after a merge point and on one merged path has NULL as pointer to >> object. There should be NULL check after merge before memory access in >> such case. > It's not necessarily a normal oop pointer. Double-register addressing mode is the source of such shapes. Consider the following example: > > Object o = (flag ? INSTANCE : null); > long off = (flag ? F_OFFSET : ADDR); > UNSAFE.getLong(o, off); I think for this graph shape the C2 type system gives up and drops the type to the general Ptr::BOTTOM because it does not know that 'off' can be an address and not a normal offset on the dead path where the base is NULL. We usually do: long l = flag ? o.field : UNSAFE.getLong(addr); And for UNSAFE.getLong(addr) we generate a Raw pointer address (make_unsafe_address()). What I am saying is that C2 treats a long value as an address only when it is used as a direct parameter for unsafe. See LibraryCallKit::classify_unsafe_addr(). Who produces ADDR? Maybe we can't set a flag to indicate that it is a RAW address. We should discuss it with John, who is *the* expert in this. Thanks, Vladimir > > is translated into: > > LoadL mem (AddP (Phi #NULL #NonNull) off) > > If such AddP is split through the Phi, it turns into (AddP #NULL #NULL off) and (AddP #NonNull #NonNull off). The former is untyped and causes problems later. > > What I can't replicate is how X-shaped control flow eligible for SplitIf transformation is produced. > > In the failing case, initial null & exact type checks of an oop local (on OSR entry) merge into redundant X-shaped block. Unsafe accesses uses the local as a base later.
> > Best regards, > Vladimir Ivanov > >> >> On 4/29/16 4:11 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8155635/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8155635 >>> >>> SplitIf transformation can produce untyped pointers when slitting AddP >>> nodes for unsafe accesses through a Phi which >>> merges non-null & null values: >>> AddP ... (Phi (ConP #NULL) (CheckCastPP Oop:...:NotNull)) >>> >>> Proposed fix is to enable oop pointer to raw pointer conversion for >>> absolute addresses. >>> >>> I also experimented with blocking SplitIf transformation is such >>> cases, but the transformation seems viable and >>> considerably simplifies the graph: X-shaped control flow is untangled >>> by eliminating redundant and the transformation >>> sharpens types on both branches. >>> >>> I checked specifically how Phi merges raw & oop pointers after the >>> split and it works fine. >>> >>> Testing: failing test, JPRT, RBT (hs-tier0-comp.js). >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> PS: though AddP (Phi #NULL #NotNull) shape is common, I wasn't able to >>> write a simplified test case which triggers >>> SplitIf transformation. From edward.nevill at gmail.com Sat Apr 30 08:17:29 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Sat, 30 Apr 2016 09:17:29 +0100 Subject: 8155790: aarch64: debug VM fails to start after 8155617 Message-ID: <1462004249.32168.11.camel@mint> Hi, Please review the following webrev http://cr.openjdk.java.net/~enevill/8155790/ JIRA: https://bugs.openjdk.java.net/browse/JDK-8155790 This fixes an issue where the debug build fails to start after 8155617 with the error # Internal Error (/scratch/rwestrel/hs-comp/hotspot/src/share/vm/runtime/stubRoutines.cpp:344), pid=16911, tid=16912 # assert(s.body[i] == 32) failed: what? The problem is that I failed to observe that the base register must point immediately after the end of the block being zeroed on exit from the zero loop. 
In addition the base register can sometimes be byte aligned due to vectorisation and this was not handled. The above webrev fixes this. I have tested a debug build with jtreg hotspot and langtools. OK to push? Ed. From aph at redhat.com Sat Apr 30 08:39:57 2016 From: aph at redhat.com (Andrew Haley) Date: Sat, 30 Apr 2016 09:39:57 +0100 Subject: 8155790: aarch64: debug VM fails to start after 8155617 In-Reply-To: <1462004249.32168.11.camel@mint> References: <1462004249.32168.11.camel@mint> Message-ID: <57246F5D.7090609@redhat.com> On 30/04/16 09:17, Edward Nevill wrote: > http://cr.openjdk.java.net/~enevill/8155790/ > > JIRA: https://bugs.openjdk.java.net/browse/JDK-8155790 OK, thanks. Andrew.
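Edward's fix turns on two properties of the zeroing loop. On exit, the base register must point immediately past the block that was zeroed. The loop must also cope with a base that is not aligned the way its unrolled stores expect. The C++ model below is a minimal sketch of that contract, not the AArch64 stub itself, and zero_words is a hypothetical function. The assumption that the fast path stores 16-byte pairs and therefore gives an only-8-byte-aligned base one leading single-word store is ours, not taken from the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a word-zeroing loop (not the HotSpot AArch64 stub).
// Zeroes 'cnt' 64-bit words starting at 'base' and returns the updated base,
// which must point immediately past the zeroed block.
uint64_t* zero_words(uint64_t* base, std::size_t cnt) {
  // Assumption: the pair-store fast path wants a 16-byte-aligned base, so a
  // base that is only 8-byte aligned gets one leading single-word store.
  if (cnt > 0 && (reinterpret_cast<std::uintptr_t>(base) & 15) != 0) {
    *base++ = 0;
    --cnt;
  }
  // Main loop: two words (one pair store) per iteration.
  while (cnt >= 2) {
    base[0] = 0;
    base[1] = 0;
    base += 2;
    cnt -= 2;
  }
  // Tail: at most one word remains.
  if (cnt == 1) {
    *base++ = 0;
  }
  return base;  // invariant: original base + original cnt words
}
```

Peeling one leading word keeps every pair store aligned without changing where the pointer ends up, so the post-condition holds for aligned and misaligned starts alike.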