From christian.thalinger at oracle.com Fri Apr 1 00:22:08 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 31 Mar 2016 14:22:08 -1000 Subject: RFR: 8144964: JVMCI compilations need to be disabled until the module system is initialized In-Reply-To: <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> References: <56FCB10C.3020609@oracle.com> <1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com> <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> Message-ID: Vladimir pointed out a bug. Of course it should be: + bool must_load; +#if INCLUDE_JVMCI + if (EnableJVMCI) { + // If JVMCI is enabled we require its classes to be found. + must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci); + } else +#endif + { + must_load = (init_opt < SystemDictionary::Opt); + } > On Mar 31, 2016, at 1:08 PM, Christian Thalinger wrote: > > I found a problem when graal.jar is appended to the boot class path. Somehow (and I don't know why, yet) in that case jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci classes are preloaded if the JVMCI is enabled. > > diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp > --- a/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 09:16:49 2016 -0700 > +++ b/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 13:04:35 2016 -1000 > @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla > int sid = (info >> CEIL_LG_OPTION_LIMIT); > Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid); > InstanceKlass** klassp = &_well_known_klasses[id]; > - bool must_load = (init_opt < SystemDictionary::Opt); > + > + bool must_load; > +#if INCLUDE_JVMCI > + if (EnableJVMCI) { > + // If JVMCI is enabled we require its classes to be found.
> + must_load = (init_opt <= SystemDictionary::Jvmci); > + } else > +#endif > + { > + must_load = (init_opt < SystemDictionary::Opt); > + } > + > if ((*klassp) == NULL) { > Klass* k; > if (must_load) { > diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp > --- a/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 09:16:49 2016 -0700 > +++ b/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 13:04:35 2016 -1000 > @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic { > > Opt, // preload tried; NULL if not present > #if INCLUDE_JVMCI > - Jvmci, // preload tried; error if not present, use only with JVMCI > + Jvmci, // preload tried; error if not present if JVMCI enabled > #endif > OPTION_LIMIT, > CEIL_LG_OPTION_LIMIT = 2 // OPTION_LIMIT <= (1<<CEIL_LG_OPTION_LIMIT) > >> On Mar 31, 2016, at 11:10 AM, Christian Thalinger > wrote: >> >> Thanks, Vladimir. >> >>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov > wrote: >>> >>> Looks fine. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/30/16 5:01 PM, Christian Thalinger wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8144964 >>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/ >>>> >>>> JVMCI compilations need to be disabled until the module system is initialized. Basically, only allow tier 1-3 compilations until it's up. >>>> >> > From vladimir.kozlov at oracle.com Fri Apr 1 00:36:01 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 31 Mar 2016 17:36:01 -0700 Subject: RFR: 8144964: JVMCI compilations need to be disabled until the module system is initialized In-Reply-To: References: <56FCB10C.3020609@oracle.com> <1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com> <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> Message-ID: <56FDC271.7090201@oracle.com> Looks good. Thanks, Vladimir On 3/31/16 5:22 PM, Christian Thalinger wrote: > Vladimir pointed out a bug.
Of course it should be: > > + bool must_load; > +#if INCLUDE_JVMCI > + if (EnableJVMCI) { > + // If JVMCI is enabled we require its classes to be found. > + must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci); > + } else > +#endif > + { > + must_load = (init_opt < SystemDictionary::Opt); > + } > >> On Mar 31, 2016, at 1:08 PM, Christian Thalinger > > wrote: >> >> I found a problem when graal.jar is appended to the boot class path. Somehow (and I don't know why, yet) in that case >> jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci >> classes are preloaded if the JVMCI is enabled. >> >> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp >> --- a/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 09:16:49 2016 -0700 >> +++ b/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 13:04:35 2016 -1000 >> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla >> int sid = (info >> CEIL_LG_OPTION_LIMIT); >> Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid); >> InstanceKlass** klassp = &_well_known_klasses[id]; >> - bool must_load = (init_opt < SystemDictionary::Opt); >> + >> + bool must_load; >> +#if INCLUDE_JVMCI >> + if (EnableJVMCI) { >> + // If JVMCI is enabled we require its classes to be found.
>> + must_load = (init_opt <= SystemDictionary::Jvmci); >> + } else >> +#endif >> + { >> + must_load = (init_opt < SystemDictionary::Opt); >> + } >> + >> if ((*klassp) == NULL) { >> Klass* k; >> if (must_load) { >> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp >> --- a/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 09:16:49 2016 -0700 >> +++ b/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 13:04:35 2016 -1000 >> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic { >> >> Opt, // preload tried; NULL if not present >> #if INCLUDE_JVMCI >> - Jvmci, // preload tried; error if not present, use only with JVMCI >> + Jvmci, // preload tried; error if not present if JVMCI enabled >> #endif >> OPTION_LIMIT, >> CEIL_LG_OPTION_LIMIT = 2 // OPTION_LIMIT <= (1<<CEIL_LG_OPTION_LIMIT) >> >> >>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger >> > wrote: >>> >>> Thanks, Vladimir. >>> >>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov > wrote: >>>> >>>> Looks fine. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/30/16 5:01 PM, Christian Thalinger wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8144964 >>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/ >>>>> >>>>> JVMCI compilations need to be disabled until the module system is initialized. Basically, only allow tier 1-3 >>>>> compilations until it's up. >>>>> >>> >> > From christian.thalinger at oracle.com Fri Apr 1 00:36:59 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 31 Mar 2016 14:36:59 -1000 Subject: RFR: 8144964: JVMCI compilations need to be disabled until the module system is initialized In-Reply-To: <56FDC271.7090201@oracle.com> References: <56FCB10C.3020609@oracle.com> <1E67EA8E-A40A-41EB-9C3F-0D08520A96FA@oracle.com> <792D474E-6C0F-4F43-999F-560A90A45F0C@oracle.com> <56FDC271.7090201@oracle.com> Message-ID: <8E272A3F-4D29-4E31-BAC9-B602745F87A4@oracle.com> Thank you, Vladimir. > On Mar 31, 2016, at 2:36 PM, Vladimir Kozlov wrote: > > Looks good.
> > Thanks, > Vladimir > > On 3/31/16 5:22 PM, Christian Thalinger wrote: >> Vladimir pointed out a bug. Of course it should be: >> >> + bool must_load; >> +#if INCLUDE_JVMCI >> + if (EnableJVMCI) { >> + // If JVMCI is enabled we require its classes to be found. >> + must_load = (init_opt < SystemDictionary::Opt) || (init_opt == SystemDictionary::Jvmci); >> + } else >> +#endif >> + { >> + must_load = (init_opt < SystemDictionary::Opt); >> + } >> >>> On Mar 31, 2016, at 1:08 PM, Christian Thalinger >> >> wrote: >>> >>> I found a problem when graal.jar is appended to the boot class path. Somehow (and I don't know why, yet) in that case >>> jdk.vm.ci classes are not found when trying to preload them and the VM crashes. We need to make sure the jdk.vm.ci >>> classes are preloaded if the JVMCI is enabled. >>> >>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.cpp >>> --- a/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 09:16:49 2016 -0700 >>> +++ b/src/share/vm/classfile/systemDictionary.cpp Thu Mar 31 13:04:35 2016 -1000 >>> @@ -2063,7 +2063,18 @@ bool SystemDictionary::initialize_wk_kla >>> int sid = (info >> CEIL_LG_OPTION_LIMIT); >>> Symbol* symbol = vmSymbols::symbol_at((vmSymbols::SID)sid); >>> InstanceKlass** klassp = &_well_known_klasses[id]; >>> - bool must_load = (init_opt < SystemDictionary::Opt); >>> + >>> + bool must_load; >>> +#if INCLUDE_JVMCI >>> + if (EnableJVMCI) { >>> + // If JVMCI is enabled we require its classes to be found.
>>> + must_load = (init_opt <= SystemDictionary::Jvmci); >>> + } else >>> +#endif >>> + { >>> + must_load = (init_opt < SystemDictionary::Opt); >>> + } >>> + >>> if ((*klassp) == NULL) { >>> Klass* k; >>> if (must_load) { >>> diff -r 1b1fb02718ef src/share/vm/classfile/systemDictionary.hpp >>> --- a/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 09:16:49 2016 -0700 >>> +++ b/src/share/vm/classfile/systemDictionary.hpp Thu Mar 31 13:04:35 2016 -1000 >>> @@ -241,7 +241,7 @@ class SystemDictionary : AllStatic { >>> >>> Opt, // preload tried; NULL if not present >>> #if INCLUDE_JVMCI >>> - Jvmci, // preload tried; error if not present, use only with JVMCI >>> + Jvmci, // preload tried; error if not present if JVMCI enabled >>> #endif >>> OPTION_LIMIT, >>> CEIL_LG_OPTION_LIMIT = 2 // OPTION_LIMIT <= (1<<CEIL_LG_OPTION_LIMIT) >>> >>> >>>> On Mar 31, 2016, at 11:10 AM, Christian Thalinger >>>> >> wrote: >>>> >>>> Thanks, Vladimir. >>>> >>>>> On Mar 30, 2016, at 7:09 PM, Vladimir Kozlov >> wrote: >>>>> >>>>> Looks fine. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/30/16 5:01 PM, Christian Thalinger wrote: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8144964 >>>>>> http://cr.openjdk.java.net/~twisti/8144964/webrev.01/ >>>>>> >>>>>> JVMCI compilations need to be disabled until the module system is initialized. Basically, only allow tier 1-3 >>>>>> compilations until it's up.
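The difference between the first and the corrected `must_load` expression is easiest to see with concrete enum values. The sketch below uses a simplified, hypothetical mirror of `SystemDictionary::InitOption` that keeps only `Pre`, `Opt` and `Jvmci` (assuming `Pre < Opt < Jvmci < OPTION_LIMIT`, as the `.hpp` hunk suggests), plus an assumed packing of symbol id and option into one `info` word that matches `sid = info >> CEIL_LG_OPTION_LIMIT`. Because `Opt` sorts before `Jvmci`, `init_opt <= Jvmci` also turns every optional class into a required one when `EnableJVMCI` is set.

```cpp
#include <cassert>

// Simplified, hypothetical mirror of SystemDictionary::InitOption.
// Only the ordering Pre < Opt < Jvmci < OPTION_LIMIT matters here.
enum InitOption { Pre, Opt, Jvmci, OPTION_LIMIT, CEIL_LG_OPTION_LIMIT = 2 };
static_assert(OPTION_LIMIT <= (1 << CEIL_LG_OPTION_LIMIT),
              "init option must fit in the low CEIL_LG_OPTION_LIMIT bits");

// Assumed packing: symbol id in the high bits, init option in the low bits.
constexpr int pack(int sid, InitOption opt) {
  return (sid << CEIL_LG_OPTION_LIMIT) | opt;
}
constexpr int sid_of(int info) { return info >> CEIL_LG_OPTION_LIMIT; }
constexpr InitOption opt_of(int info) {
  return static_cast<InitOption>(info & ((1 << CEIL_LG_OPTION_LIMIT) - 1));
}

// First version of the patch: with JVMCI on, Opt classes become mandatory too.
bool must_load_buggy(InitOption init_opt, bool enable_jvmci) {
  return enable_jvmci ? (init_opt <= Jvmci) : (init_opt < Opt);
}

// Corrected version: only "real" preloads and the JVMCI classes are mandatory.
bool must_load_fixed(InitOption init_opt, bool enable_jvmci) {
  return enable_jvmci ? (init_opt < Opt || init_opt == Jvmci) : (init_opt < Opt);
}
```

With `EnableJVMCI` off, both versions reduce to `init_opt < Opt`, so only the JVMCI-enabled path changes behavior.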
From michael.c.berg at intel.com Fri Apr 1 03:36:45 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 1 Apr 2016 03:36:45 +0000 Subject: CR for RFR 8151573 In-Reply-To: <56FCADE3.20403@oracle.com> References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> Message-ID: Vladimir, I think I have addressed every concern in the latest webrev: http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ I wound up leaving divergent local copies of the clone code, as there are now two fully separate contexts in three locations. Adding more parameters didn't seem to be a win to get around it. The code is fully retested with no issues. Thanks, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, March 30, 2016 9:56 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8151573 On 3/30/16 4:57 PM, Berg, Michael C wrote: > See below for context. > > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 3:51 PM > To: Berg, Michael C ; > 'hotspot-compiler-dev at openjdk.java.net' > > Subject: Re: CR for RFR 8151573 > > Michael, > > First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files and it is difficult to apply your changes. > > multi_version_post_loops() can use is_canonical_main_loop_entry() from > 8148754 but you need to modify it to move the > is_Main() assert to other call sites. > > OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be for non-main loops as well? Also, the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps it is better to leave them separate, given they are different?
I will leave this one to last so that we have time to discuss it. I am fine with the name change. The only difference I see is that your code checks the cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - shouldn't it be in(2)? > > I did not get the rce'd post loop checks in loopnode.cpp. > > First I will have to explain what I am doing with do_range_check(). That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning. > Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass. > The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so, or for the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot multiversion-transform the loop we added, we eliminate it. I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled. You should add a comment to do_range_check() about what you said (looking only for range checks). Also, don't use the old_new struct; use a simple counter and return it as the result (or return a bool from do_range_check()) to indicate that only range checks were in the loop.
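The counting scheme discussed here (count every `If` in the loop body, subtract the ones whose bound is a `LoadRange`, and treat a zero balance as "only canonical range checks") can be sketched outside of C2. `ToyIf`, `non_canonical_checks` and `only_range_checks` are hypothetical names on a toy model, not HotSpot code; the sketch only shows that returning the count, or a bool derived from it, is enough to report the result without an `old_new`-style struct.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an If node in the loop body (hypothetical, not C2's IR).
struct ToyIf {
  bool against_load_range;  // true if the bound is an array length (LoadRange)
};

// Increment for every If, decrement for each LoadRange-based check.
// A non-zero result means some If is not a canonical range check.
int non_canonical_checks(const std::vector<ToyIf>& body) {
  int count = 0;
  for (const ToyIf& iff : body) {
    ++count;
    if (iff.against_load_range) {
      --count;
    }
  }
  return count;
}

// The bool form suggested in the review: "only range checks were in the loop".
bool only_range_checks(const std::vector<ToyIf>& body) {
  return non_canonical_checks(body) == 0;
}
```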
Anyway, I was asking about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand, you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right? There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop? > > Swap the next checks, since has_range_checks() may be expensive (it scans the loop body): > + // only process RCE'd main loops > + if (cl->has_range_checks() || !cl->is_main_loop()) return; > > OK, makes sense. Sorry, I made a mistake due to you using the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag. The other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request. But, please, rename the has_range_checks(cl) method to avoid confusion. Thanks, Vladimir > > Also, if there are no checks in the loop, you will scan the loop's body again, since the no_checks state is not recorded. > I would suggest scanning the body unconditionally, because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call. > > I perceive the real problem is that we should not scan more than once after we check. I will move towards that solution. > > > Why do you need local copies?: > > - visited.Clear(); > - clones.clear(); > + Arena *a = Thread::current()->resource_area(); > + VectorSet visited(a); > + Node_Stack clones(a, main_head->back_control()->outcnt()); > > I will look into this, and see if it can be cleaned up. > > > I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set. > > OK, I will look into a version without PostLoopInfo.
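A minimal sketch of the flag-based alternative, assuming a toy loop object rather than C2's `CountedLoopNode`: the loop is marked once (`RCEPostLoop`, the name floated in the review), and the expensive body scan runs at most once, with its result cached in flag bits so a loop without checks is not rescanned. `RangeChecksScanned`, `HasRangeChecksFlag`, `ToyLoop` and `scan_body_for_range_checks` are hypothetical names; giving the scan a different name from the flag query avoids the `has_range_checks()` vs `has_range_checks(cl)` confusion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for a toy loop; not C2's actual LoopFlags.
enum ToyLoopFlags : std::uint32_t {
  RCEPostLoop        = 1u << 0,  // post loop created by the multiversioning pass
  RangeChecksScanned = 1u << 1,  // the body has been scanned once already
  HasRangeChecksFlag = 1u << 2   // cached result of that scan
};

struct ToyLoop {
  std::uint32_t flags = 0;
  int body_scans = 0;            // how many times the expensive walk ran
  bool body_has_checks = false;  // stands in for the real loop body contents

  void mark_rce_post_loop() { flags |= RCEPostLoop; }
  bool is_rce_post_loop() const { return (flags & RCEPostLoop) != 0; }

  // The expensive walk over the loop body, deliberately named differently
  // from the cheap flag query below.
  bool scan_body_for_range_checks() {
    ++body_scans;
    return body_has_checks;
  }

  // Cheap query: scan at most once, then answer from the cached flags,
  // including the "no checks" outcome.
  bool has_range_checks() {
    if ((flags & RangeChecksScanned) == 0) {
      flags |= RangeChecksScanned;
      if (scan_body_for_range_checks()) {
        flags |= HasRangeChecksFlag;
      }
    }
    return (flags & HasRangeChecksFlag) != 0;
  }
};
```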
> > Thanks, > Vladimir > > On 3/30/16 1:44 PM, Berg, Michael C wrote: >> Here is an update after full testing, the webrev is: >> >> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >> >> Please review and comment, >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: hotspot-compiler-dev >> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >> Berg, Michael C >> Sent: Wednesday, March 16, 2016 10:30 AM >> To: Vladimir Kozlov ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: RE: CR for RFR 8151573 >> >> Putting a hold on the review, retesting everything on my end. >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 16, 2016 8:42 AM >> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: CR for RFR 8151573 >> >> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>> Vladimir: >>> >>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation; they might be cleaned up in cfg optimizations, but not always. With multiversioning, we always remove the range checks in our key loop. >> >> I understand that we can get some benefits. But in the general case they will not be visible. >> >>> >>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >> >> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop. >> >> After thinking about this, I would suggest that you look at the arraycopy and generate_fill stubs in stub_Generator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform specific only, smaller, and easier to understand and test.
Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops. >> >> Regards, >> Vladimir >> >>> >>> Regards, >>> Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, March 15, 2016 4:37 PM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> As we all know, we can always construct microbenchmarks which show >>> 30% >>> - 50% difference, while in a real application we will never see the >>> difference. I still don't see a real reason why we should spend time >>> optimizing >>> *POST* loops. We already have a vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>> >>> Why does "programmable SIMD" depend on it? What about the pre-loop? >>> >>> Thanks, >>> Vladimir >>> >>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>> Correction below... >>>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev >>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>>> Berg, Michael C >>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: RE: CR for RFR 8151573 >>>> >>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>> >>>> for(int i = 0; i < process_len; i++) >>>> { >>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>> } >>>> >>>> The above code makes 9 vector ops. >>>> >>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>> The value process_len is some fraction of the array length in my measurements.
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in the range 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>> >>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was superunrolled 4 times with vector length 16, or an unroll of 64; we process 4 iterations to align, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81, so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>> >>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>> >>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>> >>>> Regards, >>>> Michael >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> Hi Michael, >>>> >>>> The changes are significant, so they have to be justified, especially since we are in a late stage of JDK 9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
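The trip-count argument above can be illustrated with a scalar emulation of a masked vector step for the kernel `d[i] = (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i])` from this thread. This is a sketch under stated assumptions: `kLanes = 16` models 16 float lanes of a VecZ (512-bit) register, the pre-loop and super-unrolling are ignored, and the per-lane `if` stands in for a hardware mask. Full-width steps cover the bulk of the range, and one masked step covers the 1..15 leftover elements that a scalar fixup loop would need one trip each to process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 16;  // assumption: 16 float lanes (VecZ)

// Scalar reference for the kernel discussed in the thread.
inline float kernel(float a, float b, float c) { return (a * b) + (a * c) + (b * c); }

// One emulated vector step starting at 'start'. Lanes at or beyond
// 'remaining' are masked off: they neither read nor write memory.
void masked_step(const float* a, const float* b, const float* c, float* d,
                 std::size_t start, std::size_t remaining) {
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const bool active = lane < remaining;  // mask maps lanes to residual iterations
    if (active) {
      const std::size_t i = start + lane;
      d[i] = kernel(a[i], b[i], c[i]);
    }
  }
}

// Full-width steps, then a single masked step for the tail.
// Returns the number of loop trips taken.
std::size_t run_masked(const std::vector<float>& a, const std::vector<float>& b,
                       const std::vector<float>& c, std::vector<float>& d,
                       std::size_t n) {
  std::size_t trips = 0;
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes, ++trips) {
    masked_step(a.data(), b.data(), c.data(), d.data(), i, kLanes);
  }
  if (i < n) {
    masked_step(a.data(), b.data(), c.data(), d.data(), i, n - i);
    ++trips;
  }
  return trips;
}

// Trip count if the tail were left to a scalar fixup loop instead.
std::size_t scalar_tail_trips(std::size_t n) { return n / kLanes + n % kLanes; }
```

For `n = 85`, the masked version takes 6 trips, while full steps plus a scalar tail would take 5 + 5 = 10.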
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute multi-versioning post loops for range >>>>> check elimination. Previously, post loop optimizations for range >>>>> checks were done in cfg optimizations after register >>>>> allocation. I have added code which produces the desired effect much >>>>> earlier by introducing a safe transformation which will minimally >>>>> allow a range check free version of the final post loop to execute >>>>> up until the point it actually has to take a range check exception >>>>> by re-ranging the limit of the rce'd loop, then exit the rce'd >>>>> post loop and take the range check exception in the legacy loop's execution if required. >>>>> If during optimization we discover that we know enough to remove >>>>> the range check version of the post loop, mostly by exposing the >>>>> load range values into the limit logic of the rce'd post loop, we >>>>> will eliminate the range check post loop altogether, much like cfg >>>>> optimizations did, but much earlier. This gives optimizations >>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>> vectorize the rce'd post loops to a single iteration based on mask >>>>> vectors which map to the residual iterations. Programmable SIMD >>>>> will be a follow-on change set utilizing this code to stage its >>>>> work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>> Currently I have enabled this optimization for x86 only. We base >>>>> this loop on successfully rce'd main loops, and if, for whatever reason, multiversioning fails, we eliminate the loop we added.
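At the source level, the transformation described in this RFR roughly corresponds to the sketch below, which is an illustration under assumptions rather than C2 output: the range-check-free copy runs up to a re-ranged limit, `min(limit, a.size())`, and the original checked loop then handles whatever remains, raising the out-of-bounds error at the same index the unoptimized loop would. `std::vector::at` stands in for a Java range check, and `post_loop_multiversioned` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Original post loop (conceptually): for (i = start; i < limit; i++) sum += a.at(i);
long post_loop_multiversioned(const std::vector<int>& a, std::size_t start,
                              std::size_t limit, std::size_t* clean_trips) {
  long sum = 0;
  std::size_t i = start;
  // Re-ranged limit: the clean copy never reaches an out-of-bounds index.
  const std::size_t clean_limit = std::min<std::size_t>(limit, a.size());
  for (; i < clean_limit; ++i) {
    sum += a[i];  // range-check-free version
  }
  *clean_trips = (i > start) ? i - start : 0;
  // Legacy (checked) loop finishes the remainder; throws std::out_of_range
  // at exactly the index where the original loop would fail.
  for (; i < limit; ++i) {
    sum += a.at(i);
  }
  return sum;
}
```

When the compiler can prove `limit <= a.size()`, the checked loop is dead and can be removed, which corresponds to the case where the clean loop becomes the only copy.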
>>>>> >>>>> This code was tested as follows: >>>>> >>>>> >>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>> >>>>> >>>>> webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>> >>>>> Thanks, >>>>> >>>>> Michael >>>>> From igor.veresov at oracle.com Fri Apr 1 05:44:05 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 31 Mar 2016 22:44:05 -0700 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <56FD2F47.1000709@azulsystems.com> References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com> <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> <56AA2AE4.2090803@azulsystems.com> <2538083C-7906-44AA-A074-7DBF5F2D8654@oracle.com> <50C14C66-4068-4DD7-BD94-96E37F7C9B0A@oracle.com> <56AF85F3.3060802@azulsystems.com> <56BBCBF4.2070504@azulsystems.com> <56BD1F7F.3020808@azulsystems.com> <56E0B770.1@azulsystems.com> <56FD2F47.1000709@azulsystems.com> Message-ID: <080FF9DB-5B5C-47B7-AC1C-174755C9B826@oracle.com> Looks good. igor > On Mar 31, 2016, at 7:08 AM, Ivan Krylov wrote: > > I have updated the webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/ > > I overlooked the missing LIR_Assembler::on_spin_wait() for non-x86 platforms. I have no access to non-intel boxes > and hence saw the problems only at integration time. > c1_LIRAssembler.o: In function `LIR_Assembler::emit_op0(LIR_Op0*)': > hotspot/src/share/vm/c1/c1_LIRAssembler.cpp:683: undefined reference to `LIR_Assembler::on_spin_wait()' > > So, 3 empty method implementations were added to the corresponding files - the top 3 on the webrev above. > > Paul, thanks for identifying those issues. > > Regards, > > Ivan > > > > > On 10/03/2016 04:04, Igor Veresov wrote: >> Ok, good. >> >> igor >> >>> On Mar 9, 2016, at 3:53 PM, Ivan Krylov wrote: >>> >>> Paul, Indeed, thanks. I have modified the test. 
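The per-platform split mentioned above (a real spin-wait instruction on x86, empty implementations elsewhere) can be sketched in C++. `spin_wait_hint` and `spin_until` are hypothetical names, not HotSpot or JDK APIs; on x86 the hint calls `_mm_pause()`, the compiler intrinsic for the PAUSE instruction, and on other targets it compiles to nothing, loosely mirroring the empty `LIR_Assembler::on_spin_wait()` stubs. A caller of `Thread.onSpinWait()` in Java has the same shape: a busy loop that issues the hint on every iteration.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// x86: emit the PAUSE instruction as a spin-wait hint.
inline void spin_wait_hint() { _mm_pause(); }
#else
// Other platforms: no hint, the call is an empty function.
inline void spin_wait_hint() {}
#endif

// Busy-waits until done() returns true, issuing the hint on every
// iteration. Returns the number of spins, for illustration only.
template <typename Pred>
long spin_until(Pred done) {
  long spins = 0;
  while (!done()) {
    spin_wait_hint();
    ++spins;
  }
  return spins;
}
```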
>>> I also made changes to reflect the fact that it has now been decided to place onSpinWait into j.l.Thread. >>> >>> Igor, >>> This is a new webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/ >>> This is the diff between the previous and the current patch (03 vs 04): >>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.04/diff.txt >>> >>> Thanks, >>> >>> Ivan >>> >>> On 12/02/2016 06:01, Paul Sandoz wrote: >>>>> On 12 Feb 2016, at 00:55, Ivan Krylov wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> Thanks both for your help and your reviews. >>>>> Here is a new version, tested on mac for c1 and c2: >>>>> >>>>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.03 >>>>> >>>> Now that C1 is supported, should the test be updated with C1-only execution? >>>> >>>> Paul. > From rahul.v.raghavan at oracle.com Fri Apr 1 08:22:59 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 1 Apr 2016 01:22:59 -0700 (PDT) Subject: RFR (XXS): 8150690: C++11 user-defined literal syntax in jvmciCompilerToVM.cpp In-Reply-To: <84F04E81-69FB-402F-956B-1C9CD21AD4C2@oracle.com> References: <99c44ac7-0279-421f-9469-9f5445d1312a@default> <84F04E81-69FB-402F-956B-1C9CD21AD4C2@oracle.com> Message-ID: <00b78f82-b68b-4c42-867f-438efccaa3ba@default> > -----Original Message----- > From: Christian Thalinger > Sent: Friday, April 01, 2016 5:01 AM > > Looks correct. Thank you Chris. > > > On Mar 30, 2016, at 6:21 PM, Rahul Raghavan wrote: > > > > Hi, > > > > : https://bugs.openjdk.java.net/browse/JDK-8150690 > > : http://cr.openjdk.java.net/~rraghavan/8150690/webrev.00/ > > > > - Added the space required between literal and identifier for C++11 in the CompilerToVM::methods array initializer. > > (only white space changes) > > - Confirmed no other similar issues elsewhere in jvmciCompilerToVM.cpp. > > - This proposed fix is similar to the fixes done for JDK-8081202, JDK-8135209, JDK-8132969. > > > > - Could not try and reconfirm with Visual Studio 2015.
> > But manually confirmed the changes and > > understood another related infrastructure/build task is reported separately - JDK-8145549 (to build OpenJDK using Visual Studio > 2015 Community edition) > > > > - No issues with jprt run (-testset hotspot). > > > > Thanks, > > Rahul > From jamsheed.c.m at oracle.com Fri Apr 1 09:02:04 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 1 Apr 2016 14:32:04 +0530 Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...) failed: a) MT-unsafe modification of inline cache In-Reply-To: <56F9521E.5020808@oracle.com> References: <56F456B7.6010104@oracle.com> <3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com> <56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com> <56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com> <56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com> <56F9521E.5020808@oracle.com> Message-ID: <56FE390C.20906@oracle.com> Hi Vladimir Ivanov, I used overloaded clearInlineCaches wb api. revised webrevs: hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/ root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/ Best Regards, Jamsheed On 3/28/2016 9:17 PM, Vladimir Ivanov wrote: >>> in addition it clears this. >>> void static_stub_Relocation::clear_inline_cache() { >>> // Call stub is only used when calling the interpreted code. >>> // It does not really need to be cleared, except that we want to >>> clean out the methodoop. >>> CompiledStaticCall::set_stub_to_clean(this); >> >> i want assert to catch this issue. if static stubs are cleared, assert >> wouldn't fail. > I see. Then I suggest to rename the method to > WhiteBox.cleanupInlineCaches() and iterate over the whole code cache > (don't specify Method*). 
> > void CodeCache::cleanup_inline_caches() { > assert_locked_or_safepoint(CodeCache_lock); > NMethodIterator iter; > while(iter.next_alive()) { > iter.method()->cleanup_inline_caches(true); > } > } > > Best regards, > Vladimir Ivanov > > >> -Jamsheed >>> } >>> >>> Best Regards, >>> Jamsheed >>>> >>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb)) >>>> VM_ClearICs clear_ics; >>>> VMThread::execute(&clear_ics); >>>> WB_END >>>> >>>> class VM_ClearICs: public VM_Operation { >>>> ... >>>> void doit() { CodeCache::clear_inline_caches(); } >>>> ... >>>> }; >>>> >>>> void CodeCache::clear_inline_caches() { >>>> assert_locked_or_safepoint(CodeCache_lock); >>>> NMethodIterator iter; >>>> while(iter.next_alive()) { >>>> iter.method()->clear_inline_caches(); >>>> } >>>> } >>>> >>>> void nmethod::clear_inline_caches() { >>>> assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's >>>> only allowed at safepoint"); >>>> if (is_zombie()) { >>>> return; >>>> } >>>> >>>> RelocIterator iter(this); >>>> while (iter.next()) { >>>> iter.reloc()->clear_inline_cache(); >>>> } >>>> } >>>> >>>> void static_call_Relocation::clear_inline_cache() { >>>> // Safe call site info >>>> CompiledStaticCall* handler = compiledStaticCall_at(this); >>>> handler->set_to_clean(); >>>> } >>>> >>>> void opt_virtual_call_Relocation::clear_inline_cache() { >>>> // No stubs for ICs >>>> // Clean IC >>>> ResourceMark rm; >>>> CompiledIC* icache = CompiledIC_at(this); >>>> icache->set_to_clean(); >>>> } >>>> >>>> void virtual_call_Relocation::clear_inline_cache() { >>>> // No stubs for ICs >>>> // Clean IC >>>> ResourceMark rm; >>>> CompiledIC* icache = CompiledIC_at(this); >>>> icache->set_to_clean(); >>>> } >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>>> >>>>> On 3/26/2016 1:50 PM, Dean Long wrote: >>>>>> Instead of changing cleanup_inline_caches() to take a new flag, can >>>>>> you use the existing >>>>>> clear_inline_caches()? 
>>>>>> >>>>>> dl >>>>>> >>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote: >>>>>>> Thank you Chris. >>>>>>> I have updated the code. >>>>>>> >>>>>>> + if (method == NULL) { >>>>>>> + return; >>>>>>> + } >>>>>>> + nmethod* nm = method->code(); >>>>>>> + if (nm == NULL || nm->is_unloaded()) { >>>>>>> + return; >>>>>>> + } >>>>>>> + nm->cleanup_inline_caches(true); >>>>>>> Best Regards, >>>>>>> Jamsheed >>>>>>> >>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote: >>>>>>>> >>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> Request for review, >>>>>>>>> >>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247 >>>>>>>>> >>>>>>>>> webrevs: >>>>>>>>> fix: >>>>>>>>> jdk part: >>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> newly added test case >>>>>>>>> hotspot part: >>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/ >>>>>>>>> >>>>>>>>> under hs-comp/test >>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java >>>>>>>>> Testing: JPRT with new test case, with fix, without fix >>>>>>>>> >>>>>>>>> Problem Summary: An MH.invoke linksite relies on Java code >>>>>>>>> to get an adapter method. Here a new method holder class and an >>>>>>>>> adapter method are created for an MT, and the lform instance is cached. >>>>>>>>> Normally this cached lform is returned for a linksite request of >>>>>>>>> the same MT. When these cached lforms get collected (due to memory >>>>>>>>> pressure), a new class and method get created for the same MT (even >>>>>>>>> though the old method holder class and adapter method are live). >>>>>>>>> Fix Summary: Kept a strong reference to the lform instance in the adapter >>>>>>>>> method holder class of the MT. >>>>>>>> >>>>>>>> Wow! You found the cause for this long-standing issue? Nice.
>>>>>>>> + if (method == NULL) { return; } >>>>>>>> + nmethod* nm = method->code(); >>>>>>>> + if (nm == NULL) { return; } >>>>>>>> + if (nm->is_unloaded()) { return; } >>>>>>>> Please put the return and } on separate lines. >>>>>>>> >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> From igor.ignatyev at oracle.com Fri Apr 1 09:22:42 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 1 Apr 2016 12:22:42 +0300 Subject: RFR(S): 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays In-Reply-To: <56FC242C.6030108@oracle.com> References: <56FC242C.6030108@oracle.com> Message-ID: <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com> Hi Dmitrij, the fix looks good to me Thanks, Igor > On Mar 30, 2016, at 10:08 PM, Dmitrij Pochepko wrote: > > Hi, > > please review small fix for 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays > > A problem was in Arrays.fill method usage with mismatched argument types for primitive types arrays, so, generated tests compilation failed. > > This fix removes respective Arrays.fill usage generation for primitive types. > > bug: https://bugs.openjdk.java.net/browse/JDK-8151828 > webrev: http://cr.openjdk.java.net/~dpochepk/8151828/webrev.01/ > > I've tested fix locally. > > Thanks, > Dmitrij > > From vladimir.x.ivanov at oracle.com Fri Apr 1 09:48:53 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 1 Apr 2016 12:48:53 +0300 Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...)
failed: a) MT-unsafe modification of inline cache In-Reply-To: <56FE390C.20906@oracle.com> References: <56F456B7.6010104@oracle.com> <3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com> <56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com> <56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com> <56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com> <56F9521E.5020808@oracle.com> <56FE390C.20906@oracle.com> Message-ID: <56FE4405.3080807@oracle.com> Looks good! Small detail: the following comment in the test is misleading: 71 test(); // new LF creation should fail. LF shouldn't be unloaded, so no new LF is normally not instantiated. Something like the following: // Trigger call site re-resolution. Invoker LambdaForm should stay the same. test(); No need to send new webrev. Best regards, Vladimir Ivanov On 4/1/16 12:02 PM, Jamsheed C m wrote: > Hi Vladimir Ivanov, > > I used overloaded clearInlineCaches wb api. > > revised webrevs: > hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/ > root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/ > > Best Regards, > Jamsheed > > On 3/28/2016 9:17 PM, Vladimir Ivanov wrote: >>>> in addition it clears this. >>>> void static_stub_Relocation::clear_inline_cache() { >>>> // Call stub is only used when calling the interpreted code. >>>> // It does not really need to be cleared, except that we want to >>>> clean out the methodoop. >>>> CompiledStaticCall::set_stub_to_clean(this); >>> >>> i want assert to catch this issue. if static stubs are cleared, assert >>> wouldn't fail. >> I see. Then I suggest to rename the method to >> WhiteBox.cleanupInlineCaches() and iterate over the whole code cache >> (don't specify Method*). 
>> >> void CodeCache::cleanup_inline_caches() { >> assert_locked_or_safepoint(CodeCache_lock); >> NMethodIterator iter; >> while(iter.next_alive()) { >> iter.method()->cleanup_inline_caches(true); >> } >> } >> >> Best regards, >> Vladimir Ivanov >> >> >>> -Jamsheed >>>> } >>>> >>>> Best Regards, >>>> Jamsheed >>>>> >>>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb)) >>>>> VM_ClearICs clear_ics; >>>>> VMThread::execute(&clear_ics); >>>>> WB_END >>>>> >>>>> class VM_ClearICs: public VM_Operation { >>>>> ... >>>>> void doit() { CodeCache::clear_inline_caches(); } >>>>> ... >>>>> }; >>>>> >>>>> void CodeCache::clear_inline_caches() { >>>>> assert_locked_or_safepoint(CodeCache_lock); >>>>> NMethodIterator iter; >>>>> while(iter.next_alive()) { >>>>> iter.method()->clear_inline_caches(); >>>>> } >>>>> } >>>>> >>>>> void nmethod::clear_inline_caches() { >>>>> assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's >>>>> only allowed at safepoint"); >>>>> if (is_zombie()) { >>>>> return; >>>>> } >>>>> >>>>> RelocIterator iter(this); >>>>> while (iter.next()) { >>>>> iter.reloc()->clear_inline_cache(); >>>>> } >>>>> } >>>>> >>>>> void static_call_Relocation::clear_inline_cache() { >>>>> // Safe call site info >>>>> CompiledStaticCall* handler = compiledStaticCall_at(this); >>>>> handler->set_to_clean(); >>>>> } >>>>> >>>>> void opt_virtual_call_Relocation::clear_inline_cache() { >>>>> // No stubs for ICs >>>>> // Clean IC >>>>> ResourceMark rm; >>>>> CompiledIC* icache = CompiledIC_at(this); >>>>> icache->set_to_clean(); >>>>> } >>>>> >>>>> void virtual_call_Relocation::clear_inline_cache() { >>>>> // No stubs for ICs >>>>> // Clean IC >>>>> ResourceMark rm; >>>>> CompiledIC* icache = CompiledIC_at(this); >>>>> icache->set_to_clean(); >>>>> } >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> >>>>>> Best Regards, >>>>>> Jamsheed >>>>>> >>>>>> On 3/26/2016 1:50 PM, Dean Long wrote: >>>>>>> Instead of changing cleanup_inline_caches() to 
take a new flag, can >>>>>>> you use the existing >>>>>>> clear_inline_caches()? >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote: >>>>>>>> Thank you Chris. >>>>>>>> I have updated the code. >>>>>>>> >>>>>>>> + if (method == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + nmethod* nm = method->code(); >>>>>>>> + if (nm == NULL || nm->is_unloaded()) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + nm->cleanup_inline_caches(true); >>>>>>>> Best Regards, >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote: >>>>>>>>> >>>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> Request for review, >>>>>>>>>> >>>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247 >>>>>>>>>> >>>>>>>>>> webrevs: >>>>>>>>>> fix: >>>>>>>>>> jdk part: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> newly added test case >>>>>>>>>> hotspot part: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/ >>>>>>>>>> >>>>>>>>>> under hs-comp/test >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java >>>>>>>>>> Testing: JPRT with new test case, with fix, without fix >>>>>>>>>> >>>>>>>>>> Problem Summary: MH.invoke linksite take assistance of java code >>>>>>>>>> to get an adapter method. Here a new method holder class and a >>>>>>>>>> adapter method are created for a MT and lform instance is cached. >>>>>>>>>> Normally this cached lform get returned for a linksite request of >>>>>>>>>> same MT. When these cached lform get collected(due to memory >>>>>>>>>> pressure), a new class and method gets created for same MT(even >>>>>>>>>> though old method holder class and adapter method are live). >>>>>>>>>> Fix Summary: Kept a strong reference to lform instance in adapter >>>>>>>>>> method holder class of MT. 
>>>>>>>>> >>>>>>>>> Wow! You found the cause for his long-standing issue? Nice. >>>>>>>>> + if (method == NULL) { return; } >>>>>>>>> + nmethod* nm = method->code(); >>>>>>>>> + if (nm == NULL) { return; } >>>>>>>>> + if (nm->is_unloaded()) { return; } >>>>>>>>> Please put the return and } on separate lines. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jamsheed >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> > From jamsheed.c.m at oracle.com Fri Apr 1 10:05:49 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 1 Apr 2016 15:35:49 +0530 Subject: RFR: 8067247: Crash: assert(method_holder->data() == 0 ...) failed: a) MT-unsafe modification of inline cache In-Reply-To: <56FE4405.3080807@oracle.com> References: <56F456B7.6010104@oracle.com> <3521DE25-3A44-4B50-92DD-AEF858416E4C@oracle.com> <56F56D5B.9060001@oracle.com> <56F64643.1090508@oracle.com> <56F8CCEA.2060804@oracle.com> <56F91212.9020002@oracle.com> <56F91309.6010006@oracle.com> <56F91623.6060103@oracle.com> <56F9521E.5020808@oracle.com> <56FE390C.20906@oracle.com> <56FE4405.3080807@oracle.com> Message-ID: <56FE47FD.2010905@oracle.com> Sure. Thank you Vladimir Ivanov! Best Regards, Jamsheed On 4/1/2016 3:18 PM, Vladimir Ivanov wrote: > Looks good! > > Small detail: the following comment in the test is misleading: > > 71 test(); // new LF creation should fail. > > LF shouldn't be unloaded, so no new LF is normally not instantiated. > > Something like the following: > // Trigger call site re-resolution. Invoker LambdaForm should stay > the same. > test(); > > No need to send new webrev. > > Best regards, > Vladimir Ivanov > > On 4/1/16 12:02 PM, Jamsheed C m wrote: >> Hi Vladimir Ivanov, >> >> I used overloaded clearInlineCaches wb api. 
>> >> revised webrevs: >> hs: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.02/ >> root: http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.01/ >> >> Best Regards, >> Jamsheed >> >> On 3/28/2016 9:17 PM, Vladimir Ivanov wrote: >>>>> in addition it clears this. >>>>> void static_stub_Relocation::clear_inline_cache() { >>>>> // Call stub is only used when calling the interpreted code. >>>>> // It does not really need to be cleared, except that we want to >>>>> clean out the methodoop. >>>>> CompiledStaticCall::set_stub_to_clean(this); >>>> >>>> i want assert to catch this issue. if static stubs are cleared, assert >>>> wouldn't fail. >>> I see. Then I suggest to rename the method to >>> WhiteBox.cleanupInlineCaches() and iterate over the whole code cache >>> (don't specify Method*). >>> >>> void CodeCache::cleanup_inline_caches() { >>> assert_locked_or_safepoint(CodeCache_lock); >>> NMethodIterator iter; >>> while(iter.next_alive()) { >>> iter.method()->cleanup_inline_caches(true); >>> } >>> } >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> >>>> -Jamsheed >>>>> } >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>>>> >>>>>> WB_ENTRY(void, WB_ClearInlineCaches(JNIEnv* env, jobject wb)) >>>>>> VM_ClearICs clear_ics; >>>>>> VMThread::execute(&clear_ics); >>>>>> WB_END >>>>>> >>>>>> class VM_ClearICs: public VM_Operation { >>>>>> ... >>>>>> void doit() { CodeCache::clear_inline_caches(); } >>>>>> ... 
>>>>>> }; >>>>>> >>>>>> void CodeCache::clear_inline_caches() { >>>>>> assert_locked_or_safepoint(CodeCache_lock); >>>>>> NMethodIterator iter; >>>>>> while(iter.next_alive()) { >>>>>> iter.method()->clear_inline_caches(); >>>>>> } >>>>>> } >>>>>> >>>>>> void nmethod::clear_inline_caches() { >>>>>> assert(SafepointSynchronize::is_at_safepoint(), "cleaning of IC's >>>>>> only allowed at safepoint"); >>>>>> if (is_zombie()) { >>>>>> return; >>>>>> } >>>>>> >>>>>> RelocIterator iter(this); >>>>>> while (iter.next()) { >>>>>> iter.reloc()->clear_inline_cache(); >>>>>> } >>>>>> } >>>>>> >>>>>> void static_call_Relocation::clear_inline_cache() { >>>>>> // Safe call site info >>>>>> CompiledStaticCall* handler = compiledStaticCall_at(this); >>>>>> handler->set_to_clean(); >>>>>> } >>>>>> >>>>>> void opt_virtual_call_Relocation::clear_inline_cache() { >>>>>> // No stubs for ICs >>>>>> // Clean IC >>>>>> ResourceMark rm; >>>>>> CompiledIC* icache = CompiledIC_at(this); >>>>>> icache->set_to_clean(); >>>>>> } >>>>>> >>>>>> void virtual_call_Relocation::clear_inline_cache() { >>>>>> // No stubs for ICs >>>>>> // Clean IC >>>>>> ResourceMark rm; >>>>>> CompiledIC* icache = CompiledIC_at(this); >>>>>> icache->set_to_clean(); >>>>>> } >>>>>> >>>>>> Best regards, >>>>>> Vladimir Ivanov >>>>>> >>>>>>> >>>>>>> Best Regards, >>>>>>> Jamsheed >>>>>>> >>>>>>> On 3/26/2016 1:50 PM, Dean Long wrote: >>>>>>>> Instead of changing cleanup_inline_caches() to take a new flag, >>>>>>>> can >>>>>>>> you use the existing >>>>>>>> clear_inline_caches()? >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> On 3/25/2016 9:54 AM, Jamsheed C m wrote: >>>>>>>>> Thank you Chris. >>>>>>>>> I have updated the code. 
>>>>>>>>> >>>>>>>>> + if (method == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + nmethod* nm = method->code(); >>>>>>>>> + if (nm == NULL || nm->is_unloaded()) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + nm->cleanup_inline_caches(true); >>>>>>>>> Best Regards, >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> On 3/25/2016 6:58 AM, Christian Thalinger wrote: >>>>>>>>>> >>>>>>>>>>> On Mar 24, 2016, at 11:05 AM, Jamsheed C m >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> Request for review, >>>>>>>>>>> >>>>>>>>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8067247 >>>>>>>>>>> >>>>>>>>>>> webrevs: >>>>>>>>>>> fix: >>>>>>>>>>> jdk part: >>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.jdk.00/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> newly added test case >>>>>>>>>>> hotspot part: >>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs.00/ >>>>>>>>>>> >>>>>>>>>>> under hs-comp/test >>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8067247/webrev.hs_comp.00/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Unit Test: test/compiler/jsr292/misc/gc/MHInvokeTest.java >>>>>>>>>>> Testing: JPRT with new test case, with fix, without fix >>>>>>>>>>> >>>>>>>>>>> Problem Summary: MH.invoke linksite take assistance of java >>>>>>>>>>> code >>>>>>>>>>> to get an adapter method. Here a new method holder class and a >>>>>>>>>>> adapter method are created for a MT and lform instance is >>>>>>>>>>> cached. >>>>>>>>>>> Normally this cached lform get returned for a linksite >>>>>>>>>>> request of >>>>>>>>>>> same MT. When these cached lform get collected(due to memory >>>>>>>>>>> pressure), a new class and method gets created for same >>>>>>>>>>> MT(even >>>>>>>>>>> though old method holder class and adapter method are live). >>>>>>>>>>> Fix Summary: Kept a strong reference to lform instance in >>>>>>>>>>> adapter >>>>>>>>>>> method holder class of MT. >>>>>>>>>> >>>>>>>>>> Wow! You found the cause for his long-standing issue? Nice. 
>>>>>>>>>> + if (method == NULL) { return; } >>>>>>>>>> + nmethod* nm = method->code(); >>>>>>>>>> + if (nm == NULL) { return; } >>>>>>>>>> + if (nm->is_unloaded()) { return; } >>>>>>>>>> Please put the return and } on separate lines. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Jamsheed >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >> From aleksey.shipilev at oracle.com Fri Apr 1 11:35:28 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 1 Apr 2016 14:35:28 +0300 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign Message-ID: <56FE5D00.7000209@oracle.com> Hi, compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify String Concat strategies, because some of them are loading new methods and use them during String concat linkage and execution. Notably, this will happen inside of the asserts. We need to prime the asserts before using them in-between counter polls. Bug: https://bugs.openjdk.java.net/browse/JDK-8153265 Webrev: http://cr.openjdk.java.net/~shade/8153265/webrev.00/ Testing: offending test in oob/-Xcomp modes Thanks, -Aleksey From dmitrij.pochepko at oracle.com Fri Apr 1 12:27:10 2016 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Fri, 1 Apr 2016 15:27:10 +0300 Subject: RFR(S): 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays In-Reply-To: <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com> References: <56FC242C.6030108@oracle.com> <3424C25B-7FE2-4580-AD77-3E8B99E753AE@oracle.com> Message-ID: <56FE691E.2050208@oracle.com> Thank you! > Hi Dmitrij, > > the fix looks good to me > > Thanks, >
Igor >> On Mar 30, 2016, at 10:08 PM, Dmitrij Pochepko wrote: >> >> Hi, >> >> please review small fix for 8151828: Jittester: array creation node handled inproperly in source code visitor for non-int numerical arrays >> >> A problem was in Arrays.fill method usage with mismatched argument types for primitive types arrays, so, generated tests compilation failed. >> >> This fix removes respective Arrays.fill usage generation for primitive types. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8151828 >> webrev: http://cr.openjdk.java.net/~dpochepk/8151828/webrev.01/ >> >> I've tested fix locally. >> >> Thanks, >> Dmitrij >> >> From zoltan.majo at oracle.com Fri Apr 1 12:32:01 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 1 Apr 2016 14:32:01 +0200 Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop optimizations to 'develop' In-Reply-To: <56FD4906.4040909@oracle.com> References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com> <56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com> Message-ID: <56FE6A41.6030703@oracle.com> Hi Vladimir, thank you for the feedback! On 03/31/2016 05:57 PM, Vladimir Kozlov wrote: > It is nice to have not product flags which is easy to remove :) Yes, indeed. :-) > > Clean up looks good. Thank you. > > Can you leave test but remove "-XX:+UnlockDiagnosticVMOptions > -XX:-LoopLimitCheck" only? It has interesting code shape. Add comment > that it was ran with "-XX:+UnlockDiagnosticVMOptions > -XX:-LoopLimitCheck" to trigger problem. Yes, of course. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/ Thank you! Best regards, Zoltan > > Thanks, > Vladimir > > On 3/31/16 1:15 AM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for your feedback! >> >> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote: >>> These flags were added when I fixed long standing C2 problem with >>> counted loops: 5091921.
>>> They were added to have ability to revert back to original code if >>> new code cause a problem. >>> Looks like the old code which executed with these flags switched off >>> become rotten. >>> >>> Zoltan, did you find what cause the crash? Looks like product VM was >>> used in the bug report. What result gives >>> fastdebug VM? >> >> I've tried starting different VM versions with the flag(s) off. The >> most frequent error I get is >> >> # Internal Error >> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), >> pid=32727, tid=32746 >> # assert(false) failed: Bad graph detected in build_loop_late >> >> So it seems that the code executed with the flags off has indeed >> become rotten. >> >>> Converting flags to develop will not prevent problems happening with >>> fastdebug VM where these flags could be switched >>> off even when they are develop. >>> >>> If the problem with original code (flags are off) is something >>> fundamental we may simple remove old code and remove >>> these flags and have only new code. 5 years already passed since >>> 5091921 was fixed. >> >> Yes, I agree. I think it's reasonable to remove the old code. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/ >> >> The changes pass JPRT. >> >> I've changed the title of the bug to "Cleanup: Remove some unused >> flags/code in loop optimizations" to better reflect >> what the change is doing. I have kept the original title in the RFR. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>> On 3/23/16 6:26 AM, Zoltán Majó wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8072422. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8072422 >>>> >>>> Problem: Some flags controlling loop optimizations are currently >>>> 'diagnostic'.
Even though these flags are useful >>>> mostly for compiler-related development, their value can be changed >>>> not only in >>>> fastdebug, but also also in release builds, >>>> >>>> Solution: Change the flags to 'develop'. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/ >>>> >>>> Testing: >>>> - locally built/started VM; >>>> - locally executed >>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >> From martin.doerr at sap.com Fri Apr 1 12:37:30 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 1 Apr 2016 12:37:30 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Message-ID: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin
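The publication pattern described above (writers serialized by a lock, readers traversing the cache without one) can be modeled with standard C++11 atomics. This is a minimal sketch under stated assumptions: `CacheEntry` and `ExceptionCacheSketch` are hypothetical names, not HotSpot classes, and HotSpot uses its own `OrderAccess` primitives rather than `std::atomic`. The writer fully initializes an entry before publishing it with a release store. The reader uses an acquire load, which is the conservative choice. Relying only on the address dependency, as the mail argues is sufficient on the supported platforms, corresponds to C++11's `memory_order_consume`.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical cache entry: maps the pc of a throwing instruction to a
// handler address. This is not HotSpot's actual ExceptionCache layout.
struct CacheEntry {
  std::uintptr_t pc;
  std::uintptr_t handler;
  CacheEntry* next;
};

class ExceptionCacheSketch {
 public:
  // Writer: serialized by a mutex, so only lock-free readers race with it.
  // The caller fills in pc/handler first; the release store then publishes
  // the fully initialized entry.
  void add(CacheEntry* e) {
    std::lock_guard<std::mutex> guard(lock_);
    e->next = head_.load(std::memory_order_relaxed);  // ordered among writers by the lock
    head_.store(e, std::memory_order_release);
  }

  // Reader: no lock. The acquire load pairs with the writer's release store,
  // so the fields of every entry reachable from head_ are visible.
  std::uintptr_t lookup(std::uintptr_t pc) const {
    for (const CacheEntry* e = head_.load(std::memory_order_acquire);
         e != nullptr; e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return 0;  // no cached handler
  }

  // Clearing also goes through a release store. Entries are owned by the
  // caller; reclaiming them while readers may still traverse is out of scope.
  void clear() {
    std::lock_guard<std::mutex> guard(lock_);
    head_.store(nullptr, std::memory_order_release);
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};
```

New entries are prepended, so a reader that loaded an older head simply misses the newest entries. It never observes a partially initialized one.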
From nils.eliasson at oracle.com Fri Apr 1 13:55:01 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 1 Apr 2016 15:55:01 +0200 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method Message-ID: <56FE7DB5.404@oracle.com> Hi all, Please review this fix. Summary: There is a mismatch in the CompilerWhiteBox testcases between the callable and the executable constructors. SimpleTestCase$Helper implements all constructors and methods that are tested. However since Helper is an inner class there will be an extra (javac created) constructor that has the parent class as an appended argument. The callable will invoke this constructor, but the executable will reference the normal constructor. Solution: Stop having the Helper as an inner class. Rename it to SimpleTestCaseHelper for some uniqueness in compiler commands and directives. Testing: Run all hotspot/compiler/whitebox tests on all platforms, and all hotspot/compiler tests on one platform. Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ Best regards, Nils Eliasson From aleksey.shipilev at oracle.com Fri Apr 1 14:37:54 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 1 Apr 2016 17:37:54 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <56F5676D.7020401@oracle.com> References: <56F5676D.7020401@oracle.com> Message-ID: <56FE87C2.50002@oracle.com> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: > I would like to solicit comments for C1 support for new > Unsafe.compareAndExchange intrinsics (we have support for them in C2). > The rest of new Unsafe methods that are not intrinsified by C1 are > handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be > emulated with existing APIs.
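The difference between compare-and-set and compare-and-exchange can be shown with C++11 `std::atomic`, used here only as a stand-in for the Java `Unsafe` methods. The helper names `compare_and_set` and `compare_and_exchange` are hypothetical. A compare-and-set reports only success or failure. A compare-and-exchange instead returns the witness value, meaning the value actually found in memory by the same atomic operation. One plausible reading of the remark above is that recovering the witness by following a failed compare-and-set with a separate load is racy, because another thread may store in between.

```cpp
#include <atomic>

// compareAndSet-style: the caller learns only whether the swap happened.
template <typename T>
bool compare_and_set(std::atomic<T>& a, T expected, T desired) {
  return a.compare_exchange_strong(expected, desired);
}

// compareAndExchange-style: returns the witness, i.e. the value observed in
// memory by this same atomic operation. The swap succeeded iff the returned
// witness equals `expected`.
template <typename T>
T compare_and_exchange(std::atomic<T>& a, T expected, T desired) {
  T witness = expected;
  // On failure, compare_exchange_strong overwrites `witness` with the current
  // value. On success it leaves it equal to `expected`, which was the old value.
  a.compare_exchange_strong(witness, desired);
  return witness;
}
```

For example, with `x == 5`, `compare_and_exchange(x, 5, 7)` returns 5 and stores 7. A second call, `compare_and_exchange(x, 5, 9)`, returns the witness 7 and leaves `x` unchanged.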
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8152753 > > Webrev: > http://cr.openjdk.java.net/~shade/8152753/webrev.00/ Update: http://cr.openjdk.java.net/~shade/8152753/webrev.01/ Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some other cleanups. Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT hs-comp testset (some unrelated timeouts on SPARC). Thanks, -Aleksey From christian.thalinger at oracle.com Fri Apr 1 16:33:35 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 1 Apr 2016 06:33:35 -1000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: > On Apr 1, 2016, at 2:37 AM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. > > The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. > Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice.
> > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Does it make sense to keep: void set_exception_cache(ExceptionCache *ec) { _exception_cache = ec; } or would it be safer to always do the store-release even when clearing the cache? > > Please review. I will also need a sponsor. > > Best regards, > Martin From aph at redhat.com Fri Apr 1 16:42:38 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 1 Apr 2016 17:42:38 +0100 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: <56FEA4FE.2010807@redhat.com> On 04/01/2016 01:37 PM, Doerr, Martin wrote: > Therefore, the nmethod's field _exception_cache needs to be volatile > and adding new entries must be done by releasing stores. (Loading > seems to be fine without acquire because there's an address > dependency from the load of the cache to the usage of its contents > which is sufficient to ensure ordering on all openjdk platforms.) I think that's very risky. We can't be really sure what an optimizer might do in this area, as discussed at (very) considerable length in concurrency forums. memory_order_consume does this correctly in C++11 but we're not yet using C++11. I'd use acquire and leave a note that in future this can be replaced by memory_order_consume. Andrew. From igor.veresov at oracle.com Fri Apr 1 18:28:35 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 1 Apr 2016 11:28:35 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime Message-ID: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> When invoking private interface methods with invokeinterface we throw ICCE.
The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ Thanks, igor From tom.rodriguez at oracle.com Fri Apr 1 19:47:25 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 1 Apr 2016 12:47:25 -0700 Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should return dependencies_failed Message-ID: http://cr.openjdk.java.net/~never/8153315/webrev This fixes a minor issue which showed up while debugging Java code. evol_method dependences can change at any time so it's just a normal dependence failure not invalid dependencies. Graal considers it an error to build invalid dependencies so it complained. Tested under the Eclipse debugger. tom From igor.veresov at oracle.com Fri Apr 1 20:18:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 1 Apr 2016 13:18:48 -0700 Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should return dependencies_failed In-Reply-To: References: Message-ID: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com> Looks good. igor > On Apr 1, 2016, at 12:47 PM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8153315/webrev > > This fixes a minor issue which showed up while debugging Java code. evol_method dependences can change at any time so it's just a normal dependence failure not invalid dependencies. Graal considers it an error to build invalid dependencies so it complained. Tested under the Eclipse debugger.
> > tom From michael.c.berg at intel.com Fri Apr 1 21:51:14 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 1 Apr 2016 21:51:14 +0000 Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler Message-ID: Hi All, I would like to contribute some clean up on the x86 assembler applied to vex encoding to address the usage of the nds assembler parameter. For all instructions which use nds source xmm registers, the validity check has been removed. It was originally placed there here: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269 And propagated. Now nds register usage is fully compliant with each isa description. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001 webrev: http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/ Thanks, Michael From vladimir.kozlov at oracle.com Sat Apr 2 03:18:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 1 Apr 2016 20:18:28 -0700 Subject: CR for RFR 8151573 In-Reply-To: References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> Message-ID: <56FF3A04.5090601@oracle.com> I start preintegration testing. Thanks, Vladimir On 3/31/16 8:36 PM, Berg, Michael C wrote: > Vladimir, I think I have addressed every concern in the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ > > I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. > The code is fully retested with no issues.
> > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 9:56 PM > To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: CR for RFR 8151573 > > On 3/30/16 4:57 PM, Berg, Michael C wrote: >> See below for context. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 3:51 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> Michael, >> >> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes. >> >> multi_version_post_loops() can use is_canonical_main_loop_entry() from >> 8148754 but you need to modify it to move >> is_Main() assert to other call sites. >> >> ok, but should not the name then be is_canonical_loop_entry()? Since it will be for non main loops as well. Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The grape shape test changes between main and post loops. Perhaps better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. > > I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? > >> >> I did not get rce'd post loop checks in loopnode.cpp. >> >> First I will have to explain what I am doing with do_range_check(). That code resolves the question of: Do all the range checks have load ranges, only these types of loops are canonical for multiversioning. 
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check() is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass. >> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute an index that would fail a range check, leaving that to the final loop, or leaving the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot apply the multiversioning transform to the loop we added, we eliminate it. > > I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled. > > You should add a comment to do_range_check() about what you said (looking only for range checks). Also, don't use the old_new > struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop. > > Anyway, I asked about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right? > There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). > Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
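A minimal sketch of the marking Vladimir suggests, assuming loop kinds are kept as bits in a flags word on the loop head. LoopHead, LoopFlag, the k* constants and is_rce_post_loop() are hypothetical names for illustration, not HotSpot's actual CountedLoopNode flags:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop-kind bits; not HotSpot's real flag names or layout.
enum LoopFlag : uint32_t {
  kMainLoop       = 1u << 0,
  kPostLoop       = 1u << 1,
  kAtomicPostLoop = 1u << 2,
  kRcePostLoop    = 1u << 3,
};

struct LoopHead {
  uint32_t flags = 0;

  void mark(LoopFlag f) {
    flags |= f;
    // Invariant assumed by this sketch: a loop is never both main and post.
    assert(!(is(kMainLoop) && is(kPostLoop)));
  }

  bool is(LoopFlag f) const { return (flags & f) != 0; }
};

// Recognizing the RCE'd post loop becomes a constant-time flag test
// instead of a scan of the loop body for range-check If nodes.
inline bool is_rce_post_loop(const LoopHead& head) {
  return head.is(kPostLoop) && head.is(kRcePostLoop);
}
```

If the bit were set where the RCE'd post loop is created, the search in loopnode.cpp could test it directly, and a separate bit would distinguish the atomic post loop the same way.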
> >> >> Swap the next checks, since has_range_checks() may be expensive when it scans the loop body: >> + // only process RCE'd main loops >> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >> >> Ok, makes sense. > > Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request. > But please rename the has_range_checks(cl) method to avoid confusion. > > Thanks, > Vladimir > >> >> Also, if there are no checks in the loop, you will scan the loop's body again since the no_checks state is not recorded. >> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call. >> >> I perceive the real problem to be that we should not scan more than once after we check. I will move towards that solution. >> >> >> Why do you need local copies? >> >> - visited.Clear(); >> - clones.clear(); >> + Arena *a = Thread::current()->resource_area(); >> + VectorSet visited(a); >> + Node_Stack clones(a, main_head->back_control()->outcnt()); >> >> I will look into this, and see if it can be cleaned up. >> >> >> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set. >> >> Ok, I will look into a version without PostLoopInfo.
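A sketch of the "scan once" approach under simplifying assumptions: the loop body is modeled as a flat list of node kinds, and the result of the first scan, including the no-range-checks case, is cached. LoopBody, NodeKind and contains_range_checks() are hypothetical names; the distinct method name also sidesteps the has_range_checks() / has_range_checks(cl) clash Vladimir points out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical node kinds; a real IR identifies range checks differently.
enum class NodeKind { Load, Store, Arith, RangeCheckIf, OtherIf };

class LoopBody {
 public:
  explicit LoopBody(std::vector<NodeKind> nodes) : nodes_(std::move(nodes)) {}

  // Scans the body only on the first call; later calls return the cached
  // answer, so a loop without range checks is not rescanned.
  bool contains_range_checks() {
    if (!scanned_) {
      for (NodeKind k : nodes_) {
        ++nodes_visited_;
        if (k == NodeKind::RangeCheckIf) {
          has_checks_ = true;
          break;
        }
      }
      scanned_ = true;
    }
    return has_checks_;
  }

  // Total number of nodes inspected across all queries.
  int nodes_visited() const { return nodes_visited_; }

 private:
  std::vector<NodeKind> nodes_;
  bool scanned_ = false;
  bool has_checks_ = false;
  int nodes_visited_ = 0;
};
```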
>> >> Thanks, >> Vladimir >> >> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>> Here is an update after full testing, the webrev is: >>> >>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>> >>> Please review and comment, >>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Berg, Michael C >>> Sent: Wednesday, March 16, 2016 10:30 AM >>> To: Vladimir Kozlov ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: RE: CR for RFR 8151573 >>> >>> Putting a hold on the review; retesting everything on my end. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 16, 2016 8:42 AM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>> Vladimir: >>>> >>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>> >>> I understand that we can get some benefits. But in the general case they will not be visible. >>> >>>> >>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>> >>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop. >>> >>> After thinking about this I would suggest you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform-specific only, smaller, and easier to understand and test.
Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops. >>> >>> Regards, >>> Vladimir >>> >>>> >>>> Regards, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> As we all know, we can always construct microbenchmarks which show a >>>> 30% >>>> - 50% difference, when in a real application we will never see a >>>> difference. I still don't see a real reason why we should spend time optimizing >>>> *POST* loops. We already have a vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>>> >>>> Why does "programmable SIMD" depend on it? What about the pre-loop? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>> Correction below... >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-compiler-dev >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>>>> Berg, Michael C >>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: RE: CR for RFR 8151573 >>>>> >>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>>> >>>>> for(int i = 0; i < process_len; i++) >>>>> { >>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>> } >>>>> >>>>> The above code makes 9 vector ops. >>>>> >>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>>> The value process_len is some fraction of the array length in my measurements.
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>>> >>>>> An example would be array_length = 512, process_len in the range 81..96: we create a VecZ loop which was super-unrolled 4 times with vector length 16 (an unroll of 64), we process 4 iterations for alignment, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81, so we always do at least 1 fixup iteration, and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all values in 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>>> >>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>> >>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> Hi Michael, >>>>> >>>>> The changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. 
Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
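The trip counts in Michael's 81..96 example can be reproduced with a small model. This is a simplified sketch, not C2's trip-count logic: it assumes the loops ahead of the final post loop contribute a fixed 6 trips ({4,1,1}), that 1 to 15 elements remain for the final post loop, and that one masked VecZ iteration covers up to 16 floats. total_trips() is a hypothetical helper.

```cpp
#include <cassert>

// Total loop trips = trips of the loops before the final post loop
// (prefix_trips) + trips of the final post loop for `remaining` elements.
// A scalar fixup loop needs one trip per element; a masked vector loop
// needs one trip per vec_len elements, rounded up.
int total_trips(int prefix_trips, int remaining, int vec_len, bool masked_tail) {
  assert(prefix_trips >= 0 && remaining >= 0 && vec_len > 0);
  int tail_trips = masked_tail ? (remaining + vec_len - 1) / vec_len : remaining;
  return prefix_trips + tail_trips;
}
```

With prefix_trips = 6 and vec_len = 16, a scalar tail gives 7 to 21 trips for 1 to 15 leftover elements, while a masked tail always gives 7, which matches the figures in the example.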
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute multi-versioning post loops for range >>>>>> check elimination. Previously, cfg optimizations after register >>>>>> allocation were where post loop optimizations were done for range >>>>>> checks. I have added code which produces the desired effect much >>>>>> earlier by introducing a safe transformation which will minimally >>>>>> allow a range-check-free version of the final post loop to execute >>>>>> up until the point it actually has to take a range check exception >>>>>> by re-ranging the limit of the rce'd loop, then exit the rce'd >>>>>> post loop and take the range check exception in the legacy loop's execution if required. >>>>>> If during optimization we discover that we know enough to remove >>>>>> the range check version of the post loop, mostly by exposing the >>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>> will eliminate the range check post loop altogether, much like cfg >>>>>> optimizations did, but much earlier. This gives optimizations >>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>> vectorize the rce'd post loops to a single iteration based on mask >>>>>> vectors which map to the residual iterations. Programmable SIMD >>>>>> will be a follow-on changeset utilizing this code to stage its >>>>>> work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>> this loop on successfully rce'd main loops, and if, for whatever reason, multiversioning fails, we eliminate the loop we added.
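A source-level model of the multiversioned post loop described above may help. It is a C++ sketch, not C2's ideal-graph transformation: post_loop() is a hypothetical function, and std::vector::at() stands in for a Java range check. The first loop runs without checks up to a limit clamped to every array length (the "min test of all ranges and the limit" Michael mentions in this thread); the second, legacy loop keeps the checks, so an out-of-range index still throws at the same iteration as in the original loop. The loop body is the one from Michael's microbenchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the number of iterations executed by the check-free version.
std::size_t post_loop(std::vector<float>& d, const std::vector<float>& a,
                      const std::vector<float>& b, const std::vector<float>& c,
                      std::size_t start, std::size_t limit) {
  // Re-ranged limit: no index below safe_limit can fail a range check.
  std::size_t safe_limit =
      std::min({limit, d.size(), a.size(), b.size(), c.size()});
  std::size_t i = start;
  std::size_t fast = 0;
  for (; i < safe_limit; ++i, ++fast) {  // range-check-free version
    d[i] = a[i] * b[i] + a[i] * c[i] + b[i] * c[i];
  }
  for (; i < limit; ++i) {  // legacy version: .at() models the range check
    d.at(i) = a.at(i) * b.at(i) + a.at(i) * c.at(i) + b.at(i) * c.at(i);
  }
  return fast;
}
```

Clamping the limit up front is what lets the first loop drop its checks; when safe_limit equals limit, the checked loop never runs, which corresponds to the case where the range-check version of the post loop can be eliminated.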
>>>>>> >>>>>> This code was tested as follows: >>>>>> >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>> >>>>>> >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Michael >>>>>> From michael.c.berg at intel.com Sat Apr 2 03:25:16 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sat, 2 Apr 2016 03:25:16 +0000 Subject: CR for RFR 8151573 In-Reply-To: <56FF3A04.5090601@oracle.com> References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com> Message-ID: I have to make a two-line change; I am testing it on my end. I will pass the webrev directly to you when my tests conclude. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 01, 2016 8:18 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8151573 I start preintegration testing. Thanks, Vladimir On 3/31/16 8:36 PM, Berg, Michael C wrote: > Vladimir, I think I have addressed every concern in the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ > > I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. > The code is fully retested with no issues. > > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 9:56 PM > To: Berg, Michael C ; > 'hotspot-compiler-dev at openjdk.java.net' > > Subject: Re: CR for RFR 8151573 > > On 3/30/16 4:57 PM, Berg, Michael C wrote: >> See below for context.
>> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 3:51 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> Michael, >> >> First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files and it is difficult to apply your changes. >> >> multi_version_post_loops() can use is_canonical_main_loop_entry() >> from >> 8148754 but you need to modify it to move >> is_Main() assert to other call sites. >> >> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps it is better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. > > I am fine with the name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? > >> >> I did not get the rce'd post loop checks in loopnode.cpp. >> >> First I will have to explain what I am doing with do_range_check(). That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning. >> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check() is key to that discovery. Then insert_scalar_rced_post_loop(...)
creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass. >> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute an index that would fail a range check, leaving that to the final loop, or leaving the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot apply the multiversioning transform to the loop we added, we eliminate it. > > I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled. > > You should add a comment to do_range_check() about what you said (looking only for range checks). Also, don't use the old_new > struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop. > > Anyway, I asked about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right? > There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). > Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop? > >> >> Swap the next checks, since has_range_checks() may be expensive when it scans the loop body: >> + // only process RCE'd main loops >> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >> >> Ok, makes sense. > > Sorry, I made a mistake because you use the same name for 2 different methods. 
One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request.
> But please rename the has_range_checks(cl) method to avoid confusion. > > Thanks, > Vladimir > >> >> Also, if there are no checks in the loop, you will scan the loop's body again since the no_checks state is not recorded. >> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call. >> >> I perceive the real problem to be that we should not scan more than once after we check. I will move towards that solution. >> >> >> Why do you need local copies? >> >> - visited.Clear(); >> - clones.clear(); >> + Arena *a = Thread::current()->resource_area(); >> + VectorSet visited(a); >> + Node_Stack clones(a, main_head->back_control()->outcnt()); >> >> I will look into this, and see if it can be cleaned up. >> >> >> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set. >> >> Ok, I will look into a version without PostLoopInfo. >> >> Thanks, >> Vladimir >> >> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>> Here is an update after full testing, the webrev is: >>> >>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>> >>> Please review and comment, >>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Berg, Michael C >>> Sent: Wednesday, March 16, 2016 10:30 AM >>> To: Vladimir Kozlov ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: RE: CR for RFR 8151573 >>> >>> Putting a hold on the review; retesting everything on my end.
>>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 16, 2016 8:42 AM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>> Vladimir: >>>> >>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>> >>> I understand that we can get some benefits. But in the general case they will not be visible. >>> >>>> >>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>> >>> Yes, after you explained vector masking to me, I now understand why it could be used for the post loop. >>> >>> After thinking about this I would suggest you look at the arraycopy and generate_fill stubs in stubGenerator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be platform-specific only, smaller, and easier to understand and test. Also, arraycopy and 'fill' code are used very frequently by Java applications, so we may get more benefit than from optimizing general loops. >>> >>> Regards, >>> Vladimir >>> >>>> >>>> Regards, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> As we all know, we can always construct microbenchmarks which show a >>>> 30% >>>> - 50% difference, when in a real application we will never see a >>>> difference.
I still don't see a real reason why we should spend >>>> time optimizing >>>> *POST* loops. We already have a vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>>> >>>> Why does "programmable SIMD" depend on it? What about the pre-loop? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>> Correction below... >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-compiler-dev >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>> Of Berg, Michael C >>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: RE: CR for RFR 8151573 >>>>> >>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>>> >>>>> for(int i = 0; i < process_len; i++) >>>>> { >>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>> } >>>>> >>>>> The above code makes 9 vector ops. >>>>> >>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>>> The value process_len is some fraction of the array length in my measurements. The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>>> >>>>> An example would be array_length = 512, process_len in the range 81..96: we create a VecZ loop which was super-unrolled 4 times with vector length 16 (an unroll of 64), we process 4 iterations for alignment, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop.
We start that final loop at iteration 81, so we always do at least 1 fixup iteration, and as many as 15. If we left the fixup loop as a scalar loop, that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration, we now always have 7 iterations in our loops for all values in 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>>> >>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>> >>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> Hi Michael, >>>>> >>>>> The changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute multi-versioning post loops for range >>>>>> check elimination. Previously, cfg optimizations after register >>>>>> allocation were where post loop optimizations were done for range >>>>>> checks.
I have added code which produces the desired effect much >>>>>> earlier by introducing a safe transformation which will minimally >>>>>> allow a range-check-free version of the final post loop to >>>>>> execute up until the point it actually has to take a range check >>>>>> exception by re-ranging the limit of the rce'd loop, then exit >>>>>> the rce'd post loop and take the range check exception in the legacy loop's execution if required. >>>>>> If during optimization we discover that we know enough to remove >>>>>> the range check version of the post loop, mostly by exposing the >>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>> will eliminate the range check post loop altogether, much like cfg >>>>>> optimizations did, but much earlier. This gives optimizations >>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>> mask vectors which map to the residual iterations. Programmable >>>>>> SIMD will be a follow-on changeset utilizing this code to stage >>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>> this loop on successfully rce'd main loops, and if, for whatever reason, multiversioning fails, we eliminate the loop we added.
>>>>>> >>>>>> This code was tested as follows: >>>>>> >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>> >>>>>> >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Michael >>>>>> From michael.c.berg at intel.com Sat Apr 2 05:16:01 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sat, 2 Apr 2016 05:16:01 +0000 Subject: CR for RFR 8151573 In-Reply-To: References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com> Message-ID: That small revision is reflected in: https://bugs.openjdk.java.net/browse/JDK-8151573 and can be accessed at: http://cr.openjdk.java.net/~mcberg/8151573/webrev.04a/ Regards, Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Friday, April 01, 2016 8:25 PM To: Vladimir Kozlov ; 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: CR for RFR 8151573 I have to make a two-line change; I am testing it on my end. I will pass the webrev directly to you when my tests conclude. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 01, 2016 8:18 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8151573 I start preintegration testing. Thanks, Vladimir On 3/31/16 8:36 PM, Berg, Michael C wrote: > Vladimir, I think I have addressed every concern in the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ > > I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. > The code is fully retested with no issues.
> > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 9:56 PM > To: Berg, Michael C ; > 'hotspot-compiler-dev at openjdk.java.net' > > Subject: Re: CR for RFR 8151573 > > On 3/30/16 4:57 PM, Berg, Michael C wrote: >> See below for context. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 3:51 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> Michael, >> >> First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files and it is difficult to apply your changes. >> >> multi_version_post_loops() can use is_canonical_main_loop_entry() >> from >> 8148754 but you need to modify it to move >> is_Main() assert to other call sites. >> >> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps it is better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. > > I am fine with the name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? > >> >> I did not get the rce'd post loop checks in loopnode.cpp. >> >> First I will have to explain what I am doing with do_range_check(). That code answers the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check() is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short-circuits us from adding any non-canonical multiversions, leaving the code in its original form when we do not pass. >> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute an index that would fail a range check, leaving that to the final loop, or leaving the optimizer to realize that no such case exists, in which case the clean loop becomes the one and only copy. If we cannot apply the multiversioning transform to the loop we added, we eliminate it. > > I understand that you want to make a clean copy of the main loop when it could be vectorized but before it is unrolled. > > You should add a comment to do_range_check() about what you said (looking only for range checks). Also, don't use the old_new > struct; use a simple counter and return it as the result (or return bool from do_range_check()) to indicate that only range checks were in the loop. > > Anyway, I asked about the checks in loopnode.cpp. First, there is some expectation about the sequence of post loops. But from what I understand you are looking for post loops which are not created by insert_scalar_rced_post_loop(). Right? > There is no comment explaining what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). > Why not mark the rced loop (RCEPostLoop?) to simplify the search instead of executing has_range_checks(cl)? What about AtomicPostLoop?
> >> >> Swap the next checks, since has_range_checks() may be expensive when it scans the loop body: >> + // only process RCE'd main loops >> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >> >> Ok, makes sense. > > Sorry, I made a mistake because you use the same name for 2 different methods. One is the query cl->has_range_checks(), which checks a flag; the other is has_range_checks(cl), which searches the loop for If nodes. In that case you can ignore my check swap request. > But please rename the has_range_checks(cl) method to avoid confusion. > > Thanks, > Vladimir > >> >> Also, if there are no checks in the loop, you will scan the loop's body again since the no_checks state is not recorded. >> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call. >> >> I perceive the real problem to be that we should not scan more than once after we check. I will move towards that solution. >> >> >> Why do you need local copies? >> >> - visited.Clear(); >> - clones.clear(); >> + Arena *a = Thread::current()->resource_area(); >> + VectorSet visited(a); >> + Node_Stack clones(a, main_head->back_control()->outcnt()); >> >> I will look into this, and see if it can be cleaned up. >> >> >> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node, and &main_exit could be passed to it to set. >> >> Ok, I will look into a version without PostLoopInfo.
>> >> Thanks, >> Vladimir >> >> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>> Here is an update after full testing, the webrev is: >>> >>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>> >>> Please review and comment, >>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Berg, Michael C >>> Sent: Wednesday, March 16, 2016 10:30 AM >>> To: Vladimir Kozlov ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: RE: CR for RFR 8151573 >>> >>> Putting a hold on the review, retesting everything on my end. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 16, 2016 8:42 AM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>> Vladimir: >>>> >>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>> >>> I understand that we can get some benefits. But in the general case they will not be visible. >>> >>>> >>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>> >>> Yes, after you explained vector masking to me I now understand why it could be used for the post loop. >>> >>> After thinking about this I would suggest that you look at the arraycopy and generate_fill stubs in stub_Generator_x86*.cpp instead (maybe only 64-bit). They also have post loops, but the changes would be only platform specific, smaller, and easier to understand and test.
Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than optimizing general loops. >>> >>> Regards, >>> Vladimir >>> >>>> >>>> Regards, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> As we all know we can always construct microbenchmarks which show >>>> 30% >>>> - 50% difference, while in real applications we will never see a >>>> difference. I still don't see a real reason why we should spend >>>> time and optimize >>>> *POST* loops. We already have a vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>>> >>>> Why does "programmable SIMD" depend on it? What about pre-loop? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>> Correction below... >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-compiler-dev >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>> Of Berg, Michael C >>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: RE: CR for RFR 8151573 >>>>> >>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>>> >>>>> for(int i = 0; i < process_len; i++) >>>>> { >>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>> } >>>>> >>>>> The above code makes 9 vector ops. >>>>> >>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>>> The value process_len is some fraction of the array length in my measurements.
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in the range 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>>> >>>>> An example would be array_length = 512 and process_len in the range 81..96: we create a VecZ loop which was superunrolled 4 times with vector length 16, or an unroll of 64, we process 4 iterations for alignment, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81 so we always do at least 1 iteration of fixup, and as many as 15. If we left the fixup loop as a scalar loop that would mean 1 to 15 iterations plus our initial loops, which have {4,1,1} iterations as a group, or 6, to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>>> >>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>> >>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> Hi Michael, >>>>> >>>>> Changes are significant so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
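The iteration totals in Michael's 81..96 example can be reproduced with a short calculation. The sketch takes his figures as given and does not derive them independently: 6 leading iterations ({4,1,1}) to reach index 80, then a residual of 1 to 15 float elements for a 16-lane VecZ vector. A scalar fixup loop needs one iteration per residual element. A masked vector fixup needs ceil(residual / 16) iterations, which is 1 across that whole range.

```cpp
#include <cassert>

// Figures taken from the example in the thread (not derived here):
// pre-loop, main loop and vectorized post loop run {4, 1, 1} = 6 iterations
// to reach index 80, and the residual is 1..15 float elements.
constexpr int kLeadingIterations = 6;
constexpr int kVectorLength = 16;  // float lanes in a 512-bit VecZ register

// Final post loop left scalar: one iteration per residual element.
int total_iterations_scalar_fixup(int remaining) {
  assert(remaining >= 1 && remaining < kVectorLength);
  return kLeadingIterations + remaining;
}

// Final post loop run as masked vector iterations:
// ceil(remaining / kVectorLength), which is 1 for every remaining < 16.
int total_iterations_masked_fixup(int remaining) {
  assert(remaining >= 1 && remaining < kVectorLength);
  return kLeadingIterations + (remaining + kVectorLength - 1) / kVectorLength;
}
```

Both functions reproduce the ranges quoted above: 7 to 21 iterations for the scalar fixup, and a constant 7 for the masked one.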
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute multi-versioning post loops for range >>>>>> check elimination. Beforehand cfg optimizations after register >>>>>> allocation were where post loop optimizations were done for range >>>>>> checks. I have added code which produces the desired effect much >>>>>> earlier by introducing a safe transformation which will minimally >>>>>> allow a range check free version of the final post loop to >>>>>> execute up until the point it actually has to take a range check >>>>>> exception by re-ranging the limit of the rce'd loop, then exit >>>>>> the rce'd post loop and take the range check exception in the legacy loops execution if required. >>>>>> If during optimization we discover that we know enough to remove >>>>>> the range check version of the post loop, mostly by exposing the >>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>> will eliminate the range check post loop altogether much like cfg >>>>>> optimizations did, but much earlier. This gives optimizations >>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>> mask vectors which map to the residual iterations. Programmable >>>>>> SIMD will be a follow on change set utilizing this code to stage >>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added. 
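The multiversioning described above can be modeled at the source level. The sketch below is a scalar C++ model of the idea, not C2 IR: the limit of the range-check-free copy is re-ranged to the minimum of the loop limit and every array length, and the original checked loop handles whatever is left, failing at the same index the unoptimized loop would. The function and helper names are hypothetical. In C2 the array lengths would come from the loaded range values (LoadRange) rather than `size()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for Java array accesses that carry a range check.
float checked_load(const std::vector<float>& v, std::size_t i) {
  if (i >= v.size()) throw std::out_of_range("range check failed");
  return v[i];
}

void checked_store(std::vector<float>& v, std::size_t i, float value) {
  if (i >= v.size()) throw std::out_of_range("range check failed");
  v[i] = value;
}

// Scalar model of a multiversioned final post loop for d[i] = a[i] * b[i]
// over [start, limit). Returns the number of iterations executed by the
// range-check-free copy.
std::size_t post_loop_multiversioned(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::vector<float>& d,
                                     std::size_t start, std::size_t limit) {
  // Re-ranged limit: min of the loop limit and every array length, so no
  // index below safe_limit can fail a range check.
  const std::size_t safe_limit =
      std::min({limit, a.size(), b.size(), d.size()});

  std::size_t i = start;
  // Version 1: range-check-free copy, the loop a later pass could turn into
  // a single masked vector iteration.
  for (; i < safe_limit; ++i) {
    d[i] = a[i] * b[i];
  }
  const std::size_t unchecked_iterations = i - start;

  // Version 2: the original range-checked loop. It runs only if the first
  // remaining index is already out of bounds for some array, so it throws on
  // its first iteration, at the index where the unoptimized loop would fail.
  for (; i < limit; ++i) {
    checked_store(d, i, checked_load(a, i) * checked_load(b, i));
  }
  return unchecked_iterations;
}
```

If every index is in bounds, the checked copy never executes. This corresponds to the case where the compiler can prove it dead and remove it, as the proposal describes.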
>>>>>> >>>>>> This code was tested as follows: >>>>>> >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>> >>>>>> >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Michael >>>>>> From jamsheed.c.m at oracle.com Mon Apr 4 06:14:14 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Mon, 4 Apr 2016 11:44:14 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: <57020636.7010806@oracle.com> Hi Martin, "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's exception > cache. Readers of the cache may read stale data on weak memory platforms. > > The writers of the cache are synchronized by locks, but there may be > concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache without locking. > > Therefore, the nmethod's field _exception_cache needs to be volatile > and adding new entries must be done by releasing stores. (Loading > seems to be fine without acquire because there's an address dependency > from the load of the cache to the usage of its contents which is > sufficient to ensure ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to read the > volatile field _state only once.
It is certainly undesired to force > the compiler to load it from memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > Please review. I will also need a sponsor. > > Best regards, > > Martin From rahul.v.raghavan at oracle.com Mon Apr 4 08:09:08 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Mon, 4 Apr 2016 01:09:08 -0700 (PDT) Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: <56FD74F2.2080102@oracle.com> References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> Message-ID: <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Hi, Please review the revised fix for JDK-8149488. : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ Based on further checking, and thanks to clarifications from Michael, it was verified that the 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp' (and that the earlier mentioned change of extending the bitsInByte table size to 512 is not required). Points from Michael for the record - " > I believe Dean is right, I have debugged this and analyzed the usage model, > we never made use of the upper components > and register allocation has been right for VecZ for a good deal of time. > > All we need for a change is, > Regmask.cpp: > > uint RegMask::Size() const { > extern uint8_t bitsInByte[256]; > > A one line change. > > -Michael. > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > I tested SPECjvm2008 which has tons of EVEX code generated in it on SKX > where we make use of VecZ and the upper bank of registers." So in this revised webrev.02, BITS_IN_BYTE_ARRAY_SIZE is defined as 256 and used for the array size. Confirmed no issues with 'JPRT -testset hotspot' run.
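Dean's point that indices 256..511 would imply 9-bit bytes can be checked directly. A bits-set-per-byte table is always indexed with a value masked to 8 bits, so only 256 entries are reachable. Wider registers such as VecZ simply set more bits across more mask words. They never produce a larger index. The sketch below builds such a table and sums a 32-bit word byte by byte, similar in spirit to the `RegMask::Size()` lookup quoted above. `kBitsInByteSize` is a hypothetical counterpart of the proposed `BITS_IN_BYTE_ARRAY_SIZE`, and `bits_in_word` is an illustrative name only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counterpart of the proposed BITS_IN_BYTE_ARRAY_SIZE:
// one entry per possible 8-bit value.
constexpr int kBitsInByteSize = 256;

// Table where entry n holds the number of 1 bits in the byte value n.
struct BitsInByteTable {
  std::uint8_t counts[kBitsInByteSize];
  BitsInByteTable() : counts{} {
    for (int n = 1; n < kBitsInByteSize; ++n) {
      // bits(n) = lowest bit of n + bits(n >> 1); n >> 1 < n is filled already.
      counts[n] = static_cast<std::uint8_t>((n & 1) + counts[n >> 1]);
    }
  }
};

// Counts the set bits of a 32-bit mask word one byte at a time. Every index
// is masked with 0xFF, so it always lies in 0..255.
int bits_in_word(std::uint32_t word) {
  static const BitsInByteTable table;
  int sum = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    sum += table.counts[(word >> shift) & 0xFFu];
  }
  return sum;
}
```

For example, `bits_in_word(0x0F0F00FFu)` looks up the bytes 0xFF, 0x00, 0x0F and 0x0F and adds 8 + 0 + 4 + 4.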
Thanks, Rahul > -----Original Message----- > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > > Michael, isn't the correct size for this table 256? I missed how VecZ > relates to the table size. > > dl > > On 3/31/2016 9:58 AM, Berg, Michael C wrote: > > Up until now we have gotten along with the size constraint only. > > Let us have both the size and the table though for completeness. > > I think we can leave the name though. > > > > -Michael > > > > -----Original Message----- > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > > Sent: Thursday, March 31, 2016 9:18 AM > > To: Dean Long ; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > > > Hi Michael, > > > > With respect to below thread, request help with some questions. > > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size. > > Also comment got was for requirement to extend bitsInByte table to 512 size, for consistent mapping for VecZ register also, on > targets that support it. > > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > > Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? > > > > So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? 
> > (without extending current bitsInByte array contents) (Anyhow at present values above 0xFF is never indexed for bitsInByte in > RegMask::Size()) > > > > ----- src/share/vm/libadt/vectset.hpp > > +#define BITS_IN_BYTE_ARRAY_SIZE 256 > > + > > > > ----- src/share/vm/opto/regmask.cpp > > - extern uint8_t bitsInByte[512]; > > + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > > > > ----- src/share/vm/libadt/vectset.cpp > > -uint8_t bitsInByte[256] = { > > +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > > > > I can send revised webrev for above if all okay. Please tell me if I am missing something. > > > > > > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > > > > Thanks, > > Rahul > > > >> -----Original Message----- > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > >> > >>> -----Original Message----- > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > >>> > >>> When do we access elements 256 .. 511? Wouldn't that mean we have > >>> 9-bit bytes? > >> Got your point Dean, Thanks. > >> I too got some questions here now; will check and reply soon. > >> > >> -Rahul > >> > >>> dl > >>> > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > >>>> Hi, > >>>> > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > >>>> > >>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>> : > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > >>>> > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > >>>> Confirmed no issues with 'JPRT -testset hotspot' run. 
> >>>> > >>>> Thanks, > >>>> Rahul > >>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > >>> dev at openjdk.java.net > >>>>> Should we not extend: > >>>>> > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > >>>>> uint8_t bitsInByte[256] = { // ... > >>>>> > >>>>> to 512 > >>>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > >>>>> > >>>>> So how do we intend to map a VecZ register without 512 bits? > >>>>> > >>>>> -Michael > >>>>> > >>>>> -----Original Message----- > >>>>> From: hotspot-compiler-dev > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > >>>>> Of Vladimir Ivanov > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > >>>>> hotspot-compiler-dev at openjdk.java.net > >>>>> > >>>>> Rahul, > >>>>> > >>>>> Can we define a constant instead and use it in both places? > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > >>>>>> Hi, > >>>>>> > >>>>>> Please review the patch for JDK- 8149488. > >>>>>> > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > >>>>>> > >>>>>> Corrected the bitsInByte array size in declaration. 
> >>>>>> > >>>>>> Thanks, > >>>>>> Rahul > >>>>>> > From zoltan.majo at oracle.com Mon Apr 4 10:49:40 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Mon, 4 Apr 2016 12:49:40 +0200 Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop optimizations to 'develop' In-Reply-To: <56FE6A41.6030703@oracle.com> References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com> <56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com> <56FE6A41.6030703@oracle.com> Message-ID: <570246C4.7050504@oracle.com> Thank you, Vladimir and Chris, for the reviews! For the record: I'll push the latest webrev (webrev.03) today. Best regards, Zoltan On 04/01/2016 02:32 PM, Zolt?n Maj? wrote: > Hi Vladimir, > > > thank you for the feedback! > > On 03/31/2016 05:57 PM, Vladimir Kozlov wrote: >> It is nice to have not product flags which is easy to remove :) > > Yes, indeed. :-) > >> >> Clean up looks good. > > Thank you. > >> >> Can you leave test but remove "-XX:+UnlockDiagnosticVMOptions >> -XX:-LoopLimitCheck" only? It has interesting code shape. Add comment >> that it was ran with "-XX:+UnlockDiagnosticVMOptions >> -XX:-LoopLimitCheck" to trigger problem. > > Yes, of course. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/ > > Thank you! > > Best regards, > > > Zoltan > >> >> Thanks, >> Vladimir >> >> On 3/31/16 1:15 AM, Zolt?n Maj? wrote: >>> Hi Vladimir, >>> >>> >>> thank you for your feedback! >>> >>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote: >>>> These flags were added when I fixed long standing C2 problem with >>>> counted loops: 5091921. >>>> They were added to have ability to revert back to original code if >>>> new code cause a problem. >>>> Looks like the old code which executed with these flags switched >>>> off become rotten. >>>> >>>> Zoltan, did you find what cause the crash? Looks like product VM >>>> was used in the bug report. What result gives >>>> fastdebug VM? 
>>> >>> I've tried starting different VM versions with the flag(s) off. The >>> most frequent error I get is >>> >>> # Internal Error >>> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), >>> pid=32727, tid=32746 >>> # assert(false) failed: Bad graph detected in build_loop_late >>> >>> So it seems that the code executed with the flags off has indeed >>> become rotten. >>> >>>> Converting flags to develop will not prevent problems happening >>>> with fastdebug VM where these flags could be switched >>>> off even when they are develop. >>>> >>>> If the problem with original code (flags are off) is something >>>> fundamental we may simply remove old code and remove >>>> these flags and have only new code. 5 years already passed since >>>> 5091921 was fixed. >>> >>> Yes, I agree. I think it's reasonable to remove the old code. >>> >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/ >>> >>> The changes pass JPRT. >>> >>> I've changed the title of the bug to "Cleanup: Remove some unused >>> flags/code in loop optimizations" to better reflect >>> what the change is doing. I have kept the original title in the RFR. >>> >>> Thank you! >>> >>> Best regards, >>> >>> >>> Zoltan >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/23/16 6:26 AM, Zoltán Majó wrote: >>>>> Hi, >>>>> >>>>> >>>>> please review the patch for 8072422. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8072422 >>>>> >>>>> Problem: Some flags controlling loop optimizations are currently >>>>> 'diagnostic'. Even though these flags are useful >>>>> mostly for compiler-related development, their value can be >>>>> changed not only in >>>>> fastdebug, but also in release builds. >>>>> >>>>> Solution: Change the flags to 'develop'.
>>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/ >>>>> >>>>> Testing: >>>>> - locally built/started VM; >>>>> - locally executed >>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java. >>>>> >>>>> Thank you and best regards, >>>>> >>>>> >>>>> Zoltan >>>>> >>> > From zoltan.majo at oracle.com Mon Apr 4 10:50:39 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 4 Apr 2016 12:50:39 +0200 Subject: [9] RFR (XS): 8072422: Change a number of flags controlling loop optimizations to 'develop' In-Reply-To: <570246C4.7050504@oracle.com> References: <56F29981.20706@oracle.com> <56F31F93.3050101@oracle.com> <56FCDCB4.2050704@oracle.com> <56FD4906.4040909@oracle.com> <56FE6A41.6030703@oracle.com> <570246C4.7050504@oracle.com> Message-ID: <570246FF.6030206@oracle.com> P.S.: Typo in my previous mail: I meant webrev.02 (and not webrev.03). Sorry. On 04/04/2016 12:49 PM, Zoltán Majó wrote: > Thank you, Vladimir and Chris, for the reviews! For the record: I'll > push the latest webrev (webrev.03) today. > > Best regards, > > > Zoltan > > On 04/01/2016 02:32 PM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 03/31/2016 05:57 PM, Vladimir Kozlov wrote: >>> It is nice to have not product flags which is easy to remove :) >> >> Yes, indeed. :-) >> >>> >>> Clean up looks good. >> >> Thank you. >> >>> >>> Can you leave test but remove "-XX:+UnlockDiagnosticVMOptions >>> -XX:-LoopLimitCheck" only? It has interesting code shape. Add >>> comment that it was ran with "-XX:+UnlockDiagnosticVMOptions >>> -XX:-LoopLimitCheck" to trigger problem. >> >> Yes, of course. Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8072422/webrev.02/ >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>> On 3/31/16 1:15 AM, Zoltán Majó wrote: >>>> Hi Vladimir, >>>> >>>> >>>> thank you for your feedback!
>>>> >>>> On 03/23/2016 11:58 PM, Vladimir Kozlov wrote: >>>>> These flags were added when I fixed long standing C2 problem with >>>>> counted loops: 5091921. >>>>> They were added to have ability to revert back to original code if >>>>> new code cause a problem. >>>>> Looks like the old code which executed with these flags switched >>>>> off become rotten. >>>>> >>>>> Zoltan, did you find what cause the crash? Looks like product VM >>>>> was used in the bug report. What result gives >>>>> fastdebug VM? >>>> >>>> I've tried starting different VM versions with the flag(s) off. The >>>> most frequent error I get is >>>> >>>> # Internal Error >>>> (/home/zmajo/Documents/repos/8072422/hotspot/src/share/vm/opto/loopnode.cpp:3615), >>>> pid=32727, tid=32746 >>>> # assert(false) failed: Bad graph detected in build_loop_late >>>> >>>> So it seems that the code executed with the flags off has indeed >>>> become rotten. >>>> >>>>> Converting flags to develop will not prevent problems happening >>>>> with fastdebug VM where these flags could be switched >>>>> off even when they are develop. >>>>> >>>>> If the problem with original code (flags are off) is something >>>>> fundamental we may simply remove old code and remove >>>>> these flags and have only new code. 5 years already passed since >>>>> 5091921 was fixed. >>>> >>>> Yes, I agree. I think it's reasonable to remove the old code. >>>> >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.01/ >>>> >>>> The changes pass JPRT. >>>> >>>> I've changed the title of the bug to "Cleanup: Remove some unused >>>> flags/code in loop optimizations" to better reflect >>>> what the change is doing. I have kept the original title in the RFR. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/23/16 6:26 AM, Zoltán Majó wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> please review the patch for 8072422.
>>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8072422 >>>>>> >>>>>> Problem: Some flags controlling loop optimizations are currently >>>>>> 'diagnostic'. Even though these flags are useful >>>>>> mostly for compiler-related development, their value can be >>>>>> changed not only in >>>>>> fastdebug, but also in release builds. >>>>>> >>>>>> Solution: Change the flags to 'develop'. >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~zmajo/8072422/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - locally built/started VM; >>>>>> - locally executed >>>>>> runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java. >>>>>> >>>>>> Thank you and best regards, >>>>>> >>>>>> >>>>>> Zoltan >>>>>> >>>> >> > From martin.doerr at sap.com Mon Apr 4 12:27:49 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Apr 2016 12:27:49 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> Message-ID: Hi Christian, thanks for taking a look. I had checked the other places which use set_exception_cache. They either set it to NULL or to an unmodified pre-existing object (which gets released after creation by a cumulative memory barrier after my change). Both should be ok. I have seen many places in hotspot where we have a set_... function and a release_set_... one. So I thought this was kind of common practice. But I don't have a strong opinion on it. Best regards, Martin From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Friday, April 1, 2016 18:34 To: Doerr, Martin Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On Apr 1, 2016, at 2:37 AM, Doerr, Martin > wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms.
The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Does it make sense to keep: void set_exception_cache(ExceptionCache *ec) { _exception_cache = ec; } or would it be safer to always do the store-release even when clearing the cache? Please review. I will also need a sponsor. Best regards, Martin From martin.doerr at sap.com Mon Apr 4 12:57:31 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 4 Apr 2016 12:57:31 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <56FEA4FE.2010807@redhat.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <56FEA4FE.2010807@redhat.com> Message-ID: <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap> Hi Andrew, there are many places in hotspot where we rely on ordering by address dependency. That sounds feasible to me since we're not supporting Alpha processors. The load of the ExceptionCache pointer is only used to access elements of the cache, not to establish ordering of other accesses.
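The publication pattern under discussion can be written with C++11 atomics. Writers are serialized by a lock, and a new entry is fully initialized before a release store makes it the list head. Lock-free readers walk the list from that head. The sketch is illustrative only and makes several assumptions. It uses `std::atomic` instead of HotSpot's volatile fields and OrderAccess. All names (`CacheEntry`, `ExceptionCacheSketch`, `MethodState`) are hypothetical. The reader uses `memory_order_acquire` as the conservative choice. `memory_order_consume` is the C++11 ordering that expresses the address-dependency argument directly, but mainstream compilers currently implement it as acquire. The `is_alive` helper shows the read-once cleanup: the shared state is loaded into a local before both comparisons.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified cache entry: exception type id -> handler address.
struct CacheEntry {
  int exception_type;
  int handler_pc;
  CacheEntry* next;
};

class ExceptionCacheSketch {
 public:
  ExceptionCacheSketch() = default;
  ExceptionCacheSketch(const ExceptionCacheSketch&) = delete;
  ExceptionCacheSketch& operator=(const ExceptionCacheSketch&) = delete;

  ~ExceptionCacheSketch() {
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  // Writers are serialized by the lock. The entry is fully initialized before
  // the release store publishes it, so a reader that observes the new head
  // also observes the entry's fields.
  void add(int exception_type, int handler_pc) {
    std::lock_guard<std::mutex> guard(lock_);
    CacheEntry* e = new CacheEntry{exception_type, handler_pc,
                                   head_.load(std::memory_order_relaxed)};
    head_.store(e, std::memory_order_release);
  }

  // Lock-free reader. Acquire is the conservative choice; entries are never
  // removed while readers may be running, so following next is safe.
  int lookup(int exception_type) const {
    for (CacheEntry* e = head_.load(std::memory_order_acquire); e != nullptr;
         e = e->next) {
      if (e->exception_type == exception_type) return e->handler_pc;
    }
    return -1;  // not cached
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex lock_;
};

// Hypothetical method states (not HotSpot's actual values).
enum MethodState : unsigned char { kInUse, kNotEntrant, kZombie, kUnloaded };

// Read-once cleanup: load the shared state a single time so both comparisons
// test the same value and the compiler is not forced to reload it.
bool is_alive(const std::atomic<unsigned char>& state) {
  const unsigned char s = state.load(std::memory_order_relaxed);
  return s == kInUse || s == kNotEntrant;
}
```

Because the writer prepends under the lock and only the release store makes an entry reachable, a reader can never reach a partially constructed entry.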
This prevents optimizers from reordering, optimizing out or duplicating some of the loads. All supported processors respect the ordering (due to address dependency), too, so I believe we're ok. I'm not sure if this is what you're concerned about. Did I miss anything? Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Freitag, 1. April 2016 18:43 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On 04/01/2016 01:37 PM, Doerr, Martin wrote: > Therefore, the nmethod's field _exception_cache needs to be volatile > and adding new entries must be done by releasing stores. (Loading > seems to be fine without acquire because there's an address > dependency from the load of the cache to the usage of its contents > which is sufficient to ensure ordering on all openjdk platforms.) I think that's very risky. We can't be really sure what an optimizer might do in this area, as discussed at (very) considerable length in concurrency forums. memory_order_consume does this correctly in C++11 but we're not yet using C++11. I'd use acquire and leave a note that in future this can be replaced by memory_order_consume. Andrew. From aleksey.shipilev at oracle.com Mon Apr 4 13:02:09 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 4 Apr 2016 16:02:09 +0300 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign In-Reply-To: <56FE5D00.7000209@oracle.com> References: <56FE5D00.7000209@oracle.com> Message-ID: <570265D1.6040905@oracle.com> On 04/01/2016 02:35 PM, Aleksey Shipilev wrote: > Hi, > > compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify > String Concat strategies, because some of them are loading new methods > and use them during String concat linkage and execution. Notably, this > will happen inside of the asserts. 
We need to prime the asserts before > using them in-between counter polls. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8153265 > > Webrev: > http://cr.openjdk.java.net/~shade/8153265/webrev.00/ > > Testing: offending test in oob/-Xcomp modes Anyone? Thanks, -Aleksey From ivan at azulsystems.com Mon Apr 4 13:12:20 2016 From: ivan at azulsystems.com (Ivan Krylov) Date: Mon, 4 Apr 2016 16:12:20 +0300 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign In-Reply-To: <570265D1.6040905@oracle.com> References: <56FE5D00.7000209@oracle.com> <570265D1.6040905@oracle.com> Message-ID: <57026834.6090907@azulsystems.com> Looks right, but I am not a reviewer. Ivan On 04/04/2016 16:02, Aleksey Shipilev wrote: > On 04/01/2016 02:35 PM, Aleksey Shipilev wrote: >> Hi, >> >> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify >> String Concat strategies, because some of them are loading new methods >> and use them during String concat linkage and execution. Notably, this >> will happen inside of the asserts. We need to prime the asserts before >> using them in-between counter polls. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8153265 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8153265/webrev.00/ >> >> Testing: offending test in oob/-Xcomp modes > Anyone?
> > Thanks, > -Aleksey > > From aph at redhat.com Mon Apr 4 14:01:32 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Apr 2016 15:01:32 +0100 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <56FEA4FE.2010807@redhat.com> <2b46fe6dee9a454694ec220ff2dfcb77@DEWDFE13DE14.global.corp.sap> Message-ID: <570273BC.4090605@redhat.com> On 04/04/2016 01:57 PM, Doerr, Martin wrote: > there are many places in hotspot where we rely on ordering by > address dependency. That sounds feasible to me since we're not > supporting Alpha processors. The load of the ExceptionCache pointer > is only used to access elements of the cache, not to establish > ordering of other accesses. > > I don't think compilers are allowed to break anything here because I > have made the field volatile. This prevents optimizers from > reordering, optimizing out or duplicating some of the loads. All > supported processors respect the ordering (due to address > dependency), too, so I believe we're ok. That sounds alright, at least from an informal reasoning perspective. > I'm not sure if this is what you're concerned about. Did I miss > anything? I don't think so. I presume you've read Hans Boehm's paper where he points out that it's very hard to rely on dependencies for memory ordering in any high-level language [1]. For that reason I tend to err on the side of caution when reasoning about memory. You're sailing a bit too close to the rocks for my comfort. :-) Andrew. 
[1] http://www.hboehm.info/c++mm/dependencies.html From dean.long at oracle.com Mon Apr 4 18:34:45 2016 From: dean.long at oracle.com (Dean Long) Date: Mon, 4 Apr 2016 11:34:45 -0700 Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Message-ID: <5702B3C5.8070507@oracle.com> Looks OK. dl On 4/4/2016 1:09 AM, Rahul Raghavan wrote: > Hi, > > Please review the revised fix for JDK- 8149488. > > : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ > > Based on further checking and thanks to clarifications from Michael, > it was verified that 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp', > (and that earlier mentioned case of extending bitsInByte table size to 512, is not required). > > Points from Michael for the record - " > > I believe Dean is right, I have debugged this and analyzed the usage model, > > we never made use of the upper components > > and register allocation has been right for VecZ for a good deal of time. > > > > All we need for a change is, > > Regmask.cpp: > > > > uint RegMask::Size() const { > > extern uint8_t bitsInByte[256]; > > > > A one line change. > > > > -Michael. > > > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > > I tested SPECjvm2008 which as tons of EVEX code generated in it on SKX > > where we make use of VecZ and the upper bank of registers." > > So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. > > Confirmed no issues with 'JPRT -testset hotspot' run. > > Thanks, > Rahul > >> -----Original Message----- >> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM >> >> Michael, isn't the correct size for this table 256? I missed how VecZ >> relates to the table size. 
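To make the table-size argument concrete, here is a minimal C++ sketch, assuming a mask stored as 32-bit words and a 256-entry population-count table like `bitsInByte` in `libadt/vectset.cpp`. The names `make_bits_in_byte` and `mask_size` are hypothetical, and this is not the HotSpot `RegMask::Size()` source. It only shows that a byte-at-a-time lookup masks every index with `0xFF`, so even a 512-bit VecZ mask (16 words) never reads past entry 255.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the table size constant discussed in the thread.
constexpr std::size_t BITS_IN_BYTE_ARRAY_SIZE = 256;

// Build the table: entry i holds the number of set bits in the byte value i.
std::array<std::uint8_t, BITS_IN_BYTE_ARRAY_SIZE> make_bits_in_byte() {
  std::array<std::uint8_t, BITS_IN_BYTE_ARRAY_SIZE> t{};  // t[0] == 0
  for (std::size_t i = 1; i < t.size(); i++) {
    // popcount(i) = popcount(i >> 1) + lowest bit of i
    t[i] = static_cast<std::uint8_t>(t[i >> 1] + (i & 1));
  }
  return t;
}

// Count the set bits of a mask made of 32-bit words, one byte at a time.
// Every index is (word >> shift) & 0xFF, so it can never reach 256,
// however many words the mask has (e.g. 16 words for a 512-bit VecZ).
unsigned mask_size(const std::uint32_t* words, std::size_t nwords) {
  static const auto bits_in_byte = make_bits_in_byte();
  unsigned sum = 0;
  for (std::size_t w = 0; w < nwords; w++) {
    std::uint32_t bits = words[w];
    for (int shift = 0; shift < 32; shift += 8) {
      sum += bits_in_byte[(bits >> shift) & 0xFF];
    }
  }
  return sum;
}
```

Widening the mask only adds words to the outer loop; it never widens the index, which is consistent with the conclusion that a 256-entry declaration is sufficient.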
>> >> dl >> >> On 3/31/2016 9:58 AM, Berg, Michael C wrote: >>> Up until now we have gotten along with the size constraint only. >>> Let us have both the size and the table though for completeness. >>> I think we can leave the name though. >>> >>> -Michael >>> >>> -----Original Message----- >>> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] >>> Sent: Thursday, March 31, 2016 9:18 AM >>> To: Dean Long ; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C >>> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp >>> >>> Hi Michael, >>> >>> With respect to below thread, request help with some questions. >>> Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size. >>> Also comment got was for requirement to extend bitsInByte table to 512 size, for consistent mapping for VecZ register also, on >> targets that support it. >>> But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. >>> Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? >>> >>> So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? >>> (without extending current bitsInByte array contents) (Anyhow at present values above 0xFF is never indexed for bitsInByte in >> RegMask::Size()) >>> ----- src/share/vm/libadt/vectset.hpp >>> +#define BITS_IN_BYTE_ARRAY_SIZE 256 >>> + >>> >>> ----- src/share/vm/opto/regmask.cpp >>> - extern uint8_t bitsInByte[512]; >>> + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; >>> >>> ----- src/share/vm/libadt/vectset.cpp >>> -uint8_t bitsInByte[256] = { >>> +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { >>> >>> I can send revised webrev for above if all okay. Please tell me if I am missing something. 
>>> >>> >>> OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? >>> (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') >>> >>> Thanks, >>> Rahul >>> >>>> -----Original Message----- >>>> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM >>>> >>>>> -----Original Message----- >>>>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM >>>>> >>>>> When do we access elements 256 .. 511? Wouldn't that mean we have >>>>> 9-bit bytes? >>>> Got your point Dean, Thanks. >>>> I too got some questions here now; will check and reply soon. >>>> >>>> -Rahul >>>> >>>>> dl >>>>> >>>>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: >>>>>> Hi, >>>>>> >>>>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. >>>>>> >>>>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 >>>>>> : >>>>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ >>>>>> >>>>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. >>>>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. >>>>>> Confirmed no issues with 'JPRT -testset hotspot' run. >>>>>> >>>>>> Thanks, >>>>>> Rahul >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: >>>>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- >>>>> dev at openjdk.java.net >>>>>>> Should we not extend: >>>>>>> >>>>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: >>>>>>> uint8_t bitsInByte[256] = { // ... >>>>>>> >>>>>>> to 512 >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' >>>>>>> >>>>>>> So how do we intend to map a VecZ register without 512 bits? 
>>>>>>> >>>>>>> -Michael >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: hotspot-compiler-dev >>>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>>>> Of Vladimir Ivanov >>>>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; >>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>> >>>>>>> Rahul, >>>>>>> >>>>>>> Can we define a constant instead and use it in both places? >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review the patch for JDK- 8149488. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 >>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ >>>>>>>> >>>>>>>> Corrected the bitsInByte array size in declaration. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Rahul >>>>>>>> From michael.c.berg at intel.com Mon Apr 4 20:05:05 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Mon, 4 Apr 2016 20:05:05 +0000 Subject: FW: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Message-ID: FYI -----Original Message----- From: Berg, Michael C Sent: Monday, April 04, 2016 12:42 PM To: 'Rahul Raghavan' Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp Looks ok Rahul. Thanks, Michael -----Original Message----- From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] Sent: Monday, April 04, 2016 1:09 AM To: hotspot-compiler-dev at openjdk.java.net Cc: Dean Long ; Berg, Michael C ; Tobias Hartmann ; Vladimir Ivanov Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp Hi, Please review the revised fix for JDK- 8149488. 
: http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ Based on further checking and thanks to clarifications from Michael, it was verified that 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp', (and that earlier mentioned case of extending bitsInByte table size to 512, is not required). Points from Michael for the record - " > I believe Dean is right, I have debugged this and analyzed the usage model, > we never made use of the upper components > and register allocation has been right for VecZ for a good deal of time. > > All we need for a change is, > Regmask.cpp: > > uint RegMask::Size() const { > extern uint8_t bitsInByte[256]; > > A one line change. > > -Michael. > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > I tested SPECjvm2008 which as tons of EVEX code generated in it on SKX > where we make use of VecZ and the upper bank of registers." So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. Confirmed no issues with 'JPRT -testset hotspot' run. Thanks, Rahul > -----Original Message----- > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > > Michael, isn't the correct size for this table 256? I missed how VecZ > relates to the table size. > > dl > > On 3/31/2016 9:58 AM, Berg, Michael C wrote: > > Up until now we have gotten along with the size constraint only. > > Let us have both the size and the table though for completeness. > > I think we can leave the name though. > > > > -Michael > > > > -----Original Message----- > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > > Sent: Thursday, March 31, 2016 9:18 AM > > To: Dean Long ; > > hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > > > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in > > regmask.cpp > > > > Hi Michael, > > > > With respect to below thread, request help with some questions. 
> > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet Size. > > Also comment got was for requirement to extend bitsInByte table to > > 512 size, for consistent mapping for VecZ register also, on > targets that support it. > > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > > Could not find usage of extended set of values in bitsInByte table! Is it for something to be used in future? > > > > So for now it seems following original fix for 8149488 - only correcting the size at RegMask::Size(), seems to be okay? > > (without extending current bitsInByte array contents) (Anyhow at > > present values above 0xFF is never indexed for bitsInByte in > RegMask::Size()) > > > > ----- src/share/vm/libadt/vectset.hpp > > +#define BITS_IN_BYTE_ARRAY_SIZE 256 > > + > > > > ----- src/share/vm/opto/regmask.cpp > > - extern uint8_t bitsInByte[512]; > > + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > > > > ----- src/share/vm/libadt/vectset.cpp > > -uint8_t bitsInByte[256] = { > > +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > > > > I can send revised webrev for above if all okay. Please tell me if I am missing something. > > > > > > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > > > > Thanks, > > Rahul > > > >> -----Original Message----- > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > >> > >>> -----Original Message----- > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > >>> > >>> When do we access elements 256 .. 511? Wouldn't that mean we have > >>> 9-bit bytes? > >> Got your point Dean, Thanks. > >> I too got some questions here now; will check and reply soon. 
> >> > >> -Rahul > >> > >>> dl > >>> > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > >>>> Hi, > >>>> > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > >>>> > >>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>> : > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > >>>> > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > >>>> Confirmed no issues with 'JPRT -testset hotspot' run. > >>>> > >>>> Thanks, > >>>> Rahul > >>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > >>> dev at openjdk.java.net > >>>>> Should we not extend: > >>>>> > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > >>>>> uint8_t bitsInByte[256] = { // ... > >>>>> > >>>>> to 512 > >>>>> > >>>>> -----Original Message----- > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > >>>>> > >>>>> So how do we intend to map a VecZ register without 512 bits? > >>>>> > >>>>> -Michael > >>>>> > >>>>> -----Original Message----- > >>>>> From: hotspot-compiler-dev > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > >>>>> Of Vladimir Ivanov > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > >>>>> hotspot-compiler-dev at openjdk.java.net > >>>>> > >>>>> Rahul, > >>>>> > >>>>> Can we define a constant instead and use it in both places? > >>>>> > >>>>> Best regards, > >>>>> Vladimir Ivanov > >>>>> > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > >>>>>> Hi, > >>>>>> > >>>>>> Please review the patch for JDK- 8149488. 
> >>>>>> > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>> Webrev: > >>>>>> http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > >>>>>> > >>>>>> Corrected the bitsInByte array size in declaration. > >>>>>> > >>>>>> Thanks, > >>>>>> Rahul > >>>>>> > From doug.simon at oracle.com Mon Apr 4 21:30:02 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 4 Apr 2016 23:30:02 +0200 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod Message-ID: The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. https://bugs.openjdk.java.net/browse/JDK-8153439 http://cr.openjdk.java.net/~dnsimon/8153439 From igor.veresov at oracle.com Mon Apr 4 21:46:18 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Apr 2016 14:46:18 -0700 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: Message-ID: <3542A78F-57C3-48D4-ADAB-923760F33EE7@oracle.com> Looks good. igor > On Apr 4, 2016, at 2:30 PM, Doug Simon wrote: > > The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. > > https://bugs.openjdk.java.net/browse/JDK-8153439 > http://cr.openjdk.java.net/~dnsimon/8153439 From christian.thalinger at oracle.com Mon Apr 4 21:50:40 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 11:50:40 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: Message-ID: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> Thanks for the quick turnaround. Looks good. 
> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: > > The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. > > https://bugs.openjdk.java.net/browse/JDK-8153439 > http://cr.openjdk.java.net/~dnsimon/8153439 From christian.thalinger at oracle.com Mon Apr 4 22:34:20 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 12:34:20 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> Message-ID: <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> No, not good. We are failing a couple JVMCI tests. Looking into it... > On Apr 4, 2016, at 11:50 AM, Christian Thalinger wrote: > > Thanks for the quick turnaround. Looks good. > >> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: >> >> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC. >> >> https://bugs.openjdk.java.net/browse/JDK-8153439 >> http://cr.openjdk.java.net/~dnsimon/8153439 > From vladimir.kozlov at oracle.com Mon Apr 4 22:57:15 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 15:57:15 -0700 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <56FE7DB5.404@oracle.com> References: <56FE7DB5.404@oracle.com> Message-ID: <5702F14B.1020403@oracle.com> 2 tests have -XX:+PrintCompilation flag added. Why you need it? Thanks, Vladimir On 4/1/16 6:55 AM, Nils Eliasson wrote: > Hi all, > > Please review this fix.
> > Summary: > There is a mismatch in the CompilerWhiteBox testcases between the > callable and the executable constructors. SimpleTestCase$Helper > implements all constructors and methods that are tested. However since > Helper is an inner class there will be an extra (javac created) > constructor that has the parent class as an appended argument. The > callable will invoke this constructor, but the executable will reference > the normal constructor. > > Solution: > Stop having Helper as an inner class. Rename it to > SimpleTestCaseHelper for some uniqueness in compiler commands and > directives. > > Testing: > Run all hotspot/compiler/whitebox tests on all platforms, and all > hotspot/compiler tests on one platform. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 > Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ > > Best regards, > Nils Eliasson From vladimir.kozlov at oracle.com Mon Apr 4 23:12:58 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 16:12:58 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: <5702F4FA.3090305@oracle.com> Looks good to me. Thanks, Vladimir On 4/1/16 11:28 AM, Igor Veresov wrote: > When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 > Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ > > Thanks, > igor > From igor.veresov at oracle.com Mon Apr 4 23:17:08 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Apr 2016 16:17:08 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5702F4FA.3090305@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5702F4FA.3090305@oracle.com> Message-ID: <77FFC56E-2B29-4D0F-8EB6-C181DFFD895D@oracle.com> Thanks, Vladimir! Can I please get another review from the runtime team? igor > On Apr 4, 2016, at 4:12 PM, Vladimir Kozlov wrote: > > Looks good to me. > > Thanks, > Vladimir > > On 4/1/16 11:28 AM, Igor Veresov wrote: >> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >> >> Thanks, >> igor >> From vladimir.kozlov at oracle.com Mon Apr 4 23:25:02 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 16:25:02 -0700 Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler In-Reply-To: References: Message-ID: <5702F7CE.3040600@oracle.com> Bug number in links is incorrect. Should be: https://bugs.openjdk.java.net/browse/JDK-8151003 http://cr.openjdk.java.net/~mcberg/8151003/webrev.02/ Changes looks good. Very nice clean up. I will start testing. I see you changed code for AVX > 2 in macroAssembler_x86.hpp. 
Is it because new instructions faster or to avoid mixing evex and non-evex instructions? Thanks, Vladimir On 4/1/16 2:51 PM, Berg, Michael C wrote: > Hi All, > > I would like to contribute some clean up on the x86 assembler applied to > vex encoding to address the usage of the nds assembler parameter. > > For all instructions which use nds source xmm registers, the validity > check has been removed. It was originally placed there here: > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269 > > And propagated. Now nds register usage is fully compliant with each isa > description. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001 > webrev: > > http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/ > > Thanks, > > Michael > From michael.c.berg at intel.com Mon Apr 4 23:30:04 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Mon, 4 Apr 2016 23:30:04 +0000 Subject: RFR(M) 8151003 remove nds validity checks from vex x86 assembler In-Reply-To: <5702F7CE.3040600@oracle.com> References: <5702F7CE.3040600@oracle.com> Message-ID: Before we were aliasing, which lent some ambiguity regarding AVX2 and EVEX usage, as the aliased forms had more programming via the imm field. This way they are fully separate. Thanks, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, April 04, 2016 4:25 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M) 8151003 remove nds validity checks from vex x86 assembler Bug number in links is incorrect. Should be: https://bugs.openjdk.java.net/browse/JDK-8151003 http://cr.openjdk.java.net/~mcberg/8151003/webrev.02/ Changes looks good. Very nice clean up. I will start testing. I see you changed code for AVX > 2 in macroAssembler_x86.hpp. Is it because new instructions faster or to avoid mixing evex and non-evex instructions?
Thanks, Vladimir On 4/1/16 2:51 PM, Berg, Michael C wrote: > Hi All, > > I would like to contribute some clean up on the x86 assembler applied > to vex encoding to address the usage of the nds assembler parameter. > > For all instructions which use nds source xmm registers, the validity > check has been removed. It was originally placed there here: > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/006050192a5a#l1.1269 > > And propagated. Now nds register usage is fully compliant with each > isa description. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151001 > webrev: > > http://cr.openjdk.java.net/~mcberg/8151001/webrev.02/ > > Thanks, > > Michael > From michael.c.berg at intel.com Mon Apr 4 23:32:25 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Mon, 4 Apr 2016 23:32:25 +0000 Subject: CR for RFR 8151573 In-Reply-To: <56FF3A04.5090601@oracle.com> References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com> Message-ID: Vladimir, did you restart the integration testing after the small change I sent? Regards, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 01, 2016 8:18 PM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8151573 I start preintegration testing. Thanks, Vladimir On 3/31/16 8:36 PM, Berg, Michael C wrote: > Vladimir, I think I have addressed every concern in the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ > > I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. > The code is fully retested with no issues.
> > Thanks, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, March 30, 2016 9:56 PM > To: Berg, Michael C ; > 'hotspot-compiler-dev at openjdk.java.net' > > Subject: Re: CR for RFR 8151573 > > On 3/30/16 4:57 PM, Berg, Michael C wrote: >> See below for context. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 3:51 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> Michael, >> >> First, please update to latest sources of hs-comp. 8148754 changes modified the same files and it is difficult to apply your changes. >> >> multi_version_post_loops() can use is_canonical_main_loop_entry() >> from >> 8148754 but you need to modify it to move >> is_Main() assert to other call sites. >> >> ok, but should not the name then be is_canonical_loop_entry()? Since it will be for non main loops as well. Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. > > I am fine with name change. The only difference I see is that your code checks cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1) - it should be in(2)? > >> >> I did not get rce'd post loop checks in loopnode.cpp. >> >> First I will have to explain what I am doing with do_range_check(). That code resolves the question of: Do all the range checks have load ranges, only these types of loops are canonical for multiversioning.
>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loops limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I thinking leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...) creates a copy of main which successfully passed range check elimination and is canonical. Basically this short circuits us from adding any non canonical multiversions, leaving the code in its original form when we do not pass. >> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added we eliminate it. > > I understand that you want to make clean copy of main loop when it could be vectorized but before it is unrolled. > > You should add comment into do_range_check() about what you said (looking only for range checks). Also don't use old_new > struct but use simple counter and return it as result (or return bool from do_range_check()) to indicate that only range checks were in loop. > > Anyway, I was asked about checks in loopnode.cpp. First, there is some expectation about sequence of post loops. But from what I understand you are looking for a post loops which are not created by insert_scalar_rced_post_loop(). Right? > There is no comment what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). > Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop? 
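The multiversioning idea described above, folding the post loop's range checks into a single min test on the loop limit so that a clean copy runs without checks and the original checked loop handles any remainder, can be illustrated at the source level. This is a hedged sketch under assumptions: `add_post_loop` and `checked` are hypothetical names, `std::vector` bounds checks stand in for Java array range checks, and it is not how C2 implements `insert_scalar_rced_post_loop()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: a bounds-checked store target, standing in for a
// Java array range check that throws on a bad index.
inline int& checked(std::vector<int>& v, std::size_t i) {
  if (i >= v.size()) throw std::out_of_range("index out of bounds");
  return v[i];
}

// d[i] = a[i] + b[i] for i in [start, limit), in two versions:
// a clean copy with no per-iteration checks runs up to a limit that every
// range check is known to satisfy, then a fully checked loop handles what
// is left and throws if any index is actually out of bounds.
void add_post_loop(std::vector<int>& d, const std::vector<int>& a,
                   const std::vector<int>& b, std::size_t start,
                   std::size_t limit) {
  // Fold all range checks into one min() test against the loop limit.
  std::size_t safe = std::min({limit, d.size(), a.size(), b.size()});
  std::size_t i = start;
  for (; i < safe; i++) {   // clean version: no range checks
    d[i] = a[i] + b[i];
  }
  for (; i < limit; i++) {  // checked version: every access is bounds-checked
    checked(d, i) = a.at(i) + b.at(i);
  }
}
```

If every index is in range, only the clean loop executes; otherwise the checked loop reaches the first bad index and throws, as a fully checked loop would.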
> >> >> Swap next checks since has_range_checks() may be expensive scanning loop body: >> + // only process RCE'd main loops >> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >> >> Ok, makes sense. > > Sorry, I made mistake due to you using the same name for 2 different methods. One is query cl->has_range_checks() which checks flag. And an other has_range_checks(cl) which search loop for If nodes. In such case you can ignore my check swap request. > But, please, rename has_range_checks(cl) method to avoid confusion. > > Thanks, > Vladimir > >> >> Also if there are no checks in loop you will scan loop's body again since no_checks state is not recorded. >> I would suggest to scan body unconditionally because code you added to do_range_check() may not be precise (you decrement count only for LoadRange when increment for all Ifs). This way you don't need to change old_new around do_range_check() call. >> >> I perceive the real problem is don't scan more than once after we check. I will move towards that solution. >> >> >> Why you need local copies?: >> >> - visited.Clear(); >> - clones.clear(); >> + Arena *a = Thread::current()->resource_area(); >> + VectorSet visited(a); >> + Node_Stack clones(a, main_head->back_control()->outcnt()); >> >> I will look into this, and see if it can be cleaned up. >> >> >> I don't think PostLoopInfo is needed. insert_post_loop() can return post_head node and &main_exit could be passed to it to set. >> >> Ok, I will look into a version without PostLoopInfo.
>> >> Thanks, >> Vladimir >> >> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>> Here is an update after full testing, the webrev is: >>> >>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>> >>> Please review and comment, >>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Berg, Michael C >>> Sent: Wednesday, March 16, 2016 10:30 AM >>> To: Vladimir Kozlov ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: RE: CR for RFR 8151573 >>> >>> Putting a hold on the review, retesting everything on my end. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 16, 2016 8:42 AM >>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: CR for RFR 8151573 >>> >>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>> Vladimir: >>>> >>>> The reason programmable SIMD depends upon this is that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>> >>> I understand that we can get some benefits. But in general case they will not be visible. >>> >>>> >>>> With regards to the pre loop, pre loops have special checks too, do they not, requiring flow in many cases? >>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>> >>> Yes, after you explained me vector masking I now understand why it could be used for post loop. >>> >>> After thinking about this I would suggest for you to look on arraycopy and generate_fill stubs instead in stub_Generator_x86*.cpp (may be only 64-bit). They also have post loops but changes would be only platform specific, smaller and easy to understand and test.
Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than optimizing general loops. >>> >>> Regards, >>> Vladimir >>> >>>> >>>> Regards, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> As we all know we can always construct microbenchmarks which show >>>> 30% >>>> - 50% difference. In real applications we will never see the >>>> difference. I still don't see a real reason why we should spend >>>> time and optimize >>>> *POST* loops. We already have vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>>> >>>> Why does "programmable SIMD" depend on it? What about pre-loop? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>> Correction below... >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-compiler-dev >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>> Of Berg, Michael C >>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: RE: CR for RFR 8151573 >>>>> >>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>>> >>>>> for(int i = 0; i < process_len; i++) >>>>> { >>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>> } >>>>> >>>>> The above code makes 9 vector ops. >>>>> >>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. 
>>>>> The value process_len is some fraction of the array length in my measurements.
The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. >>>>> >>>>> An example would be array_length = 512, process_len is a range of 81..96, we create a VecZ loop which was superunrolled 4 times with vector length 16, or unroll of 64, we align process 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop. We start that final loop at iteration 81 so we always do at least 1 iteration fixup, and as many as 15. If we left the fixup loop as a scalar loop that would mean 1 to 15 iterations plus our initial loops which have {4,1,1} iterations as a group or 6 to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>>> >>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>> >>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> Hi Michael, >>>>> >>>>> Changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. 
Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes?
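Michael's {4,1,1} example reduces to simple trip-count arithmetic: with 6 trips spent in the pre, main, and vectorized post loops, a scalar fixup loop adds one trip per residual element (1 to 15), while a masked vector fixup covers the whole residual in a single trip. The sketch below models this in plain scalar C++, assuming 16 float lanes per vector (as for a 512-bit VecZ register). The "mask" is emulated with a per-lane bounds test rather than real AVX-512 mask registers, and `scalar_fixup`, `masked_fixup`, and `kernel` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Assumption: 16 float lanes per vector, as for a 512-bit (VecZ) register.
constexpr int kLanes = 16;

// The loop body from the thread: d[i] = a[i]*b[i] + a[i]*c[i] + b[i]*c[i].
inline float kernel(float a, float b, float c) {
  return (a * b) + (a * c) + (b * c);
}

// Scalar fixup loop: one trip per residual element in [start, limit).
int scalar_fixup(const std::vector<float>& a, const std::vector<float>& b,
                 const std::vector<float>& c, std::vector<float>& d,
                 int start, int limit) {
  int trips = 0;
  for (int i = start; i < limit; ++i, ++trips) {
    d[i] = kernel(a[i], b[i], c[i]);
  }
  return trips;
}

// Scalar emulation of a masked vector fixup: each trip covers kLanes indices,
// and lanes whose index is >= limit are masked off (not written).
int masked_fixup(const std::vector<float>& a, const std::vector<float>& b,
                 const std::vector<float>& c, std::vector<float>& d,
                 int start, int limit) {
  assert(limit <= static_cast<int>(d.size()));
  int trips = 0;
  for (int base = start; base < limit; base += kLanes, ++trips) {
    for (int lane = 0; lane < kLanes; ++lane) {
      const int i = base + lane;
      const bool active = i < limit;  // the per-lane mask
      if (active) d[i] = kernel(a[i], b[i], c[i]);
    }
  }
  return trips;
}
```

For any residual r in 1..15 the scalar version takes r trips and the masked version takes 1, so adding the 6 earlier trips gives a total of 7 to 21 versus a constant 7, matching the figures in the message.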
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>> Hi Folks, >>>>>> >>>>>> I would like to contribute multi-versioning post loops for range >>>>>> check elimination. Beforehand cfg optimizations after register >>>>>> allocation were where post loop optimizations were done for range >>>>>> checks. I have added code which produces the desired effect much >>>>>> earlier by introducing a safe transformation which will minimally >>>>>> allow a range check free version of the final post loop to >>>>>> execute up until the point it actually has to take a range check >>>>>> exception by re-ranging the limit of the rce'd loop, then exit >>>>>> the rce'd post loop and take the range check exception in the legacy loops execution if required. >>>>>> If during optimization we discover that we know enough to remove >>>>>> the range check version of the post loop, mostly by exposing the >>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>> will eliminate the range check post loop altogether much like cfg >>>>>> optimizations did, but much earlier. This gives optimizations >>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>> mask vectors which map to the residual iterations. Programmable >>>>>> SIMD will be a follow on change set utilizing this code to stage >>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added. 
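The core idea of the proposal is a post loop in two versions: a clean copy with no range checks, whose limit is re-ranged so it stops before the first index that would fail a check, followed by the legacy checked copy, which only runs (and throws) if the original limit really exceeds the array bounds. Below is a minimal sketch of that control flow in C++, assuming a summation loop, a non-negative start index, and `std::out_of_range` standing in for Java's `ArrayIndexOutOfBoundsException`. The function names are hypothetical, and this is not C2's implementation, which transforms the IR graph rather than source code.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Stand-in for a Java array access guarded by a range check.
inline int checked_load(const std::vector<int>& arr, int i) {
  if (i < 0 || i >= static_cast<int>(arr.size())) {
    throw std::out_of_range("range check failed");
  }
  return arr[i];
}

// Sums arr[start..limit) as a multiversioned post loop.
// clean_trips reports how many iterations ran in the check-free version.
long long multiversioned_post_loop(const std::vector<int>& arr, int start,
                                   int limit, int* clean_trips) {
  assert(start >= 0);  // assumption: the earlier loops already covered lower indices
  long long sum = 0;
  const int len = static_cast<int>(arr.size());
  // Re-ranged limit: the clean copy stops before the first index that
  // would fail a range check.
  const int clean_limit = std::min(limit, len);
  int i = start;
  for (; i < clean_limit; ++i) {
    sum += arr[i];  // no check needed: 0 <= start <= i < len
  }
  *clean_trips = i - start;
  // Legacy copy with range checks: only executes if limit > len, and throws
  // at exactly the iteration where the original loop would have thrown.
  for (; i < limit; ++i) {
    sum += checked_load(arr, i);
  }
  return sum;
}
```

When the compiler can prove `limit <= len`, the checked copy is dead and only the clean loop remains, which corresponds to the case in the message where the range-check version of the post loop is eliminated altogether.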
>>>>>> >>>>>> This code was tested as follows: >>>>>> >>>>>> >>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>> >>>>>> >>>>>> webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Michael >>>>>> From christian.thalinger at oracle.com Tue Apr 5 00:00:05 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 14:00:05 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: > On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: > > No, not good. We are failing a couple JVMCI tests. Looking into it... Ok, this got a little out of control but for the better: http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone. Again, stupid jtreg. Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. > >> On Apr 4, 2016, at 11:50 AM, Christian Thalinger wrote: >> >> Thanks for the quick turnaround. Looks good. >> >>> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: >>> >>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
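The 8153439 fix, as described here, comes down to not storing a log that is null (or, per the subject line, empty), so the `nmethod::_speculation_log` field stays NULL and the GC has nothing to process for it. Here is a minimal C++ sketch of that guard, assuming hypothetical `SpeculationLogSketch` and `NMethodSketch` types in place of the real JVMCI and HotSpot ones. `has_speculations()` mirrors the `hasSpeculations()` query Christian suggests making part of the `SpeculationLog` interface, but its shape here is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a compiler-side speculation log.
struct SpeculationLogSketch {
  std::vector<long> speculations;
  bool has_speculations() const { return !speculations.empty(); }
};

// Hypothetical stand-in for the nmethod field discussed in 8153439.
struct NMethodSketch {
  const SpeculationLogSketch* speculation_log = nullptr;
};

// Store the log only if it exists and actually holds speculations; otherwise
// leave the field null so later passes (e.g. GC processing) can skip it.
void install_speculation_log(NMethodSketch& nm, const SpeculationLogSketch* log) {
  const bool useful = log != nullptr && log->has_speculations();
  nm.speculation_log = useful ? log : nullptr;
}

// Counts the nmethods whose log a GC-like visitor would have to process.
int count_logs_to_visit(const std::vector<NMethodSketch>& nmethods) {
  int count = 0;
  for (const NMethodSketch& nm : nmethods) {
    if (nm.speculation_log != nullptr) ++count;
  }
  return count;
}
```

Checking for both null and empty in one place keeps callers from having to know which compiler configurations actually use speculation.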
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8153439 >>> http://cr.openjdk.java.net/~dnsimon/8153439 >> > From igor.veresov at oracle.com Tue Apr 5 00:10:41 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 4 Apr 2016 17:10:41 -0700 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: Seems alright. igor > On Apr 4, 2016, at 5:00 PM, Christian Thalinger wrote: > > >> On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: >> >> No, not good. We are failing a couple JVMCI tests. Looking into it... > > Ok, this got a little out of control but for the better: > > http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ > > The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. > > While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone. Again, stupid jtreg. > > Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. > >> >>> On Apr 4, 2016, at 11:50 AM, Christian Thalinger wrote: >>> >>> Thanks for the quick turnaround. Looks good. >>> >>>> On Apr 4, 2016, at 11:30 AM, Doug Simon wrote: >>>> >>>> The JVMCI speculation log mechanism might not be used by a specific JVMCI compiler or JVMCI compiler configuration. In this case, the nmethod::_speculation_log field should be initialized to NULL to reduce unnecessary processing of the field by the GC.
>>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153439 >>>> http://cr.openjdk.java.net/~dnsimon/8153439 >>> >> > From christian.thalinger at oracle.com Tue Apr 5 00:24:12 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Apr 2016 14:24:12 -1000 Subject: RFR (XS) 8153265: compiler/whitebox/ForceNMethodSweepTest should not assume asserts are benign In-Reply-To: <570265D1.6040905@oracle.com> References: <56FE5D00.7000209@oracle.com> <570265D1.6040905@oracle.com> Message-ID: <46818653-AC18-4C9F-BBEB-B1F98CC39FBA@oracle.com> Should be alright. > On Apr 4, 2016, at 3:02 AM, Aleksey Shipilev wrote: > > On 04/01/2016 02:35 PM, Aleksey Shipilev wrote: >> Hi, >> >> compiler/whitebox/ForceNMethodSweepTest would fail if you juggle Indify >> String Concat strategies, because some of them are loading new methods >> and use them during String concat linkage and execution. Notably, this >> will happen inside of the asserts. We need to prime the asserts before >> using them in-between counter polls. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8153265 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8153265/webrev.00/ >> >> Testing: offending test in oob/-Xcomp modes > > Anyone? > > Thanks, > -Aleksey From vladimir.kozlov at oracle.com Tue Apr 5 01:29:54 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Apr 2016 18:29:54 -0700 Subject: CR for RFR 8151573 In-Reply-To: References: <56E881A9.7070004@oracle.com> <56E89CA4.8010201@oracle.com> <56E97EC5.6030608@oracle.com> <56FC5852.2030101@oracle.com> <56FCADE3.20403@oracle.com> <56FF3A04.5090601@oracle.com> Message-ID: <57031512.2060607@oracle.com> Changes looks good. I resubmit testing with new changes (4a).
Thanks, Vladimir On 4/1/16 10:16 PM, Berg, Michael C wrote: > That small revision is reflected in: > > https://bugs.openjdk.java.net/browse/JDK-8151573 > > and can be accessed at: > > http://cr.openjdk.java.net/~mcberg/8151573/webrev.04a/ > > Regards, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C > Sent: Friday, April 01, 2016 8:25 PM > To: Vladimir Kozlov ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: RE: CR for RFR 8151573 > > I have to make a two line change, I am testing it on my end. I will pass the webrev directly to you when my tests conclude. > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, April 01, 2016 8:18 PM > To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: CR for RFR 8151573 > > I start preintegration testing. > > Thanks, > Vladimir > > On 3/31/16 8:36 PM, Berg, Michael C wrote: >> Vladimir, I think I have addressed every concern in the latest webrev: >> >> http://cr.openjdk.java.net/~mcberg/8151573/webrev.03/ >> >> I wound up leaving divergent local copies of the clone code as there are two full separate contexts now in three locations. Adding more parameters didn't seem to be a win to get around it. >> The code is fully retested with no issues. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, March 30, 2016 9:56 PM >> To: Berg, Michael C ; >> 'hotspot-compiler-dev at openjdk.java.net' >> >> Subject: Re: CR for RFR 8151573 >> >> On 3/30/16 4:57 PM, Berg, Michael C wrote: >>> See below for context. 
>>> >>> Thanks, >>> Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, March 30, 2016 3:51 PM >>> To: Berg, Michael C ; >>> 'hotspot-compiler-dev at openjdk.java.net' >>> >>> Subject: Re: CR for RFR 8151573 >>> >>> Michael, >>> >>> First, please update to the latest sources of hs-comp. The 8148754 changes modified the same files and it is difficult to apply your changes. >>> >>> multi_version_post_loops() can use is_canonical_main_loop_entry() >>> from >>> 8148754 but you need to modify it to move >>> is_Main() assert to other call sites. >>> >>> OK, but shouldn't the name then be is_canonical_loop_entry(), since it will be used for non-main loops as well? Also the canonical test is different. I can leave the name, but it will be overloaded afterward with two types of functionality. The graph shape test changes between main and post loops. Perhaps it is better to leave them separate given they are different? I will leave this one to last so that we have time to discuss this. >> >> I am fine with the name change. The only difference I see is that your code checks the cur_cmp->in(1) opcode for Opaque1 vs in(2). The question is why you check in(1); shouldn't it be in(2)? >> >>> >>> I did not get the rce'd post loop checks in loopnode.cpp. >>> >>> First I will have to explain what I am doing with do_range_check(). That code resolves the question: do all the range checks have load ranges? Only these types of loops are canonical for multiversioning. >>> Basically that means that if RCE succeeds in removing all checks in main, the post loop may still have non-canonical forms that will not be easily handled by subsuming them into the multiversioned loop's limit as a min test of all ranges and the limit. In those cases we really do want to know if we have non-canonical forms, so I think leaving the code as is in do_range_check is key to that discovery. Then insert_scalar_rced_post_loop(...)
creates a copy of main which successfully passed range check elimination and is canonical. Basically this short circuits us from adding any non canonical multiversions, leaving the code in its original form when we do not pass. >>> The copy of main works with the canonical range checks in the final loop to rework the limit of the newly inserted clean post loop so that we never execute a range check index, leaving it for the final loop to do so or for the optimizer to realize that no such case exists in which case the clean loop becomes the one and only copy. If we cannot multiversion transform the loop we added we eliminate it. >> >> I understand that you want to make clean copy of main loop when it could be vectorized but before it is unrolled. >> >> You should add comment into do_range_check() about what you said (looking only for range checks). Also don't use old_new >> struct but use simple counter and return it as result (or return bool from do_range_check()) to indicate that only range checks were in loop. >> >> Anyway, I was asked about checks in loopnode.cpp. First, there is some expectation about sequence of post loops. But from what I understand you are looking for a post loops which are not created by insert_scalar_rced_post_loop(). Right? >> There is no comment what it is searching for. There are 3 types of post loop now: atomic, rced and regular (from insert_pre_post_loops()). >> Why not mark rced loop (RCEPostLoop?) to simplify search instead of executing has_range_checks(cl)? What about AtomicPostLoop? >> >>> >>> Swap next checks since has_range_checks() may be expensive scanning loop body: >>> + // only process RCE'd main loops >>> + if (cl->has_range_checks() || !cl->is_main_loop()) return; >>> >>> Ok, makes sense. >> >> Sorry, I made mistake due to you using the same name for 2 different methods. One is query cl->has_range_checks() which checks flag. And an other has_range_checks(cl) which search loop for If nodes. 
In that case you can ignore my check swap request. >> But, please, rename the has_range_checks(cl) method to avoid confusion. >> >> Thanks, >> Vladimir >> >>> >>> Also if there are no checks in the loop you will scan the loop's body again since the no_checks state is not recorded. >>> I would suggest scanning the body unconditionally because the code you added to do_range_check() may not be precise (you decrement the count only for LoadRange but increment it for all Ifs). This way you don't need to change old_new around the do_range_check() call. >>> >>> I perceive the real problem is to not scan more than once after we check. I will move towards that solution. >>> >>> >>> Why do you need local copies? >>> >>> - visited.Clear(); >>> - clones.clear(); >>> + Arena *a = Thread::current()->resource_area(); >>> + VectorSet visited(a); >>> + Node_Stack clones(a, main_head->back_control()->outcnt()); >>> >>> I will look into this, and see if it can be cleaned up. >>> >>> >>> I don't think PostLoopInfo is needed. insert_post_loop() can return the post_head node and &main_exit could be passed to it to set. >>> >>> Ok, I will look into a version without PostLoopInfo. >>> >>> Thanks, >>> Vladimir >>> >>> On 3/30/16 1:44 PM, Berg, Michael C wrote: >>>> Here is an update after full testing, the webrev is: >>>> >>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.02/ >>>> >>>> Please review and comment, >>>> >>>> Thanks, >>>> Michael >>>> >>>> -----Original Message----- >>>> From: hotspot-compiler-dev >>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>>> Berg, Michael C >>>> Sent: Wednesday, March 16, 2016 10:30 AM >>>> To: Vladimir Kozlov ; >>>> 'hotspot-compiler-dev at openjdk.java.net' >>>> >>>> Subject: RE: CR for RFR 8151573 >>>> >>>> Putting a hold on the review, retesting everything on my end.
>>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Wednesday, March 16, 2016 8:42 AM >>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: CR for RFR 8151573 >>>> >>>> On 3/15/16 5:29 PM, Berg, Michael C wrote: >>>>> Vladimir: >>>>> >>>>> The why programmable SIMD depends upon this that all versions of the final post loop have range checks in them until very late, after register allocation, and might be cleaned up in cfg optimizations, but are not always so. With multiversioning, we always remove the range checks in our key loop. >>>> >>>> I understand that we can get some benefits. But in general case they will not be visible. >>>> >>>>> >>>>> With regards to the pre loop, pre loops have special checks too do they not, requiring flow in many cases? >>>>> Programmable SIMD needs tight loops to accurately facilitate masked iteration mapping. >>>> >>>> Yes, after you explained me vector masking I now understand why it could be used for post loop. >>>> >>>> After thinking about this I would suggest for you to look on arraycopy and generate_fill stubs instead in stub_Generator_x86*.cpp (may be only 64-bit). They also have post loops but changes would be only platform specific, smaller and easy to understand and test. Also arraycopy and 'fill' code are used very frequently by Java applications so we may get more benefits than optimizing general loops. >>>> >>>> Regards, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Michael >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, March 15, 2016 4:37 PM >>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>> Subject: Re: CR for RFR 8151573 >>>>> >>>>> As we all know we can always construct microbenchmarks which shows >>>>> 30% >>>>> - 50% difference. When in real application we will never see >>>>> difference. 
I still don't see a real reason why we should spend >>>>> time and optimize >>>>> *POST* loops. We already have vectorized post loop to improve performance. Note, additional loop opts code will raise its maintenance cost. >>>>> >>>>> Why does "programmable SIMD" depend on it? What about pre-loop? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/15/16 4:14 PM, Berg, Michael C wrote: >>>>>> Correction below... >>>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-compiler-dev >>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf >>>>>> Of Berg, Michael C >>>>>> Sent: Tuesday, March 15, 2016 4:08 PM >>>>>> To: Vladimir Kozlov; 'hotspot-compiler-dev at openjdk.java.net' >>>>>> Subject: RE: CR for RFR 8151573 >>>>>> >>>>>> Vladimir, for programmable SIMD, which is the optimization that uses this implementation, I get the following on micros and code in general that looks like this: >>>>>> >>>>>> for(int i = 0; i < process_len; i++) >>>>>> { >>>>>> d[i]= (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >>>>>> } >>>>>> >>>>>> The above code makes 9 vector ops. >>>>>> >>>>>> For float with vector length VecZ, I get as much as 1.3x and for int as much as 1.4x uplift. >>>>>> For double and long on VecZ it is smaller, but then so is the value of vectorization on those types anyway. >>>>>> The value process_len is some fraction of the array length in my measurements. The idea of the metrics is to pose a post loop with a modest number of iterations in it. For instance, if N is the max trip count of the post loop and N is in 1..VecZ-1, then for float we could do as many as 15 iterations in the fixup loop. 
>>>>>> >>>>>> An example would be array_length = 512, process_len is a range of 81..96, we create a VecZ loop which was superunrolled 4 times with vector length 16, or unroll of 64, we align process 4 iterations, and the vectorized post loop is executed 1 time, leaving the remaining work in the final post loop, in this case possibly a multiversioned post loop.
We start that final loop at iteration 81 so we always do at least 1 iteration fixup, and as many as 15. If we left the fixup loop as a scalar loop that would mean 1 to 15 iterations plus our initial loops which have {4,1,1} iterations as a group or 6 to get us to index 80. By vectorizing the fixup loop to one iteration we now always have 7 iterations in our loops for all ranges of 81..96; without this optimization and programmable SIMD, we would have the initial 6 plus 1 to 15 more, or a range of 7 to 21 iterations. >>>>>> >>>>>> Would you prefer I integrate this with programmable SIMD and submit the patches as one? >>>>>> >>>>>> I thought it would be easier to do them separately. Also, exposing the post loops to this path offloads cfg processing to earlier compilation, making the graph less complex through register allocation. >>>>>> >>>>>> Regards, >>>>>> Michael >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, March 15, 2016 2:42 PM >>>>>> To: Berg, Michael C; 'hotspot-compiler-dev at openjdk.java.net' >>>>>> Subject: Re: CR for RFR 8151573 >>>>>> >>>>>> Hi Michael, >>>>>> >>>>>> Changes are significant, so they have to be justified, especially since we are in a late stage of jdk9 development. Do you have performance numbers (not only for microbenchmarks) which show the benefit of these changes? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 3/15/16 2:04 PM, Berg, Michael C wrote: >>>>>>> Hi Folks, >>>>>>> >>>>>>> I would like to contribute multi-versioning post loops for range >>>>>>> check elimination. Beforehand cfg optimizations after register >>>>>>> allocation were where post loop optimizations were done for range >>>>>>> checks.
I have added code which produces the desired effect much >>>>>>> earlier by introducing a safe transformation which will minimally >>>>>>> allow a range check free version of the final post loop to >>>>>>> execute up until the point it actually has to take a range check >>>>>>> exception by re-ranging the limit of the rce'd loop, then exit >>>>>>> the rce'd post loop and take the range check exception in the legacy loops execution if required. >>>>>>> If during optimization we discover that we know enough to remove >>>>>>> the range check version of the post loop, mostly by exposing the >>>>>>> load range values into the limit logic of the rce'd post loop, we >>>>>>> will eliminate the range check post loop altogether much like cfg >>>>>>> optimizations did, but much earlier. This gives optimizations >>>>>>> like programmable SIMD (via SuperWord) the opportunity to >>>>>>> vectorize the rce'd post loops to a single iteration based on >>>>>>> mask vectors which map to the residual iterations. Programmable >>>>>>> SIMD will be a follow on change set utilizing this code to stage >>>>>>> its work. This optimization also exposes the rce'd post loop without flow to other optimizations. >>>>>>> Currently I have enabled this optimization for x86 only. We base >>>>>>> this loop on successfully rce'd main loops and if for whatever reason, multiversioning fails, we eliminate the loop we added. 
>>>>>>> >>>>>>> This code was tested as follows: >>>>>>> >>>>>>> >>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8151573 >>>>>>> >>>>>>> >>>>>>> webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~mcberg/8151573/webrev.01/ >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Michael >>>>>>> From vivek.r.deshpande at intel.com Tue Apr 5 06:25:46 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 5 Apr 2016 06:25:46 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> Hi Christian, We have updated the patch as per the suggested changes. The webrev for the same is at this location for your review. http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ We will soon send another patch for CompilerDirectives changes. Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Tuesday, March 29, 2016 11:29 AM To: Rukmannagari, Shravya Cc: Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > wrote: Hi Christian, We would add separate files for each intrinsic. By splitting the CompilerDirectives, do you mean we have to add a separate file? Sorry, I didn't exactly get it. Oh, sorry, I wasn't clear enough. Please file a new enhancement for the CompilerDirectives changes and integrate them separately. Thanks, Shravya Rukmannagari.
From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, March 28, 2016 5:18 PM To: Deshpande, Vivek R > Cc: hotspot compiler >; Vladimir Kozlov >; Rukmannagari, Shravya > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 I left this comment in the bug: I think for the saneness of the macroAssembler_libm_x86_*.cpp files we should put every intrinsic in its own file, like we did for macroAssembler_x86_sha.cpp. They are already too big: $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp Also, can we split out the CompilerDirectives changes? On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > wrote: Hi all, We would like to contribute a patch which optimizes tan and log10 on the x86 architecture using the Intel LIBM library. Could you please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8152907 webrev: http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ Thanks and regards, Vivek From tobias.hartmann at oracle.com Tue Apr 5 07:11:40 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Apr 2016 09:11:40 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks Message-ID: <5703652C.6000000@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8151724 http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. Tested with JPRT and RBT.
Thanks, Tobias From zoltan.majo at oracle.com Tue Apr 5 07:15:43 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Tue, 5 Apr 2016 09:15:43 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703652C.6000000@oracle.com> References: <5703652C.6000000@oracle.com> Message-ID: <5703661F.9040403@oracle.com> Hi Tobias, that looks good to me. Thank you for fixing this issue. Best regards, Zoltan On 04/05/2016 09:11 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8151724 > http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ > > The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. > > Tested with JPRT and RBT. > > Thanks, > Tobias From tobias.hartmann at oracle.com Tue Apr 5 07:22:25 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Apr 2016 09:22:25 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703661F.9040403@oracle.com> References: <5703652C.6000000@oracle.com> <5703661F.9040403@oracle.com> Message-ID: <570367B1.9080600@oracle.com> Hi Zoltan, thanks for the review! Best regards, Tobias On 05.04.2016 09:15, Zoltán Majó wrote: > Hi Tobias, > > > that looks good to me. Thank you for fixing this issue. > > Best regards, > > > Zoltan > > On 04/05/2016 09:11 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8151724 >> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ >> >> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code.
Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. >> >> Tested with JPRT and RBT. >> >> Thanks, >> Tobias > From martin.doerr at sap.com Tue Apr 5 10:10:09 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 5 Apr 2016 10:10:09 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57020636.7010806@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> Message-ID: <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> Hi Jamsheed, thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Montag, 4. 
April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, the "nmethod's exception cache not multi-thread safe" bug is fixed in b107. Bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 Fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 Discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: the compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile, and new entries must be added with releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents, which is sufficient to ensure ordering on all OpenJDK platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. Forcing the compiler to load it from memory twice is certainly undesirable. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin
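Martin's proposal (writers serialized by a lock, lock-free readers, releasing stores when a new entry is published) follows a common publication pattern. The following is a minimal, portable C++ sketch of that pattern under stated assumptions: `CacheEntry` and `ExceptionCacheModel` are hypothetical stand-ins, not HotSpot's `ExceptionCache` or `nmethod` code, and the reader uses `memory_order_acquire` because portable C++ gives no practical way to rely on the address-dependency ordering mentioned above (compilers currently treat `memory_order_consume` as acquire).

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical model of a per-method exception cache: a singly linked
// list mapping a throwing pc to a handler address. Not HotSpot code.
struct CacheEntry {
  const void* pc;
  const void* handler;
  CacheEntry* next;  // never modified after the entry is published
};

class ExceptionCacheModel {
 public:
  ExceptionCacheModel() = default;
  ExceptionCacheModel(const ExceptionCacheModel&) = delete;
  ExceptionCacheModel& operator=(const ExceptionCacheModel&) = delete;

  ~ExceptionCacheModel() {
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  // Writer: serialized by a lock. The entry is fully initialized first and
  // then published with a release store, so a reader that observes the new
  // head also observes the entry's fields.
  void add(const void* pc, const void* handler) {
    std::lock_guard<std::mutex> guard(write_lock_);
    CacheEntry* e = new CacheEntry{pc, handler, head_.load(std::memory_order_relaxed)};
    head_.store(e, std::memory_order_release);
  }

  // Reader: takes no lock. The acquire load pairs with the writer's release
  // store; older entries were published the same way before this one.
  const void* lookup(const void* pc) const {
    for (const CacheEntry* e = head_.load(std::memory_order_acquire); e != nullptr; e = e->next) {
      if (e->pc == pc) return e->handler;
    }
    return nullptr;
  }

 private:
  std::atomic<CacheEntry*> head_{nullptr};
  std::mutex write_lock_;
};
```

A reader thread may call `lookup` while a writer is inside `add`. A real cache must also handle entry removal and memory reclamation, which this sketch sidesteps by freeing entries only in the destructor.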
From nils.eliasson at oracle.com Tue Apr 5 13:54:44 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 5 Apr 2016 15:54:44 +0200 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <5702F14B.1020403@oracle.com> References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com> Message-ID: <5703C3A4.2040907@oracle.com> Hi Vladimir, On 2016-04-05 00:57, Vladimir Kozlov wrote: > 2 tests have -XX:+PrintCompilation flag added. Why you need it? > It helps a lot to have a compilation log to start with when these hard-to-reproduce failures happen. Those two tests test the compilation parts of the WB API. I ran into another issue in these tests: the compile() method in CompilerWhiteBoxTest is not reliable unless invocation counter decay is turned off. I added a check of the UseCounterDecay flag in that method so that no one will miss it by accident. Best regards, Nils Eliasson > Thanks, > Vladimir > > On 4/1/16 6:55 AM, Nils Eliasson wrote: >> Hi all, >> >> Please review this fix. >> >> Summary: >> There is a mismatch in the CompilerWhiteBox testcases between the >> callable and the executable constructors. SimpleTestCase$Helper >> implements all constructors and methods that are tested. However since >> Helper is an inner class there will be an extra (javac created) >> constructor that has the parent class as an appended argument. The >> callable will invoke this constructor, but the executable will reference >> the normal constructor. >> >> Solution: >> Stop have the Helper as an inner class. Rename it to >> SimpleTestCaseHelper for some uniqueness in compiler commands and >> directives. >> >> Testing: >> Run all hotspot/compiler/whitebox tests on all platforms, and all >> hotspot/compiler tests on one platform.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 >> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ >> >> Best regards, >> Nils Eliasson From nils.eliasson at oracle.com Tue Apr 5 13:56:07 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 5 Apr 2016 15:56:07 +0200 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <5703C3A4.2040907@oracle.com> References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com> <5703C3A4.2040907@oracle.com> Message-ID: <5703C3F7.1020900@oracle.com> I forgot the webrev link: http://cr.openjdk.java.net/~neliasso/8151880/webrev.03/ Regards, Nils On 2016-04-05 15:54, Nils Eliasson wrote: > Hi Vladimir, > > On 2016-04-05 00:57, Vladimir Kozlov wrote: >> 2 tests have -XX:+PrintCompilation flag added. Why you need it? >> > > It helps a lot to have a compilation log to start with when these hard > to reproduce failures happen. Those two tests test the compilation > parts of the WB API. > > I ran into another issue in these tests - the compile()-method in > CompilerWhiteBoxTest is not reliable unless the invocation counter > decay is turned off. I added a check of the UseCounterDecay-flag in > that method so that no one will miss it by accident. > > Best regards, > Nils Eliasson > > >> Thanks, >> Vladimir >> >> On 4/1/16 6:55 AM, Nils Eliasson wrote: >>> Hi all, >>> >>> Please review this fix. >>> >>> Summary: >>> There is a mismatch in the CompilerWhiteBox testcases between the >>> callable and the executable constructors. SimpleTestCase$Helper >>> implements all constructors and methods that are tested. However since >>> Helper is an inner class there will be an extra (javac created) >>> constructor that has the parent class as an appended argument. The >>> callable will invoke this constructor, but the executable will >>> reference >>> the normal constructor. >>> >>> Solution: >>> Stop have the Helper as an inner class. 
Rename it to >>> SimpleTestCaseHelper for some uniqueness in compiler commands and >>> directives. >>> >>> Testing: >>> Run all hotspot/compiler/whitebox tests on all platforms, and all >>> hotspot/compiler tests on one platform. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ >>> >>> Best regards, >>> Nils Eliasson > From doug.simon at oracle.com Tue Apr 5 14:16:19 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 5 Apr 2016 16:16:19 +0200 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: > On 05 Apr 2016, at 02:00, Christian Thalinger wrote: > > >> On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: >> >> No, not good. We are failing a couple JVMCI tests. Looking into it... > > Ok, this got a little out of control but for the better: > > http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ > > The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. > > While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone. Again, stupid jtreg. > > Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. I agree. Can you modify your derivative webrev for that? Of course, we wouldn't need the cast to HotSpotSpeculationLog in HotSpotCodeCacheProvider once you've made that change.
-Doug From vladimir.x.ivanov at oracle.com Tue Apr 5 15:12:19 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 5 Apr 2016 18:12:19 +0300 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining Message-ID: <5703D5D3.7020801@oracle.com> http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8152590 Constant folding of stable field loads only happens during parsing. During incremental (post-parse) inlining some loads can become foldable, but they aren't optimized. Though the fix is pretty trivial (webrev.00.02), I decided to refactor relevant code and get rid of redundant parts. To ease the review I split the change into 4 parts: (1) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.00 * extracted all constant-related checks into ciField::constant_value()/ciField::constant_value_of(); * common constant folding logic into GraphKit::make_constant_from_field()/Type::make_constant_*(): Parse::do_get_xxx() / LibraryCallKit::inline_unsafe_access() GraphKit::make_constant_from_field(ciField*, Node*) Type::make_constant_from_field(ciField*, ...) Type::make_from_constant(ciConstant, ...) * fold_stable_ary_elem is moved to Type::make_constant_from_array_element() * check_mismatched_access is moved to type.cpp (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 Refactored constant folding logic for static final fields and unified folding logic with instance fields: is_constant() depends only on the flags and caller should check return value from ciField::constant_value() for validity (ciConstant.is_valid()) (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 Do constant folding for fields (both static and instance) in LoadNode::Value. (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 Mark CallSite::target field as constant. 
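Parts (2) and (3) above separate a flags-only `is_constant()` check from a value check that callers must validate with `ciConstant::is_valid()`, and let `LoadNode::Value` fold loads after parsing as well. A simplified C++ model of that split follows. `FieldModel`, `ConstantModel`, and `try_fold_load` are hypothetical names; the rule that a `@Stable` field is folded only once it holds a non-default value reflects the documented `@Stable` semantics, while the real `ciField` logic has further cases (class initialization, trusted final instance fields, arrays) that are omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a field as seen by the JIT.
struct FieldModel {
  bool is_static;
  bool is_final;
  bool is_stable;              // annotated @Stable
  std::int64_t current_value;  // value currently stored in the field
};

// Stand-in for ciConstant: a value plus a validity bit.
struct ConstantModel {
  bool valid;
  std::int64_t value;
  bool is_valid() const { return valid; }
};

// Flags-only check: is the field a folding candidate at all?
bool is_constant(const FieldModel& f) {
  return f.is_stable || (f.is_static && f.is_final);
}

// Value check: a @Stable field that still holds its default (zero) value
// is not folded, because it may still be written once later.
ConstantModel constant_value(const FieldModel& f) {
  if (!is_constant(f)) return {false, 0};
  if (f.is_stable && f.current_value == 0) return {false, 0};
  return {true, f.current_value};
}

// Folding hook that can run at parse time or later (for example, after
// incremental inlining exposes the load); it checks validity before folding.
bool try_fold_load(const FieldModel& f, std::int64_t* folded) {
  ConstantModel c = constant_value(f);
  if (!c.is_valid()) return false;
  *folded = c.value;
  return true;
}
```

Keeping the flag check and the value check separate means a single folding routine can be shared by the parser and by later graph transformations, which is the point of moving the folding into `LoadNode::Value`.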
Also: * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. Thanks! Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Apr 5 15:20:08 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 08:20:08 -0700 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703652C.6000000@oracle.com> References: <5703652C.6000000@oracle.com> Message-ID: <5703D7A8.3090602@oracle.com> Looks good. Tobias, thank you for investigating the assert crashes. Thanks, Vladimir On 4/5/16 12:11 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8151724 > http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ > > The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code. Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. > > Tested with JPRT and RBT. > > Thanks, > Tobias > From tom.rodriguez at oracle.com Tue Apr 5 15:21:53 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 5 Apr 2016 08:21:53 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: looks good to me. tom > On Apr 1, 2016, at 11:28 AM, Igor Veresov wrote: > > When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 > Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ > > Thanks, > igor From tom.rodriguez at oracle.com Tue Apr 5 15:23:49 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 5 Apr 2016 08:23:49 -0700 Subject: RFR(XS) 8153315: [JVMCI] evol_method dependencies failures should return dependencies_failed In-Reply-To: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com> References: <110053E2-1E1A-4705-AF0F-597AFB4C372D@oracle.com> Message-ID: Thanks! tom > On Apr 1, 2016, at 1:18 PM, Igor Veresov wrote: > > Looks good. > > igor > >> On Apr 1, 2016, at 12:47 PM, Tom Rodriguez wrote: >> >> http://cr.openjdk.java.net/~never/8153315/webrev >> >> This fixes a minor issue which showed up while debugging Java code. evol_method dependences can change at any time so it's just a normal dependence failure not invalid dependencies. Graal considers it an error to build invalid dependencies so it complained. Tested under the Eclipse debugger. >> >> tom > From tobias.hartmann at oracle.com Tue Apr 5 15:28:57 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Apr 2016 17:28:57 +0200 Subject: [9] RFR(S): 8151724: Remove -XX:GenerateCompilerNullChecks In-Reply-To: <5703D7A8.3090602@oracle.com> References: <5703652C.6000000@oracle.com> <5703D7A8.3090602@oracle.com> Message-ID: <5703D9B9.6010206@oracle.com> Thanks, Vladimir! Best regards, Tobias On 05.04.2016 17:20, Vladimir Kozlov wrote: > Looks good. Tobias, thank you for investigating asserts crashes. > > Thanks, > Vladimir > > On 4/5/16 12:11 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8151724 >> http://cr.openjdk.java.net/~thartmann/8151724/webrev.00/ >> >> The flag -XX:GenerateCompilerNullChecks is develop and was originally added for performance testing without null checks in compiled code.
Today, the compilers crash with various asserts if null checks are disabled and I would therefore like to remove the flag. >> >> Tested with JPRT and RBT. >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Tue Apr 5 15:32:57 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 08:32:57 -0700 Subject: RFR(S): 8151880: EnqueueMethodForCompilationTest.java still fails to compile method In-Reply-To: <5703C3F7.1020900@oracle.com> References: <56FE7DB5.404@oracle.com> <5702F14B.1020403@oracle.com> <5703C3A4.2040907@oracle.com> <5703C3F7.1020900@oracle.com> Message-ID: <5703DAA9.7010707@oracle.com> Yes, adding the UseCounterDecay check makes sense in this case. And I agree with PrintCompilation since it helps diagnose problems. Reviewed. Thanks, Vladimir On 4/5/16 6:56 AM, Nils Eliasson wrote: > I forgot the webrev link: > http://cr.openjdk.java.net/~neliasso/8151880/webrev.03/ > > Regards, > Nils > > On 2016-04-05 15:54, Nils Eliasson wrote: >> Hi Vladimir, >> >> On 2016-04-05 00:57, Vladimir Kozlov wrote: >>> 2 tests have -XX:+PrintCompilation flag added. Why you need it? >>> >> >> It helps a lot to have a compilation log to start with when these hard >> to reproduce failures happen. Those two tests test the compilation >> parts of the WB API. >> >> I ran into another issue in these tests - the compile()-method in >> CompilerWhiteBoxTest is not reliable unless the invocation counter >> decay is turned off. I added a check of the UseCounterDecay-flag in >> that method so that no one will miss it by accident. >> >> Best regards, >> Nils Eliasson >> >> >>> Thanks, >>> Vladimir >>> >>> On 4/1/16 6:55 AM, Nils Eliasson wrote: >>>> Hi all, >>>> >>>> Please review this fix. >>>> >>>> Summary: >>>> There is a mismatch in the CompilerWhiteBox testcases between the >>>> callable and the executable constructors. SimpleTestCase$Helper >>>> implements all constructors and methods that are tested. However since
However since >>>> Helper is an inner class there will be an extra (javac created) >>>> constructor that has the parent class as an appended argument. The >>>> callable will invoke this constructor, but the executable will >>>> reference >>>> the normal constructor. >>>> >>>> Solution: >>>> Stop have the Helper as an inner class. Rename it to >>>> SimpleTestCaseHelper for some uniqueness in compiler commands and >>>> directives. >>>> >>>> Testing: >>>> Run all hotspot/compiler/whitebox tests on all platforms, and all >>>> hotspot/compiler tests on one platform. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8151880 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8151880/webrev.02/ >>>> >>>> Best regards, >>>> Nils Eliasson >> > From igor.veresov at oracle.com Tue Apr 5 15:57:41 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 08:57:41 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: Thanks, Tom! igor > On Apr 5, 2016, at 8:21 AM, Tom Rodriguez wrote: > > looks good to me. > > tom > >> On Apr 1, 2016, at 11:28 AM, Igor Veresov wrote: >> >> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >> >> Thanks, >> igor > From lois.foltan at oracle.com Tue Apr 5 16:30:27 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 12:30:27 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: <5703E823.8050400@oracle.com> Hi Igor, I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? Thanks, Lois On 4/1/2016 2:28 PM, Igor Veresov wrote: > When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 > Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ > > Thanks, > igor From igor.veresov at oracle.com Tue Apr 5 16:50:49 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 09:50:49 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703E823.8050400@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> Message-ID: <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> Hi Lois, Thanks for looking at it. Yes, it passes all hotspot jtreg tests. 
igor > On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: > > Hi Igor, > > I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? > > Thanks, > Lois > > On 4/1/2016 2:28 PM, Igor Veresov wrote: >> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >> >> Thanks, >> igor > From lois.foltan at oracle.com Tue Apr 5 17:34:15 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 13:34:15 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> Message-ID: <5703F717.702@oracle.com> On 4/5/2016 12:50 PM, Igor Veresov wrote: > Hi Lois, > > Thanks for looking at it. Yes, it passes all hotspot jtreg tests. > > igor Hi Igor, Thanks for waiting on this. A couple of comments: - The invokeinterface entry in JVMS Section 6.5 (Instructions) does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. - I'm concerned about the changes on lines #998, #1030, and #1316.
I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, at line #998 within linktime_resolve_static_method, you may not have an invokestatic; you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. Just curious: did you also run the testbase default methods tests? Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>> >>> Thanks, >>> igor From igor.veresov at oracle.com Tue Apr 5 17:44:56 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 10:44:56 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703F717.702@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: > On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: > > > On 4/5/2016 12:50 PM, Igor Veresov wrote: >> Hi Lois, >> >> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >> >> igor > Hi Igor, > > Thanks for waiting on this. A couple of comments: > > - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. > > - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. > I looked at the call graphs of these guys, and the linktime_resolve_X_method() methods actually seem to be called only within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() to be invoked for a non-invokestatic instruction? > Just curious did you also run the testbase default methods tests? Yes, within the context of a closed project.
That's actually what made these changes necessary. igor > Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>> >>>> Thanks, >>>> igor > From tom.rodriguez at oracle.com Tue Apr 5 17:55:04 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 5 Apr 2016 10:55:04 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703F717.702@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: <29A506A0-BEC9-4514-B8E9-92E6C29B1E40@oracle.com> > Thanks for waiting on this. A couple of comments: > > - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. > > - I'm concerned about the change on line #998, #1030, #1316.
I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. Unless I'm misunderstanding the code, I'd say that it was a bug that nostatics was being passed as false. The standard naming in LinkResolver follows a pattern where the name is associated with the byte code being used for the invoke. So if you are in {linktime,runtime}_resolve_{foo}_method then an invokefoo byte code is what's being used. resolve_interface_method is currently not following that naming scheme, which I think should be fixed. Maybe resolve_method_in_interface? I do agree we should be sure that the paths leading to this call agree about the actual byte code being used, but I can't see a path where a different byte code could have been passed in. tom > > Just curious did you also run the testbase default methods tests? > Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type.
It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>> >>>> Thanks, >>>> igor > From igor.veresov at oracle.com Tue Apr 5 17:56:55 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 10:56:55 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: > On Apr 5, 2016, at 10:44 AM, Igor Veresov wrote: > >> >> On Apr 5, 2016, at 10:34 AM, Lois Foltan wrote: >> >> >> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>> Hi Lois, >>> >>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>> >>> igor >> Hi Igor, >> >> Thanks for waiting on this. A couple of comments: >> >> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >> >> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >> > > I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to only called within invokeX contexts. But may be I missed something. 
Can you give an example of a path that may cause, say, linktime_resolve_static_method() be invoked for non-invokestatic instruction? Actually, an easier way to think about it would be this: the answer returned by resolve_interface_method() is the result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. Of course, that doesn't mean the invocation really results from that instruction; the bytecode describes the context. As you can see, the logic around "nostatic" did not change, and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate that the question is being asked within the invokeinterface context. But passing a bytecode makes the code easier to read. igor > >> Just curious did you also run the testbase default methods tests? > > Yes, within the context of a closed project. That's actually what made these changes necessary. > > igor > >> Lois >> >>> >>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>> >>>> Hi Igor, >>>> >>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>> >>>> Thanks, >>>> Lois >>>> >>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
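The design described above (resolution receives the invoking bytecode, so bytecode-specific checks can fail at link time without looking at a receiver) can be modeled in a few lines. This is a hypothetical C++ sketch, not HotSpot's `LinkResolver`: `Bytecode`, `MethodModel`, `check_resolved_interface_method`, and `throws_icce` are invented names, the method lookup itself is omitted, and only the checks discussed in this thread are modeled (invokeinterface rejects private or static targets with an ICCE; invokestatic rejects instance methods).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical model; the real logic lives in HotSpot's LinkResolver.
enum class Bytecode { invokestatic, invokespecial, invokeinterface };

struct MethodModel {
  std::string name;
  bool is_private;
  bool is_static;
};

// Stand-in for java.lang.IncompatibleClassChangeError.
struct IncompatibleClassChangeError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Applies bytecode-dependent checks to an already-found interface method.
// None of these checks look at a receiver, so they can run at link time.
const MethodModel& check_resolved_interface_method(const MethodModel& m, Bytecode code) {
  if (code == Bytecode::invokeinterface && (m.is_private || m.is_static)) {
    throw IncompatibleClassChangeError("invokeinterface: " + m.name + " is private or static");
  }
  if (code == Bytecode::invokestatic && !m.is_static) {
    throw IncompatibleClassChangeError("invokestatic: " + m.name + " is not static");
  }
  return m;
}

// Helper: true if resolving m "as if" invoked by 'code' fails with an ICCE.
bool throws_icce(const MethodModel& m, Bytecode code) {
  try {
    check_resolved_interface_method(m, code);
    return false;
  } catch (const IncompatibleClassChangeError&) {
    return true;
  }
}
```

For example, a private interface method fails the invokeinterface check but passes under invokespecial. Passing the bytecode rather than a `bool nostatics` flag keeps each call site self-describing, which is the readability argument made above.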
>>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>> >>>>> Thanks, >>>>> igor From karen.kinnear at oracle.com Tue Apr 5 18:12:41 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 5 Apr 2016 14:12:41 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> Message-ID: <01AE161D-FAE2-4F37-9E79-3951DB516EE0@oracle.com> Igor, I'd like to get back to you on this before you check in the change, please. I need to sanity check the JVMS and the code. I also have another set of tests, written by Vladimir Ivanov, that I will send you. thanks, Karen > On Apr 5, 2016, at 11:57 AM, Igor Veresov wrote: > > Thanks, Tom! > > igor > >> On Apr 5, 2016, at 8:21 AM, Tom Rodriguez wrote: >> >> looks good to me. >> >> tom >> >>> On Apr 1, 2016, at 11:28 AM, Igor Veresov wrote: >>> >>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>> >>> Thanks, >>> igor >> > From karen.kinnear at oracle.com Tue Apr 5 19:04:12 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 5 Apr 2016 15:04:12 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <5703F717.702@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: I am in agreement with Lois that the JVMS looks good with moving the exception. With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next meeting I will check one more time. It might be worth adding a comment. My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, so that you get the correct behavior depending on the requesting byte code. I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so I could use help studying this a bit more to understand if this really is resolution or is really selection. thanks, Karen > On Apr 5, 2016, at 1:34 PM, Lois Foltan wrote: > > > On 4/5/2016 12:50 PM, Igor Veresov wrote: >> Hi Lois, >> >> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >> >> igor > Hi Igor, > > Thanks for waiting on this.
A couple of comments: > > - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. > > - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. > > Just curious did you also run the testbase default methods tests? > Lois > >> >>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>> >>> Hi Igor, >>> >>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>> >>> Thanks, >>> Lois >>> >>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). 
>>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>> >>>> Thanks, >>>> igor > From vladimir.kozlov at oracle.com Tue Apr 5 19:23:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 12:23:44 -0700 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining In-Reply-To: <5703D5D3.7020801@oracle.com> References: <5703D5D3.7020801@oracle.com> Message-ID: <570410C0.7080105@oracle.com> On 4/5/16 8:12 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00 > https://bugs.openjdk.java.net/browse/JDK-8152590 > > Constant folding of stable field loads only happens during parsing. > During incremental (post-parse) inlining some loads can become foldable, > but they aren't optimized. > > Though the fix is pretty trivial (webrev.00.02), I decided to refactor > relevant code and get rid of redundant parts. > > To ease the review I split the change into 4 parts: > > (1) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.00 > > * extracted all constant-related checks into > ciField::constant_value()/ciField::constant_value_of(); > > * common constant folding logic into > GraphKit::make_constant_from_field()/Type::make_constant_*(): > > Parse::do_get_xxx() / LibraryCallKit::inline_unsafe_access() > GraphKit::make_constant_from_field(ciField*, Node*) > Type::make_constant_from_field(ciField*, ...) > Type::make_from_constant(ciConstant, ...) > > * fold_stable_ary_elem is moved to > Type::make_constant_from_array_element() > > * check_mismatched_access is moved to type.cpp Type::make_constant_from_field() - is_stable_array and stable_dimension are needed only at the end for make_from_constant() call, move them there. check_mismatched_access() result is used only in assert. Should you put the call and assert under #ifdef ASSERT? 
> > > (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 > > Refactored constant folding logic for static final fields and unified > folding logic with instance fields: is_constant() depends only on the > flags and caller should check return value from > ciField::constant_value() for validity (ciConstant.is_valid()) > Good. > > (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 > > Do constant folding for fields (both static and instance) in > LoadNode::Value. Good. > > > (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 > > Mark CallSite::target field as constant. Okay. Thanks, Vladimir > > > Also: > > * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java > > Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. > > Thanks! > > Best regards, > Vladimir Ivanov From igor.veresov at oracle.com Tue Apr 5 19:43:51 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 12:43:51 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> > On Apr 5, 2016, at 12:04 PM, Karen Kinnear wrote: > > I am in agreement with Lois that the JVMS looks good with moving the exception. Thanks! > > With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next > meeting I will check one more time. It might be worth adding a comment. Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. 
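To make the "bytecode as context" point concrete, here is a minimal, self-contained C++ sketch under simplifying assumptions. The bytecode passed to the resolver names the context the resolution is performed for, which may differ from the instruction actually executing (for example, a method handle call). In the invokeinterface context, a private resolved method raises IncompatibleClassChangeError at link time, using only the resolved method and never the receiver type. The existing nostatics check still handles static methods, so together the two checks cover the JVMS 6.5 invokeinterface linking rule (ICCE if the resolved method is static or private). All names here (Bytecode, Method, IncompatibleClassChangeError, and this resolve_interface_method signature) are hypothetical simplifications, not HotSpot's LinkResolver API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified model; not HotSpot's LinkResolver API.
enum class Bytecode { InvokeStatic, InvokeSpecial, InvokeVirtual, InvokeInterface };

struct Method {
  std::string name;
  bool is_private;
  bool is_static;
};

struct IncompatibleClassChangeError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// `context` is the bytecode the resolution is performed "as if" for; the
// real call site may be something else (e.g. a method handle).
const Method& resolve_interface_method(const Method& resolved,
                                       Bytecode context,
                                       bool nostatics) {
  // Model invariant: an invokestatic context must allow static methods.
  assert(!(context == Bytecode::InvokeStatic && nostatics));

  // Pre-existing check, driven only by the nostatics flag.
  if (nostatics && resolved.is_static) {
    throw IncompatibleClassChangeError("expected non-static method " + resolved.name);
  }
  // Link-time check: needs only the resolved method, not the receiver type.
  if (context == Bytecode::InvokeInterface && resolved.is_private) {
    throw IncompatibleClassChangeError("private interface method " + resolved.name);
  }
  return resolved;
}
```

In this model, the same private method resolves successfully in an invokespecial context but fails in an invokeinterface context. The caller's context alone decides the outcome, which is what a plain bool flag would also express, though less readably.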
> > My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks > if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. > That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). igor > I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the > corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, > so that you get the correct behavior depending on the requesting byte code. > > I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so > I could use help studying this a bit more to understand if this really is resolution or is really selection. > > thanks, > Karen > >> On Apr 5, 2016, at 1:34 PM, Lois Foltan wrote: >> >> >> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>> Hi Lois, >>> >>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>> >>> igor >> Hi Igor, >> >> Thanks for waiting on this. A couple of comments: >> >> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >> >> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.
For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >> >> Just curious did you also run the testbase default methods tests? >> Lois >> >>> >>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan wrote: >>>> >>>> Hi Igor, >>>> >>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>> >>>> Thanks, >>>> Lois >>>> >>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>> >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>> >>>>> Thanks, >>>>> igor >> > From vladimir.x.ivanov at oracle.com Tue Apr 5 19:47:26 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 5 Apr 2016 22:47:26 +0300 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining In-Reply-To: <570410C0.7080105@oracle.com> References: <5703D5D3.7020801@oracle.com> <570410C0.7080105@oracle.com> Message-ID: <5704164E.3040006@oracle.com> Thanks for review, Vladimir!
Updated version: http://cr.openjdk.java.net/~vlivanov/8152590/webrev.01 >> * check_mismatched_access is moved to type.cpp > > Type::make_constant_from_field() - is_stable_array and stable_dimension > are needed only at the end for make_from_constant() call, move them there. Done. > check_mismatched_access() result is used only in assert. Should you put > the call and assert under #ifdef ASSERT? No, it's a bug in the change: con should be used instead of field_value. check_mismatched_access filters out invalid accesses and adjusts the value for unsigned loads. Fixed. Best regards, Vladimir Ivanov PS: I'll enhance the tests to catch unsigned field load case. > >> >> >> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 >> >> Refactored constant folding logic for static final fields and unified >> folding logic with instance fields: is_constant() depends only on the >> flags and caller should check return value from >> ciField::constant_value() for validity (ciConstant.is_valid()) >> > > Good. > >> >> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 >> >> Do constant folding for fields (both static and instance) in >> LoadNode::Value. > > Good. > >> >> >> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 >> >> Mark CallSite::target field as constant. > > Okay. > > Thanks, > Vladimir > >> >> >> Also: >> >> * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java >> >> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. >> >> Thanks! 
>> >> Best regards, >> Vladimir Ivanov From lois.foltan at oracle.com Tue Apr 5 19:54:14 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 15:54:14 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> Message-ID: <570417E6.60107@oracle.com> On 4/5/2016 1:56 PM, Igor Veresov wrote: > >> On Apr 5, 2016, at 10:44 AM, Igor Veresov > > wrote: >> >>> >>> On Apr 5, 2016, at 10:34 AM, Lois Foltan >> > wrote: >>> >>> >>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>> Hi Lois, >>>> >>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>> >>>> igor >>> Hi Igor, >>> >>> Thanks for waiting on this. A couple of comments: >>> >>> - Section 6.5 Instructions for invokeinterface does indicate that a >>> "Linking Exceptions" the VM can throw an ICCE if the resolved method >>> is static or private. So I think moving this exception from runtime >>> to linktime is okay. >>> >>> - I'm concerned about the change on line #998, #1030, #1316. I >>> don't think you are necessarily guaranteed to have the bytecodes >>> that you are now passing to resolve_interface_method. For example, >>> line #998 within linktime_resolve_static_method, you may not have an >>> invokestatic here, you may have another invoke* bytecode trying to >>> invoke a static interface method. Passing in >>> Bytecodes::_invokestatic seems wrong, because even if the resolved >>> method is static, "nostatics" was set to false. >>> >> >> I looked at the call graphs of these guys and >> linktime_resolve_X_method() methods actually seem to only called >> within invokeX contexts. But may be I missed something. Can you give >> an example of a path that may cause, say, >> linktime_resolve_static_method() be invoked for non-invokestatic >> instruction? 
> > Actually, the easier way to think about it, would be: The answer > returned by resolve_interface_method() is result of a method > resolution in the interface class "as if" it were invoked by the given > bytecode instruction. It of course doesn't mean that the > invocation is really a result of the said instruction. The context is. > As you may see the logic around "nostatic" did not change and the > logic around resolve_interface_method() being called within the > invokeinterface context is what we want it to be. > > The same effect could have been achieved by adding another bool > argument to resolve_interface_method() to indicate a question within > the invokeinterface context. But passing a bytecode makes it an easier > to read code. Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later. Lois > > igor > >> >>> Just curious did you also run the testbase default methods tests? >> >> Yes, within the context of a closed project. That's actually what >> made these changes necessary. >> >> igor >> >>> Lois >>> >>>> >>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan >>>> > wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> I know you have two reviews for this but could you hold off >>>>> committing until I or Karen Kinnear have a chance to review. We >>>>> both worked in this area a lot to support default methods in JDK >>>>> 8. Also, have you run the hotspot/test/runtime/SelectionResolution >>>>> tests on this? >>>>> >>>>> Thanks, >>>>> Lois >>>>> >>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>> When invoking private interface methods with invokeinterface we >>>>>> throw ICCE.
The check for that happens in the runtime part of the >>>>>> resolution, however, doing it at linktime seems like a better >>>>>> place, since the check doesn't depend on the receiver type. It >>>>>> also allows compiler interfaces that rely on linktime resolution >>>>>> to detect inconsistencies during parsing (see >>>>>> ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() >>>>>> (JVMCI) that are affected). >>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>> >>>>>> >>>>>> Thanks, >>>>>> igor > From vladimir.kozlov at oracle.com Tue Apr 5 20:07:11 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 13:07:11 -0700 Subject: [9] RFR (L): 8152590: C2: @Stable support doesn't always work w/ incremental inlining In-Reply-To: <5704164E.3040006@oracle.com> References: <5703D5D3.7020801@oracle.com> <570410C0.7080105@oracle.com> <5704164E.3040006@oracle.com> Message-ID: <57041AEF.4090907@oracle.com> This looks good. Thanks, Vladimir K On 4/5/16 12:47 PM, Vladimir Ivanov wrote: > Thanks for review, Vladimir! > > Updated version: > http://cr.openjdk.java.net/~vlivanov/8152590/webrev.01 >>> * check_mismatched_access is moved to type.cpp >> >> Type::make_constant_from_field() - is_stable_array and stable_dimension >> are needed only at the end for make_from_constant() call, move them >> there. > > Done. > >> check_mismatched_access() result is used only in assert. Should you put >> the call and assert under #ifdef ASSERT? > No, it's a bug in the change: con should be used instead of field_value. > check_mismatched_access filters out invalid accesses and adjusts the > value for unsigned loads. > > Fixed. > > Best regards, > Vladimir Ivanov > > PS: I'll enhance the tests to catch unsigned field load case.
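The bug described above is that constant folding used the raw field_value instead of the adjusted con. The following self-contained C++17 sketch illustrates it with a filter in the spirit of check_mismatched_access. The filter rejects a load whose type does not match the field's declared type. For unsigned loads (an unsigned byte load, or a char load of a short field), it zero-extends the constant. The enum, struct and function names, and the exact set of mismatches tolerated, are simplifying assumptions for illustration only, not HotSpot's type.cpp code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified basic types (not HotSpot's BasicType).
enum class BT { Byte, Short, Char, Int };

struct Const {
  BT type;
  int32_t value;
};

// Returns the constant to fold for a load of `load_type` from a field holding
// `con`, or std::nullopt if the access is mismatched and must not be folded.
std::optional<Const> check_mismatched_access(Const con, BT load_type, bool is_unsigned) {
  if (con.type == load_type) {
    if (is_unsigned && con.type == BT::Byte) {
      // Unsigned byte load: zero-extend rather than sign-extend.
      return Const{BT::Int, con.value & 0xFF};
    }
    return con;
  }
  if (con.type == BT::Short && load_type == BT::Char) {
    // Short field read through an unsigned 16-bit (char) load.
    return Const{BT::Int, con.value & 0xFFFF};
  }
  return std::nullopt;  // mismatched access
}

// Folding must use the adjusted constant, not the raw field value.
std::optional<int32_t> fold_field_load(Const field_value, BT load_type, bool is_unsigned) {
  std::optional<Const> con = check_mismatched_access(field_value, load_type, is_unsigned);
  if (!con) {
    return std::nullopt;  // leave the load in place
  }
  // The adjusted constant is either widened to Int or keeps the field's type.
  assert(con->type == BT::Int || con->type == field_value.type);
  return con->value;  // returning field_value.value here would be the bug
}
```

With a byte field holding -1, an unsigned load folds to 255 in this model. Returning field_value.value would wrongly fold -1. That is the unsigned-load case the planned test enhancement is meant to catch.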
> >> >>> >>> >>> (2) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.01 >>> >>> Refactored constant folding logic for static final fields and unified >>> folding logic with instance fields: is_constant() depends only on the >>> flags and caller should check return value from >>> ciField::constant_value() for validity (ciConstant.is_valid()) >>> >> >> Good. >> >>> >>> (3) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.02 >>> >>> Do constant folding for fields (both static and instance) in >>> LoadNode::Value. >> >> Good. >> >>> >>> >>> (4) http://cr.openjdk.java.net/~vlivanov/8152590/webrev.00.03 >>> >>> Mark CallSite::target field as constant. >> >> Okay. >> >> Thanks, >> Vladimir >> >>> >>> >>> Also: >>> >>> * fixed test/compiler/unsafe/UnsafeGetStableArrayElement.java >>> >>> Testing: JPRT, RBT (pit-hs-comp w/ parse time folding on/off), octane. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Apr 5 20:33:34 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 13:33:34 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> Message-ID: <5704211E.5090007@oracle.com> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? I will start pre-integration testing. Thanks, Vladimir On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: > Hi Christian > > We have updated the patch as per the suggested changes. > > The webrev for the same is at this location for your review. 
> > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ > > We will soon send another patch for CompilerDirectives changes. > > Regards, > > Vivek > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:* Tuesday, March 29, 2016 11:29 AM > *To:* Rukmannagari, Shravya > *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler > *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 > > On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > > wrote: > > Hi Christian, > > We would add separate files for each intrinsic. By splitting the > CompilerDirectives, do you mean we have to add a separate file. > Sorry I didn't exactly get it. > > Oh, sorry, I wasn't clear enough. Please file a new enhancement for > the CompilerDirectives changes and integrate them separately. > > > > Thanks, > > Shravya Rukmannagari. > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:*Monday, March 28, 2016 5:18 PM > *To:*Deshpande, Vivek R > > *Cc:*hotspot compiler >; Vladimir Kozlov > >; > Rukmannagari, Shravya > > *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I left this comment in the bug: > > I think for the saneness of the macroAssembler_libm_x86_*.cpp files we > should put every intrinsic in its own file, like we did for > macroAssembler_x86_sha.cpp. They are already too big: > > $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp > 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp > 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp > > Also, can we split out the CompilerDirectives changes? > > > > > On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > > > wrote: > > Hi all > > We would like to contribute a patch which optimizes tan and log10 > X86 architecture using Intel LIBM library. > > Could you please review and sponsor this patch.
> > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8152907 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ > > Thanks and regards, > > Vivek > From vivek.r.deshpande at intel.com Tue Apr 5 20:41:48 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 5 Apr 2016 20:41:48 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <5704211E.5090007@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> Hi Vladimir I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. Thank you for the review. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 05, 2016 1:34 PM To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? I will start pre-integration testing. Thanks, Vladimir On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: > Hi Christian > > We have updated the patch as per the suggested changes. > > The webrev for the same is at this location for your review. > > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01 > / > > We will soon send another patch for CompilerDirectives changes. 
> > Regards, > > Vivek > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:* Tuesday, March 29, 2016 11:29 AM > *To:* Rukmannagari, Shravya > *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler > *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 > > On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > > wrote: > > Hi Christian, > > We would add separate files for each intrinsic. By splitting the > CompilerDirectives, do you mean we have to add a separate file. > Sorry I didn't exactly get it. > > Oh, sorry, I wasn't clear enough. Please file a new enhancement for > the CompilerDirectives changes and integrate them separately. > > > > Thanks, > > Shravya Rukmannagari. > > *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] > *Sent:*Monday, March 28, 2016 5:18 PM > *To:*Deshpande, Vivek R > > *Cc:*hotspot compiler >; Vladimir Kozlov > >; > Rukmannagari, Shravya > > *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I left this comment in the bug: > > I think for the saneness of the macroAssembler_libm_x86_*.cpp files we > should put every intrinsic in its own file, like we did for > macroAssembler_x86_sha.cpp. They are already too big: > > $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp > 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp > 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp > > Also, can we split out the CompilerDirectives changes? > > > > > On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > > > wrote: > > Hi all > > We would like to contribute a patch which optimizes tan and log10 > X86 architecture using Intel LIBM library. > > Could you please review and sponsor this patch.
> > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8152907 > webrev: > > > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00 > / > > Thanks and regards, > > Vivek > From vladimir.kozlov at oracle.com Tue Apr 5 20:47:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 13:47:28 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> Message-ID: <57042460.5070306@oracle.com> I again can't apply changes because of CR at the end of lines in patch file. Vladimir On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: > > Hi Vladimir > > I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. > Thank you for the review. > > Regards, > Vivek > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:34 PM > To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? > > I will start pre-integration testing. > > Thanks, > Vladimir > > On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >> Hi Christian >> >> We have updated the patch as per the suggested changes. >> >> The webrev for the same is at this location for your review. 
>> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01 >> / >> >> We will soon send another patch for CompilerDirectives changes. >> >> Regards, >> >> Vivek >> >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:* Tuesday, March 29, 2016 11:29 AM >> *To:* Rukmannagari, Shravya >> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >> > > wrote: >> >> Hi Christian, >> >> We would add separate files for each intrinsic. By splitting the >> CompilerDirectives, do you mean we have to add a separate file. >> Sorry I didn't exactly get it. >> >> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >> the CompilerDirectives changes and integrate them separately. >> >> >> >> Thanks, >> >> Shravya Rukmannagari. >> >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Monday, March 28, 2016 5:18 PM >> *To:*Deshpande, Vivek R > > >> *Cc:*hotspot compiler > >; Vladimir Kozlov >> >; >> Rukmannagari, Shravya > > >> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I left this comment in the bug: >> >> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we >> should put every intrinsic in its own file, like we did for >> macroAssembler_x86_sha.cpp. They are already too big: >> >> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >> >> Also, can we split out the CompilerDirectives changes? >> >> >> >> >> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >> > >> wrote: >> >> Hi all >> >> We would like to contribute a patch which optimizes tan and log10 >> X86 architecture using Intel LIBM library. >> >> Could you please review and sponsor this patch.
>> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8152907 >> webrev: >> >> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00 >> / >> >> Thanks and regards, >> >> Vivek >> From vivek.r.deshpande at intel.com Tue Apr 5 21:27:00 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 5 Apr 2016 21:27:00 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <57042460.5070306@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> HI Vladimir Sorry about that. Please check this webrev http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ I have updated it. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 05, 2016 1:47 PM To: Deshpande, Vivek R; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 I again can't apply changes because of CR at the end of lines in patch file. Vladimir On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: > > Hi Vladimir > > I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. > Thank you for the review. 
> > Regards, > Vivek > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:34 PM > To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you used 'hg remove' or simple removed them? > > I will start pre-integration testing. > > Thanks, > Vladimir > > On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >> Hi Christian >> >> We have updated the patch as per the suggested changes. >> >> The webrev for the same is at this location for your review. >> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.0 >> 1 >> / >> >> We will soon send another patch for CompilerDirectives changes. >> >> Regards, >> >> Vivek >> >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:* Tuesday, March 29, 2016 11:29 AM >> *To:* Rukmannagari, Shravya >> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >> > > wrote: >> >> Hi Christian, >> >> We would add separate files for each intrinsic. By splitting the >> CompilerDirectives, do you mean we have to add a separate file. >> Sorry I didn't exactly get it. >> >> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >> the CompilerDirectives changes and integrate them separately. >> >> >> >> Thanks, >> >> Shravya Rukmannagari.
>> >> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:* Monday, March 28, 2016 5:18 PM *To:* Deshpande, Vivek R >> > >> *Cc:* hotspot compiler > >; Vladimir Kozlov >> >; >> Rukmannagari, Shravya > > >> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I left this comment in the bug: >> >> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >> we should put every intrinsic in its own file, like we did for >> macroAssembler_x86_sha.cpp. They are already too big: >> >> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >> >> Also, can we split out the CompilerDirectives changes? >> >> >> >> >> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >> > >> wrote: >> >> Hi all >> >> We would like to contribute a patch which optimizes tan and log10 >> for the X86 architecture using the Intel LIBM library. >> >> Could you please review and sponsor this patch.
>> >> Bug-id: >> >> https://bugs.openjdk.java.net/browse/JDK-8152907 >> webrev: >> >> >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >> >> Thanks and regards, >> >> Vivek >> From vladimir.kozlov at oracle.com Tue Apr 5 21:42:04 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 14:42:04 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> Message-ID: <5704312C.9000605@oracle.com> Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: error: use of undeclared identifier 'SharedRuntime' __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); Note templateInterpreterGenerator_x86_32.cpp has that #include. It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. Vladimir On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Sorry about that. > Please check this webrev > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ > I have updated it.
> > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:47 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I again can't apply changes because of CR at the end of lines in patch file. > > Vladimir > > On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >> >> Hi Vladimir >> >> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >> Thank you for the review. >> >> Regards, >> Vivek >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:34 PM >> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you use 'hg remove' or simply remove them? >> >> I will start pre-integration testing. >> >> Thanks, >> Vladimir >> >> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>> Hi Christian >>> >>> We have updated the patch as per the suggested changes. >>> >>> The webrev for the same is at this location for your review. >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>> >>> We will soon send another patch for CompilerDirectives changes. >>> >>> Regards, >>> >>> Vivek >>> >>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>> *To:* Rukmannagari, Shravya >>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>> >> > wrote: >>> >>> Hi Christian, >>> >>> We would add separate files for each intrinsic.
By splitting the >>> CompilerDirectives, do you mean we have to add a separate file? >>> Sorry, I didn't exactly get it. >>> >>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>> the CompilerDirectives changes and integrate them separately. >>> >>> >>> >>> Thanks, >>> >>> Shravya Rukmannagari. >>> >>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:* Monday, March 28, 2016 5:18 PM *To:* Deshpande, Vivek R >>> > >>> *Cc:* hotspot compiler >> >; Vladimir Kozlov >>> >; >>> Rukmannagari, Shravya >> > >>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> I left this comment in the bug: >>> >>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>> we should put every intrinsic in its own file, like we did for >>> macroAssembler_x86_sha.cpp. They are already too big: >>> >>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>> >>> Also, can we split out the CompilerDirectives changes? >>> >>> >>> >>> >>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>> > >>> wrote: >>> >>> Hi all >>> >>> We would like to contribute a patch which optimizes tan and log10 >>> for the X86 architecture using the Intel LIBM library. >>> >>> Could you please review and sponsor this patch.
>>> >>> Bug-id: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>> webrev: >>> >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>> >>> Thanks and regards, >>> >>> Vivek >>> From igor.veresov at oracle.com Tue Apr 5 22:08:03 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 15:08:03 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <57041A5D.4040909@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <570417E6.60107@oracle.com> <5704192A.3010807@oracle.com> <57041A5D.4040909@oracle.com> Message-ID: <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> Coleen and Lois, So, ok to push? igor > On Apr 5, 2016, at 1:04 PM, Coleen Phillimore wrote: > > > > On 4/5/16 3:59 PM, Coleen Phillimore wrote: >> >> >> On 4/5/16 3:54 PM, Lois Foltan wrote: >>> >>> On 4/5/2016 1:56 PM, Igor Veresov wrote: >>>> >>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov > wrote: >>>>> >>>>>> >>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan > wrote: >>>>>> >>>>>> >>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>> Hi Lois, >>>>>>> >>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>> >>>>>>> igor >>>>>> Hi Igor, >>>>>> >>>>>> Thanks for waiting on this. A couple of comments: >>>>>> >>>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>> >>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.
For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>> >>>>> >>>>> I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to only be called within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() be invoked for non-invokestatic instruction? >>>> >>>> Actually, the easier way to think about it, would be: The answer returned by resolve_interface_method() is result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. It of course doesn't mean that the invocation is really a result of the said instruction. The context is. As you may see the logic around "nostatic" did not change and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. >>>> >>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate a question within the invokeinterface context. But passing a bytecode makes the code easier to read. >>> >>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later. >> >> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148 > > No, sorry, this change passes the 'tag'. nvm. > > Coleen >> >> Just waiting for some test changes.
>> >> Coleen >> >>> Lois >>> >>>> >>>> igor >>>> >>>>> >>>>>> Just curious did you also run the testbase default methods tests? >>>>> >>>>> Yes, within the context of a closed project. That's actually what made these changes necessary. >>>>> >>>>> igor >>>>> >>>>>> Lois >>>>>> >>>>>>> >>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>> >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Lois >>>>>>>> >>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>>>> >>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> igor
From lois.foltan at oracle.com Tue Apr 5 22:12:37 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 05 Apr 2016 18:12:37 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <570417E6.60107@oracle.com> <5704192A.3010807@oracle.com> <57041A5D.4040909@oracle.com> <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> Message-ID: <57043855.1000405@oracle.com> On 4/5/2016 6:08 PM, Igor Veresov wrote: > Coleen and Lois, > > So, ok to push? Yes, for me. Have you addressed all of Karen's concerns as well? Lois > > igor > > >> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore wrote: >> >> >> >> On 4/5/16 3:59 PM, Coleen Phillimore wrote: >>> >>> On 4/5/16 3:54 PM, Lois Foltan wrote: >>>> On 4/5/2016 1:56 PM, Igor Veresov wrote: >>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov > wrote: >>>>>> >>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan > wrote: >>>>>>> >>>>>>> >>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>>> Hi Lois, >>>>>>>> >>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>>> >>>>>>>> igor >>>>>>> Hi Igor, >>>>>>> >>>>>>> Thanks for waiting on this. A couple of comments: >>>>>>> >>>>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>>> >>>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method.
For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>>> >>>>>> I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to only be called within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() be invoked for non-invokestatic instruction? >>>>> Actually, the easier way to think about it, would be: The answer returned by resolve_interface_method() is result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. It of course doesn't mean that the invocation is really a result of the said instruction. The context is. As you may see the logic around "nostatic" did not change and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. >>>>> >>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate a question within the invokeinterface context. But passing a bytecode makes the code easier to read. >>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later. >>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148 >> No, sorry, this change passes the 'tag'. nvm. >> >> Coleen >>> Just waiting for some test changes.
>>> >>> Coleen >>> >>>> Lois >>>> >>>>> igor >>>>> >>>>>>> Just curious did you also run the testbase default methods tests? >>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary. >>>>>> >>>>>> igor >>>>>> >>>>>>> Lois >>>>>>> >>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>>> >>>>>>>>> Hi Igor, >>>>>>>>> >>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Lois >>>>>>>>> >>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
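The rule that JDK-8153115 moves to link time can be sketched as a small, self-contained model. It assumes a deliberately simplified view of resolution: `Bytecode`, `ResolvedMethod` and `linktime_throws_icce` are hypothetical names, not HotSpot's LinkResolver API. It follows the JVMS section 6.5 rule cited in the thread, and it mirrors Igor's point that the bytecode argument describes the resolution context rather than the instruction actually executing. The check inspects only the resolved method, never a receiver, which is why it can be made at link time.

```cpp
// Hypothetical, simplified model of link-time checks for invoke bytecodes.
// It is not HotSpot's LinkResolver; every name here is illustrative.
enum class Bytecode { invokestatic, invokespecial, invokevirtual, invokeinterface };

struct ResolvedMethod {
  bool is_static;
  bool is_private;
};

// Returns true if linking the resolved method in the given bytecode context
// would throw IncompatibleClassChangeError. `context` names the kind of
// invocation being resolved, not necessarily the instruction that executes.
bool linktime_throws_icce(const ResolvedMethod& m, Bytecode context) {
  // invokestatic must resolve to a static method...
  if (context == Bytecode::invokestatic) {
    return !m.is_static;
  }
  // ...and every other invoke bytecode must resolve to an instance method.
  if (m.is_static) {
    return true;
  }
  // The check discussed in JDK-8153115: invokeinterface may not target a
  // private interface method. Only the resolved method is inspected (no
  // receiver), so the answer is already known at link time.
  return context == Bytecode::invokeinterface && m.is_private;
}
```

Passing the bytecode as a context argument, instead of adding another `bool` parameter, keeps each call site self-describing, which is the readability argument made in the thread.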
>>>>>>>>>> >>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> igor From igor.veresov at oracle.com Tue Apr 5 22:16:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 15:16:48 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <57043855.1000405@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <570417E6.60107@oracle.com> <5704192A.3010807@oracle.com> <57041A5D.4040909@oracle.com> <11AC503F-CF56-4DC4-85EE-728B2D82FC6E@oracle.com> <57043855.1000405@oracle.com> Message-ID: > On Apr 5, 2016, at 3:12 PM, Lois Foltan wrote: > > > On 4/5/2016 6:08 PM, Igor Veresov wrote: >> Coleen and Lois, >> >> So, ok to push? > Yes, for me. Have you addressed all of Karen's concerns as well? Right.. Karen, is it alright? igor > Lois > >> >> igor >> >> >>> On Apr 5, 2016, at 1:04 PM, Coleen Phillimore wrote: >>> >>> >>> >>> On 4/5/16 3:59 PM, Coleen Phillimore wrote: >>>> >>>> On 4/5/16 3:54 PM, Lois Foltan wrote: >>>>> On 4/5/2016 1:56 PM, Igor Veresov wrote: >>>>>>> On Apr 5, 2016, at 10:44 AM, Igor Veresov > wrote: >>>>>>> >>>>>>>> On Apr 5, 2016, at 10:34 AM, Lois Foltan > wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>>>> Hi Lois, >>>>>>>>> >>>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>>>> >>>>>>>>> igor >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> Thanks for waiting on this. A couple of comments: >>>>>>>> >>>>>>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay.
>>>>>>>> >>>>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>>>> >>>>>>> I looked at the call graphs of these guys and linktime_resolve_X_method() methods actually seem to only be called within invokeX contexts. But maybe I missed something. Can you give an example of a path that may cause, say, linktime_resolve_static_method() be invoked for non-invokestatic instruction? >>>>>> Actually, the easier way to think about it, would be: The answer returned by resolve_interface_method() is result of a method resolution in the interface class "as if" it were invoked by the given bytecode instruction. It of course doesn't mean that the invocation is really a result of the said instruction. The context is. As you may see the logic around "nostatic" did not change and the logic around resolve_interface_method() being called within the invokeinterface context is what we want it to be. >>>>>> >>>>>> The same effect could have been achieved by adding another bool argument to resolve_interface_method() to indicate a question within the invokeinterface context. But passing a bytecode makes the code easier to read. >>>>> Okay, I saw Tom's reply as well to my comments and I reviewed all the call paths. I think the change is okay. I kind of wish that the original bytecode was stored in the LinkInfo structure so that we could just reference the actual bytecode used and make decisions based on that instead of the parameter approach, but that may be an RFE to investigate later.
>>>> There's a change coming that does store the original bytecode for bug https://bugs.openjdk.java.net/browse/JDK-8145148 >>> No, sorry, this change passes the 'tag'. nvm. >>> >>> Coleen >>>> Just waiting for some test changes. >>>> >>>> Coleen >>>> >>>>> Lois >>>>> >>>>>> igor >>>>>> >>>>>>>> Just curious did you also run the testbase default methods tests? >>>>>>> Yes, within the context of a closed project. That's actually what made these changes necessary. >>>>>>> >>>>>>> igor >>>>>>> >>>>>>>> Lois >>>>>>>> >>>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>>>> >>>>>>>>>> Hi Igor, >>>>>>>>>> >>>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Lois >>>>>>>>>> >>>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>>>>>>> >>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> igor > From karen.kinnear at oracle.com Tue Apr 5 22:33:17 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 5 Apr 2016 18:33:17 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> Message-ID: <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> Igor, Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter for instance? If so, I am ok with checking this in - further notes below. > On Apr 5, 2016, at 3:43 PM, Igor Veresov wrote: > > >> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >> >> I am in agreement with Lois that the JVMS looks good with moving the exception. > > Thanks! >> >> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >> meeting I will check one more time. It might be worth adding a comment. > > Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ > Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. > >> >> My concern is specific to the code in jvmciCompilerToVM.cpp: there is code called resolveMethod that checks >> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >> > > That code needs fixing as well.
We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 > In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. That is ok with me - I will add a note to the bug. Also: I see a ciMethod::check_call that has a comment - It appears to fail when applied to an invoke interface call site. FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take the subtleties of invoke interface and invoke special into account. > > igor > >> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >> so that you get the correct behavior depending on the requesting byte code. >> >> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >> I could use help studying this a bit more to understand if this really is resolution or is really selection. >> >> thanks, >> Karen >> >>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>> >>> >>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>> Hi Lois, >>>> >>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>> >>>> igor >>> Hi Igor, >>> >>> Thanks for waiting on this.
A couple of comments: >>> >>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>> >>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>> >>> Just curious did you also run the testbase default methods tests? >>> Lois >>> >>>> >>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>> >>>>> Hi Igor, >>>>> >>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>> >>>>> Thanks, >>>>> Lois >>>>> >>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected).
>>>>>> >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>> >>>>>> Thanks, >>>>>> igor >>> >> > From igor.veresov at oracle.com Tue Apr 5 23:22:37 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 5 Apr 2016 16:22:37 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> Message-ID: <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> > On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: > > Igor, > > Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter > for instance? Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed. > > If so, I am ok with checking this in - further notes below. > >> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >> >> >>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>> >>> I am in agreement with Lois that the JVMS looks good with moving the exception. >> >> Thanks! >>> >>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>> meeting I will check one more time. It might be worth adding a comment. >> >> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle.
>> >>> >>> My concern is specific to the code in jvmciCompilerToVM.cpp: there is code called resolveMethod that checks >>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>> >> >> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). > > Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match > the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. > That is ok with me - I will add a note to the bug. Could you please explain what the problem is again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null for an interface holder when in reality it was an invokevirtual instruction, and vice versa? > > Also: I see a ciMethod::check_call that has a comment - > It appears to fail when applied to an invoke interface call site. > FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. > This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? igor > Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take > the subtleties of invoke interface and invoke special into account. >> >> igor >> >>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>> corner cases here.
I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>> so that you get the correct behavior depending on the requesting byte code. >>> >>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>> I could use help studying this a bit more to understand if this really is resolution or is really selection. >>> >>> thanks, >>> Karen >>> >>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>> >>>> >>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>> Hi Lois, >>>>> >>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>> >>>>> igor >>>> Hi Igor, >>>> >>>> Thanks for waiting on this. A couple of comments: >>>> >>>> - Section 6.5 Instructions for invokeinterface does indicate under "Linking Exceptions" that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>> >>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>> >>>> Just curious did you also run the testbase default methods tests? >>>> Lois >>>> >>>>> >>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>> >>>>>> Hi Igor, >>>>>> >>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this?
>>>>>> >>>>>> Thanks, >>>>>> Lois >>>>>> >>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution; however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>> >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>> >>>>>>> Thanks, >>>>>>> igor From christian.thalinger at oracle.com Tue Apr 5 23:52:57 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 5 Apr 2016 13:52:57 -1000 Subject: RFR (S): 8153439: do not install an empty SpeculationLog in an nmethod In-Reply-To: References: <8340CC96-15C3-4BF6-B79A-8A90BE2F215D@oracle.com> <3E1B8B2F-EEFB-43EA-B409-8E002D29E2BC@oracle.com> Message-ID: > On Apr 5, 2016, at 4:16 AM, Doug Simon wrote: > >> >> On 05 Apr 2016, at 02:00, Christian Thalinger wrote: >> >> >>> On Apr 4, 2016, at 12:34 PM, Christian Thalinger wrote: >>> >>> No, not good. We are failing a couple of JVMCI tests. Looking into it... >> >> Ok, this got a little out of control but for the better: >> >> http://cr.openjdk.java.net/~twisti/8153439/webrev.01/ >> >> The actual fix is to check for a null log argument. The rest is moving the tests into an mx-controlled directory so we can edit and run the tests in an IDE. This made it much easier to figure out what the issue was because stupid jtreg just swallowed all exceptions. >> >> While moving the tests I fixed a bunch of them because they didn't have the proper @compile directives and so failed when running standalone.
Again, stupid jtreg. >> >> Also, I'm wondering if hasSpeculations() should be an interface method in SpeculationLog. I think it should. > > I agree. Can you modify your derivative webrev for that? Of course, we wouldn't need the cast to HotSpotSpeculationLog in HotSpotCodeCacheProvider once you've made that change. Sure. Here is the new webrev: http://cr.openjdk.java.net/~twisti/8153439/webrev.02/ > > -Doug From christian.thalinger at oracle.com Wed Apr 6 01:37:30 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 5 Apr 2016 15:37:30 -1000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> Message-ID: <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> > On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R wrote: > > Hi Christian > > We have updated the patch as per the suggested changes. > The webrev for the same is at this location for your review. > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ There are: 73 #ifdef _LP64 368 #else 655 #endif in the new files but I don't see them share any code. Maybe it would be better to have dedicated x86_32 and x86_64 files. Then the ifdefs are not required. > > We will soon send another patch for CompilerDirectives changes.
> > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Tuesday, March 29, 2016 11:29 AM > To: Rukmannagari, Shravya > Cc: Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > > On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya > wrote: > > Hi Christian, > We would add separate files for each intrinsic. By splitting the CompilerDirectives, do you mean we have to add a separate file? Sorry, I didn't exactly get it. > > Oh, sorry, I wasn't clear enough. Please file a new enhancement for the CompilerDirectives changes and integrate them separately. > > > > Thanks, > Shravya Rukmannagari. > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Monday, March 28, 2016 5:18 PM > To: Deshpande, Vivek R > > Cc: hotspot compiler >; Vladimir Kozlov >; Rukmannagari, Shravya > > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I left this comment in the bug: > > I think for the saneness of the macroAssembler_libm_x86_*.cpp files we should put every intrinsic in its own file, like we did for macroAssembler_x86_sha.cpp. They are already too big: > > $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp > 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp > 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp > > Also, can we split out the CompilerDirectives changes? > > > > On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R > wrote: > > Hi all > > We would like to contribute a patch which optimizes tan and log10 for the X86 architecture using the Intel LIBM library. > Could you please review and sponsor this patch. > > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8152907 > webrev: > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ > > Thanks and regards, > Vivek
From vladimir.kozlov at oracle.com Wed Apr 6 01:46:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Apr 2016 18:46:44 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> Message-ID: <57046A84.6040707@oracle.com> Multiple files are not always good. Maybe in the future we can rewrite this code to use shared parts (code or data). I think the current split is enough for these changes. Thanks, Vladimir On 4/5/16 6:37 PM, Christian Thalinger wrote: > >> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R >> > wrote: >> >> Hi Christian >> We have updated the patch as per the suggested changes. >> The webrev for the same is at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ > > There are: > > 73 #ifdef _LP64 > > 368 #else > > 655 #endif > > in the new files but I don't see them share any code. Maybe it would be > better to have dedicated x86_32 and x86_64 files. Then the ifdefs are > not required. > >> We will soon send another patch for CompilerDirectives changes. >> Regards, >> Vivek >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Tuesday, March 29, 2016 11:29 AM >> *To:*Rukmannagari, Shravya >> *Cc:*Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >> > > wrote: >> Hi Christian, >> We would add separate files for each intrinsic.
By splitting the >> CompilerDirectives, do you mean we have to add a separate file? >> Sorry, I didn't exactly get it. >> >> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >> the CompilerDirectives changes and integrate them separately. >> >> >> Thanks, >> Shravya Rukmannagari. >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Monday, March 28, 2016 5:18 PM >> *To:*Deshpande, Vivek R > > >> *Cc:*hotspot compiler > >; Vladimir Kozlov >> >; >> Rukmannagari, Shravya > > >> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >> I left this comment in the bug: >> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we >> should put every intrinsic in its own file, like we did for >> macroAssembler_x86_sha.cpp. They are already too big: >> >> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >> Also, can we split out the CompilerDirectives changes? >> >> >> >> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >> > >> wrote: >> Hi all >> We would like to contribute a patch which optimizes tan and log10 >> for the X86 architecture using the Intel LIBM library. >> Could you please review and sponsor this patch.
>> Bug-id: >> https://bugs.openjdk.java.net/browse/JDK-8152907 >> webrev: >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >> Thanks and regards, >> Vivek > From jamsheed.c.m at oracle.com Wed Apr 6 08:10:52 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Wed, 6 Apr 2016 13:40:52 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> Message-ID: <5704C48C.2070502@oracle.com> Thanks for the reply. Trying to understand things. > void nmethod::add_handler_for_exception_and_pc(Handle exception, > address pc, address handler) { > // There are potential race conditions during exception cache > updates, so we > // must own the ExceptionCache_lock before doing ANY modifications. > Because > // we don't lock during reads, it is possible to have several > threads attempt > // to update the cache with the same data. We need to check for > already inserted > // copies of the current data before adding it. > > MutexLocker ml(ExceptionCache_lock); > ExceptionCache* target_entry = > exception_cache_entry_for_exception(exception); > > if (target_entry == NULL || > !target_entry->add_address_and_handler(pc,handler)) { > target_entry = new ExceptionCache(exception,pc,handler); > add_exception_cache_entry(target_entry); > } > } [1] There is a storestore mem barrier before count is updated in add_address_and_handler. This ensures that the exception pc and handler address are updated before count is incremented and before the ExceptionCache entry is updated at (nm->_exception_cache or in the list ec->_next). > address nmethod::handler_for_exception_and_pc(Handle exception, > address pc) { > // We never grab a lock to read the exception cache, so we may > // have false negatives.
This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. > ExceptionCache* ec = exception_cache(); > while (ec != NULL) { > address ret_val; > if ((ret_val = ec->match(exception,pc)) != NULL) { > return ret_val; > } > ec = ec->next(); > } > return NULL; > } And in the read logic, we first check that the ec entry is available (non-null check) before proceeding further. If ec is non-null, then ec_type, exception pc, and handler are available by [1], though count can be reordered and not updated with the new value. This fixes the issue. Why do you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > thanks for pointing me to it. Interesting that you have found such a > problem so shortly before me :) > > My webrev addresses some aspects which are not covered by your fix: > > -add_handler_for_exception_and_pc adds a new ExceptionCache instance > in the other case. They need to get released as well. > > -The readers of the _exception_cache field are not safe, yet. As > Andrew Haley pointed out, optimizers may modify load accesses for > non-volatile fields. > > So I think my change is still needed. > > And after taking a closer look at your change, I think the _count > field which is addressed by your fix needs to be volatile as well. I > can incorporate that in my change if you like. > > Would you agree? > > Best regards, > > Martin > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of > *Jamsheed C m > *Sent:* Montag, 4.
April 2016 08:14 > *To:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > "nmethod's exception cache not multi-thread safe" bug is fixed in b107 > bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 > fix changeset: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 > discussion link: > http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html > > Best Regards, > Jamsheed > > On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's exception > cache. Readers of the cache may read stale data on weak memory > platforms. > > The writers of the cache are synchronized by locks, but there may > be concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache without > locking. > > Therefore, the nmethod's field _exception_cache needs to be > volatile and adding new entries must be done by releasing stores. > (Loading seems to be fine without acquire because there's an > address dependency from the load of the cache to the usage of its > contents which is sufficient to ensure ordering on all openjdk > platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to read > the volatile field _state only once. It is certainly undesired to > force the compiler to load it from memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > > Please review. I will also need a sponsor. > > Best regards, > > Martin >
From martin.doerr at sap.com Wed Apr 6 09:19:26 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 6 Apr 2016 09:19:26 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <5704C48C.2070502@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> Message-ID: Hi Jamsheed, here are the cases of add_handler_for_exception_and_pc we should talk about: Case 1: A new ExceptionCache instance needs to get added. The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below. The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL. One could argue that this is not critical, but I guess this was not intended? At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do). I think releasing the completely initialized ExceptionCache instance is a much cleaner design. Case 2: An existing ExceptionCache instance gets a new entry. In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths. I have added the acquire barrier for the _count field here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ Does this answer your questions or is anything still unclear?
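Martin's two cases correspond to the usual release/acquire publication pattern. The sketch below is a simplified, hypothetical model written with C++11 std::atomic instead of HotSpot's volatile fields and OrderAccess. CacheNode, ExceptionCacheModel, and the int stand-ins for pc and handler addresses are invented for illustration. The model publishes a fully initialized node (count and next included) with a single release store (Case 1). It also releases the count after writing a new slot and acquires it on the lock-free read path (Case 2). Where the thread relies on the address dependency from the cache load, the sketch conservatively uses an acquire load of the list head.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified model of a per-method exception cache. Names are
// illustrative only; HotSpot's ExceptionCache uses OrderAccess, not std::atomic.
struct CacheNode {
  static constexpr int kCapacity = 4;
  const int exception_type;   // immutable once the node is published
  int pcs[kCapacity];
  int handlers[kCapacity];
  std::atomic<int> count{0};  // Case 2: released by the writer, acquired by readers
  CacheNode* next = nullptr;  // Case 1: written before the node is published

  explicit CacheNode(int type) : exception_type(type) {}
};

class ExceptionCacheModel {
 public:
  ~ExceptionCacheModel() {
    CacheNode* node = head_.load(std::memory_order_relaxed);
    while (node != nullptr) {
      CacheNode* next = node->next;
      delete node;
      node = next;
    }
  }

  // Writer side: serialized by a lock (the role ExceptionCache_lock plays).
  void add(int type, int pc, int handler) {
    assert(handler >= 0 && "negative values are reserved for 'miss'");
    std::lock_guard<std::mutex> guard(lock_);
    CacheNode* node = find_locked(type);
    if (node != nullptr) {
      int n = node->count.load(std::memory_order_relaxed);
      if (n < CacheNode::kCapacity) {
        node->pcs[n] = pc;
        node->handlers[n] = handler;
        // Case 2: the release store orders the new slot before the new count.
        node->count.store(n + 1, std::memory_order_release);
        return;
      }
    }
    // Case 1: fully initialize the node first, then publish it with one
    // release store to the list head. New nodes are prepended, so a
    // published node's next pointer never changes afterwards.
    CacheNode* fresh = new CacheNode(type);
    fresh->pcs[0] = pc;
    fresh->handlers[0] = handler;
    fresh->count.store(1, std::memory_order_relaxed);
    fresh->next = head_.load(std::memory_order_relaxed);
    head_.store(fresh, std::memory_order_release);
  }

  // Reader side: no lock. The acquire loads pair with the release stores in
  // add(), so a reader never sees a count that covers an unwritten slot.
  // Returns -1 on a miss: possibly a false negative, never a wrong handler.
  int lookup(int type, int pc) const {
    for (CacheNode* node = head_.load(std::memory_order_acquire); node != nullptr;
         node = node->next) {
      if (node->exception_type != type) continue;
      int n = node->count.load(std::memory_order_acquire);
      for (int i = 0; i < n; i++) {
        if (node->pcs[i] == pc) return node->handlers[i];
      }
    }
    return -1;
  }

 private:
  // Must be called with lock_ held; returns the newest node for this type.
  CacheNode* find_locked(int type) const {
    for (CacheNode* node = head_.load(std::memory_order_relaxed); node != nullptr;
         node = node->next) {
      if (node->exception_type == type) return node;
    }
    return nullptr;
  }

  std::atomic<CacheNode*> head_{nullptr};
  std::mutex lock_;
};
```

Because the writer never modifies a published node's next field and only appends slots behind a released count, the reader needs no lock, only the acquire loads.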
Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Mittwoch, 6. April 2016 10:11 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thanks for the reply. Trying to understand things. void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) { // There are potential race conditions during exception cache updates, so we // must own the ExceptionCache_lock before doing ANY modifications. Because // we don't lock during reads, it is possible to have several threads attempt // to update the cache with the same data. We need to check for already inserted // copies of the current data before adding it. MutexLocker ml(ExceptionCache_lock); ExceptionCache* target_entry = exception_cache_entry_for_exception(exception); if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) { target_entry = new ExceptionCache(exception,pc,handler); add_exception_cache_entry(target_entry); } } [1] There is a storestore mem barrier before count is updated in add_address_and_handler. This ensures that the exception pc and handler address are updated before count is incremented and before the ExceptionCache entry is updated at (nm->_exception_cache or in the list ec->_next). address nmethod::handler_for_exception_and_pc(Handle exception, address pc) { // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. ExceptionCache* ec = exception_cache(); while (ec != NULL) { address ret_val; if ((ret_val = ec->match(exception,pc)) != NULL) { return ret_val; } ec = ec->next(); } return NULL; } And in the read logic, we first check that the ec entry is available (non-null check) before proceeding further. If ec is non-null, then ec_type, exception pc, and handler are available by [1],
though count can be reordered and not updated with the new value. This fixes the issue. Why do you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: Hi Jamsheed, thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Montag, 4. April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking.
Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin From tobias.hartmann at oracle.com Wed Apr 6 10:53:14 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Apr 2016 12:53:14 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of Message-ID: <5704EA9A.7020202@oracle.com> Hi, please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: https://bugs.openjdk.java.net/browse/JDK-8153514 http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing.
Thanks, Tobias From zoltan.majo at oracle.com Wed Apr 6 10:59:08 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 6 Apr 2016 12:59:08 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704EA9A.7020202@oracle.com> References: <5704EA9A.7020202@oracle.com> Message-ID: <5704EBFC.5010804@oracle.com> Hi Tobias, that looks good to me! Best regards, Zoltan On 04/06/2016 12:53 PM, Tobias Hartmann wrote: > Hi, > > please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: > > https://bugs.openjdk.java.net/browse/JDK-8153514 > http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ > http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ > > The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. > > I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. > > Thanks, > Tobias From tobias.hartmann at oracle.com Wed Apr 6 11:05:52 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Apr 2016 13:05:52 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704EBFC.5010804@oracle.com> References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com> Message-ID: <5704ED90.6050803@oracle.com> Thanks, Zoltan! Best regards, Tobias On 06.04.2016 12:59, Zoltán Majó wrote: > Hi Tobias, > > > that looks good to me!
> > Best regards, > > > Zoltan > > > On 04/06/2016 12:53 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >> >> https://bugs.openjdk.java.net/browse/JDK-8153514 >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >> >> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >> >> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >> >> Thanks, >> Tobias > From igor.ignatyev at oracle.com Wed Apr 6 11:38:42 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 6 Apr 2016 14:38:42 +0300 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704ED90.6050803@oracle.com> References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com> <5704ED90.6050803@oracle.com> Message-ID: Hi Tobias, looks good to me, thanks for implementing this. Thanks, Igor > On Apr 6, 2016, at 2:05 PM, Tobias Hartmann wrote: > > Thanks, Zoltan! > > Best regards, > Tobias > > On 06.04.2016 12:59, Zoltán Majó wrote: >> Hi Tobias, >> >> >> that looks good to me!
>> >> Best regards, >> >> >> Zoltan >> >> >> On 04/06/2016 12:53 PM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153514 >>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >>> >>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >>> >>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >>> >>> Thanks, >>> Tobias >> From tobias.hartmann at oracle.com Wed Apr 6 11:39:42 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Apr 2016 13:39:42 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: References: <5704EA9A.7020202@oracle.com> <5704EBFC.5010804@oracle.com> <5704ED90.6050803@oracle.com> Message-ID: <5704F57E.5000505@oracle.com> Thanks, Igor! Best regards, Tobias On 06.04.2016 13:38, Igor Ignatyev wrote: > Hi Tobias, > > looks good to me, thanks for implementing this. > > Thanks, > Igor >> On Apr 6, 2016, at 2:05 PM, Tobias Hartmann wrote: >> >> Thanks, Zoltan! >> >> Best regards, >> Tobias >> >> On 06.04.2016 12:59, Zoltán Majó wrote: >>> Hi Tobias, >>> >>> >>> that looks good to me!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> >>> >>> On 04/06/2016 12:53 PM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153514 >>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >>>> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >>>> >>>> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >>>> >>>> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >>>> >>>> Thanks, >>>> Tobias >>> > From jamsheed.c.m at oracle.com Wed Apr 6 11:54:02 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Wed, 6 Apr 2016 17:24:02 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> Message-ID: <5704F8DA.9030000@oracle.com> Hi Martin, On 4/6/2016 2:49 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > here are the cases of add_handler_for_exception_and_pc we should talk > about: > > Case 1: A new ExceptionCache instance needs to get added. > > The storestore barrier you have added is used in the constructor of > the ExceptionCache and it releases the most critical fields of it. I > think this is what you explained in [1] in your email below. > > The new values of _count and _next fields are written afterwards and > hence not covered by this release barrier. Readers of the > _exception_cache may read _count==0 or _next==NULL. 
> > One could argue that this is not critical, but I guess this was not > intended? > > At least the _exception_cache field needs to be volatile to prevent > optimizers from breaking anything. This is always needed for fields > which are accessed concurrently by multiple threads without locks (as > the readers do). > > I think releasing the completely initialized ExceptionCache instance > is a much cleaner design. > Having count < actual entries, or having _next = null, is OK (as there is always a (locked) slow path to check again). Quoting the comment from the read path: > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. Weak memory platforms may have a few more false negatives, but isn't that OK? This helps us, as we can remove volatile from the picture, which is actually good for read paths. > Case 2: An existing ExceptionCache instance gets a new entry. > > In this case your storestore barrier is good to release all updated > fields. However, we need to consider the readers, too. The _count > field needs to be volatile and the load must acquire. Otherwise, stale > data may get read by processors which perform loads on speculative paths. > The storestore mem barrier handles this, as count <= the number of real entries, and there is always the locked slow path to check again. As said before, there may be a few more false negatives on weak memory platforms than on strong ones. Best Regards, Jamsheed > I have added the acquire barrier for the _count field here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ > > > Does this answer your questions or is anything still unclear? > > Best regards, > > Martin > > *From:*Jamsheed C m [mailto:jamsheed.c.m at oracle.com] > *Sent:* Mittwoch, 6.
April 2016 10:11 > *To:* Doerr, Martin ; > hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Thanks for the reply. Trying to understand things. > > > void nmethod::add_handler_for_exception_and_pc(Handle exception, > address pc, address handler) { > // There are potential race conditions during exception cache > updates, so we > // must own the ExceptionCache_lock before doing ANY > modifications. Because > // we don't lock during reads, it is possible to have several > threads attempt > // to update the cache with the same data. We need to check for > already inserted > // copies of the current data before adding it. > > MutexLocker ml(ExceptionCache_lock); > ExceptionCache* target_entry = > exception_cache_entry_for_exception(exception); > > if (target_entry == NULL || > !target_entry->add_address_and_handler(pc,handler)) { > target_entry = new ExceptionCache(exception,pc,handler); > add_exception_cache_entry(target_entry); > } > } > > > [1] There is a storestore mem barrier before count is updated in > add_address_and_handler. > This ensures that the exception pc and handler address are updated before count > is incremented and before the ExceptionCache entry is updated at > (nm->_exception_cache or in the list ec->_next). > > > address nmethod::handler_for_exception_and_pc(Handle exception, > address pc) { > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. > ExceptionCache* ec = exception_cache(); > while (ec != NULL) { > address ret_val; > if ((ret_val = ec->match(exception,pc)) != NULL) { > return ret_val; > } > ec = ec->next(); > } > return NULL; > } > > > And in the read logic, we first check that the ec entry is available (non-null > check) before proceeding further. > If ec is non-null, then ec_type, exception pc, and handler are available > by [1],
though count can be reordered and not updated with new value. > > this fixes the issue. why you think it doesn't? > > Best Regards, > Jamsheed > > On 4/5/2016 3:40 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > thanks for pointing me to it. Interesting that you have found such > a problem so shortly before me J > > My webrev addresses some aspects which are not covered by your fix: > > -add_handler_for_exception_and_pc adds a new ExceptionCache > instance in the other case. They need to get released as well. > > -The readers of the _exception_cache field are not safe, yet. As > Andrew Haley pointed out, optimizers may modify load accesses for > non-volatile fields. > > So I think my change is still needed. > > And after taking a closer look at your change, I think the _count > field which is addressed by your fix needs to be volatile as well. > I can incorporate that in my change if you like. > > Would you agree? > > Best regards, > > Martin > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf > Of *Jamsheed C m > *Sent:* Montag, 4. April 2016 08:14 > *To:* hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > "nmethod's exception cache not multi-thread safe" bug is fixed in > b107 > bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 > fix changeset: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 > discussion link: > http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html > > Best Regards, > Jamsheed > > On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod?s > exception cache. Readers of the cache may read stale data on > weak memory platforms. 
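The Case 2 argument (a writer fills new slots and then bumps the count; lock-free readers may miss the newest pair but must never read a stale one) can be sketched in portable C++11 atomics. This is a hedged model, not HotSpot code: `CacheEntryModel` and its members are hypothetical names, `std::mutex` stands in for `ExceptionCache_lock`, and `std::atomic` replaces HotSpot's own OrderAccess primitives. The reader uses the acquire load Martin proposes; a miss is treated as the false negative Jamsheed describes, to be retried on the locked slow path.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <mutex>

// Hypothetical model of a single exception-cache entry: up to kCapacity
// (pc, handler) pairs whose visible length is published through count_.
// Illustrative only; this is not HotSpot's ExceptionCache.
class CacheEntryModel {
 public:
  static constexpr std::size_t kCapacity = 4;

  // Writer: serialized by lock_ (standing in for ExceptionCache_lock).
  // Returns false when the entry is full, so a caller would allocate a
  // new entry instead.
  bool add(const void* pc, const void* handler) {
    std::lock_guard<std::mutex> guard(lock_);
    const std::size_t n = count_.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;
    pcs_[n] = pc;
    handlers_[n] = handler;
    // Release store: a reader that observes n + 1 also observes the two
    // slot writes above.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader: lock-free. The acquire load pairs with the release store in
  // add(), so slots below n can never be read stale, even with speculative
  // or out-of-order loads. nullptr means "not found" and may be a false
  // negative; the caller can retry on a locked slow path.
  const void* lookup(const void* pc) const {
    const std::size_t n = count_.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < n; ++i) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return nullptr;
  }

 private:
  std::array<const void*, kCapacity> pcs_{};
  std::array<const void*, kCapacity> handlers_{};
  std::atomic<std::size_t> count_{0};
  std::mutex lock_;
};
```

Reading `count_` with `memory_order_relaxed` instead would correspond to the storestore-only variant: nothing would then order the slot loads after the count load, and in C++ terms the slot reads would become a data race.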
> > The writers of the cache are synchronized by locks, but there > may be concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache > without locking. > > Therefore, the nmethod's field _exception_cache needs to be > volatile and adding new entries must be done by releasing > stores. (Loading seems to be fine without acquire because > there's an address dependency from the load of the cache to > the usage of its contents which is sufficient to ensure > ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive to > read the volatile field _state only once. It is certainly > undesired to force the compiler to load it from memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > > Please review. I will also need a sponsor. > > Best regards, > > Martin > From martin.doerr at sap.com Wed Apr 6 13:24:54 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 6 Apr 2016 13:24:54 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <5704F8DA.9030000@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> Message-ID: Hi Jamsheed and all, thanks for your explanation. About Case 1: I basically agree that reading _next==NULL or _count==0 only leads to false negatives and is not critical. Yes, we could live with a few more false negatives on weak memory model platforms (even though this is not my preferred design). About Case 2: What I'm missing on the reader's side of the _count field is something which prevents processors from speculatively loading the contents of the ExceptionCache.
In ExceptionCache::test_address, the _count only affects the control flow. PPC and ARM processors can predict branches which depend on the _count field and load speculatively from the pc and handler fields (which may be stale data!). Due to out-of-order execution of the loads, it can actually happen that the new _count value is observed, but stale data is read from the pc and handler fields. I guess it is highly unlikely that we will ever observe this, but there's no guarantee. I think my concern about using non-volatile fields for the _exception_cache is also still valid. Nothing prevents C++ compilers from loading the pointer twice from memory. They may expect to get the pointer to the same instance both times but actually get two different ones. For example, this may lead to the situation that handler_for_exception_and_pc uses one ExceptionCache instance for calling the match function and another one (due to a reload of the non-volatile field) for calling next(). Would other people like to comment on this lengthy discussion as well? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Wednesday, 6 April 2016 13:54 To: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, On 4/6/2016 2:49 PM, Doerr, Martin wrote: Hi Jamsheed, here are the cases of add_handler_for_exception_and_pc we should talk about: Case 1: A new ExceptionCache instance needs to be added. The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below. The new values of the _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL. One could argue that this is not critical, but I guess this was not intended?
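Case 1, together with the concern that the C++ compiler may load a non-volatile `_exception_cache` pointer twice, can be sketched the same way. In this hypothetical model (`CacheNode` and `CacheListModel` are illustrative names, not HotSpot types), a node is fully constructed before a release store makes it reachable. The reader loads each shared pointer exactly once into a local, so the comparison and the step to the next node always refer to the same node. The same load-once idea is behind the `is_alive` cleanup that reads `_state` only once.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical node: key and value are immutable after construction;
// only `next` and the list head are shared mutable state.
struct CacheNode {
  const void* exception_type;
  const void* handler;
  std::atomic<CacheNode*> next{nullptr};
  CacheNode(const void* type, const void* h) : exception_type(type), handler(h) {}
};

class CacheListModel {
 public:
  CacheListModel() = default;
  CacheListModel(const CacheListModel&) = delete;
  CacheListModel& operator=(const CacheListModel&) = delete;

  ~CacheListModel() {
    CacheNode* n = head_.load(std::memory_order_relaxed);
    while (n != nullptr) {
      CacheNode* next = n->next.load(std::memory_order_relaxed);
      delete n;
      n = next;
    }
  }

  // Writer: serialized by lock_. The node is fully built before it becomes
  // reachable, and the release store publishes all of its fields at once.
  void add(const void* type, const void* handler) {
    std::lock_guard<std::mutex> guard(lock_);
    CacheNode* node = new CacheNode(type, handler);
    node->next.store(head_.load(std::memory_order_relaxed),
                     std::memory_order_relaxed);
    head_.store(node, std::memory_order_release);
  }

  // Reader: lock-free. Each shared pointer is loaded exactly once into the
  // local `n`; with a plain, non-atomic field the compiler would be free
  // to reload it and mix up two different nodes.
  const void* find(const void* type) const {
    for (CacheNode* n = head_.load(std::memory_order_acquire); n != nullptr;
         n = n->next.load(std::memory_order_acquire)) {
      if (n->exception_type == type) return n->handler;
    }
    return nullptr;  // Miss or false negative: retry under the lock.
  }

 private:
  std::atomic<CacheNode*> head_{nullptr};
  std::mutex lock_;
};
```

The first webrev relies on the address dependency from the pointer load to the field loads rather than on an acquire in the reader. The closest C++11 spelling of that is `memory_order_consume`, which mainstream compilers currently strengthen to acquire, so the sketch simply uses acquire.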
At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do). I think releasing the completely initialized ExceptionCache instance is a much cleaner design. Having count < actual entries, or having _next = null, is OK (as there is always the (locked) slow path to check again). Quoting the comment from the read path: // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. Weak memory platforms may have a few more false negatives, but isn't that OK? This helps us, as we can remove volatile from the picture, which is actually good for the read paths. Case 2: An existing ExceptionCache instance gets a new entry. In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths. The storestore memory barrier handles this, as count <= number of real entries, and there is always the locked slow path to check again. As said before, there may be a few more false negatives on weak memory platforms than on strong ones. Best Regards, Jamsheed I have added the acquire barrier for the _count field here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ Does this answer your questions or is anything still unclear? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Wednesday, 6 April 2016 10:11 To: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thanks for the reply. I'm trying to understand the details.
void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) { // There are potential race conditions during exception cache updates, so we // must own the ExceptionCache_lock before doing ANY modifications. Because // we don't lock during reads, it is possible to have several threads attempt // to update the cache with the same data. We need to check for already inserted // copies of the current data before adding it. MutexLocker ml(ExceptionCache_lock); ExceptionCache* target_entry = exception_cache_entry_for_exception(exception); if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) { target_entry = new ExceptionCache(exception,pc,handler); add_exception_cache_entry(target_entry); } } [1] There is a storestore memory barrier before count is updated in add_address_and_handler. This ensures that the exception pc and handler address are updated before count is incremented and before the ExceptionCache entry is linked in (at nm->_exception_cache or in the list via ec->_next). address nmethod::handler_for_exception_and_pc(Handle exception, address pc) { // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. ExceptionCache* ec = exception_cache(); while (ec != NULL) { address ret_val; if ((ret_val = ec->match(exception,pc)) != NULL) { return ret_val; } ec = ec->next(); } return NULL; } In the read logic, we first check that the ec entry is available (non-null check) before proceeding further. If ec is non-null, then ec_type, exception pc, and handler are available by [1], though count can be reordered and not yet updated with the new value. This fixes the issue. Why do you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: Hi Jamsheed, thanks for pointing me to it.
Interesting that you have found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well. I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Monday, 4 April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores.
(Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. Best regards, Martin From aph at redhat.com Wed Apr 6 16:01:19 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Apr 2016 17:01:19 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: Add Arrays.fill stub code In-Reply-To: References: Message-ID: <570532CF.1050903@redhat.com> On 04/06/2016 01:51 PM, Long Chen wrote: > Please review this patch for generating stub code for ArrayFill on aarch64 > platform. > > Performance test case: > http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.java > Testing result: http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.html > Patch: http://people.linaro.org/~long.chen/ArrayFill/ArrayFill.patch > > At the same time, refactoring ClearArrayNode's code generation, as it can be > used by Array fill too. Looks good. Minor nit: + // Generate stub for disjoint fill. If "aligned" is true, the + // "to" address is assumed to be heapword aligned. "disjoint" doesn't make any sense here. > Following up I would like to propose a patch to use DC ZVA for large array > zeroing. OK. I haven't seen much advantage to using DC ZVA, but I'm prepared to listen. Andrew.
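As a rough illustration of the general technique fill stubs commonly use (this is not the code in ArrayFill.patch), the portable C++ sketch below replicates a byte into a 64-bit pattern, uses single-byte stores until the destination is 8-byte aligned, writes whole words, and then finishes the tail. `fill_bytes` is a hypothetical name; wider element types such as short or int would replicate a 16- or 32-bit pattern the same way. A fill has only a destination, so there is nothing it could overlap, which is consistent with the nit that "disjoint" makes no sense in the stub comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Word-at-a-time byte fill (general technique, hypothetical helper).
void fill_bytes(std::uint8_t* to, std::size_t count, std::uint8_t value) {
  // Multiplying by 0x0101010101010101 copies `value` into all 8 bytes.
  const std::uint64_t pattern = UINT64_C(0x0101010101010101) * value;
  std::size_t i = 0;
  // Head: single-byte stores until to + i is 8-byte aligned.
  while (i < count && (reinterpret_cast<std::uintptr_t>(to + i) & 7u) != 0) {
    to[i++] = value;
  }
  // Body: one 8-byte store per iteration. memcpy sidesteps alignment and
  // strict-aliasing rules; optimizing compilers typically emit a plain store.
  for (; i + 8 <= count; i += 8) {
    std::memcpy(to + i, &pattern, sizeof pattern);
  }
  // Tail: the remaining 0..7 bytes.
  for (; i < count; ++i) {
    to[i] = value;
  }
}
```

For example, `fill_bytes(buf + 3, 30, 0xAB)` sets exactly bytes 3 through 32 of `buf` and leaves its neighbours untouched, whatever the alignment of `buf`.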
From vladimir.kozlov at oracle.com Wed Apr 6 16:50:34 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Apr 2016 09:50:34 -0700 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <5704EA9A.7020202@oracle.com> References: <5704EA9A.7020202@oracle.com> Message-ID: <57053E5A.6050705@oracle.com> Looks good. Thanks, Vladimir On 4/6/16 3:53 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: > > https://bugs.openjdk.java.net/browse/JDK-8153514 > http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ > http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ > > The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. > > I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. 
> > Thanks, > Tobias > From vivek.r.deshpande at intel.com Wed Apr 6 17:14:58 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 6 Apr 2016 17:14:58 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <5704312C.9000605@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> <5704312C.9000605@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> Hi Vladimir Please let me know, if I need to provide an updated patch with this change. Thanks for all your help. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 05, 2016 2:42 PM To: Deshpande, Vivek R; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: error: use of undeclared identifier 'SharedRuntime' __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); Note templateInterpreterGenerator_x86_32.cpp has that #include. It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. Vladimir On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Sorry about that. > Please check this webrev > http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ > I have updated it.
> > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 1:47 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > I again can't apply the changes because of CRs at the end of lines in the patch file. > > Vladimir > > On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >> >> Hi Vladimir >> >> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >> Thank you for the review. >> >> Regards, >> Vivek >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:34 PM >> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them? >> >> I will start pre-integration testing. >> >> Thanks, >> Vladimir >> >> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>> Hi Christian >>> >>> We have updated the patch as per the suggested changes. >>> >>> The webrev for the same is at this location for your review. >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>> >>> We will soon send another patch for CompilerDirectives changes. >>> >>> Regards, >>> >>> Vivek >>> >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>> *To:* Rukmannagari, Shravya >>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya wrote: >>> >>> Hi Christian, >>> >>> We would add separate files for each intrinsic.
By splitting the >>> CompilerDirectives, do you mean we have to add a separate file? >>> Sorry, I didn't exactly get it. >>> >>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>> the CompilerDirectives changes and integrate them separately. >>> >>> >>> >>> Thanks, >>> >>> Shravya Rukmannagari. >>> >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Monday, March 28, 2016 5:18 PM *To:*Deshpande, Vivek R >>> *Cc:*hotspot compiler; Vladimir Kozlov; >>> Rukmannagari, Shravya >>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> I left this comment in the bug: >>> >>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>> we should put every intrinsic in its own file, like we did for >>> macroAssembler_x86_sha.cpp. They are already too big: >>> >>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>> >>> Also, can we split out the CompilerDirectives changes? >>> >>> >>> >>> >>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R wrote: >>> >>> Hi all >>> >>> We would like to contribute a patch which optimizes tan and log10 >>> on the x86 architecture using the Intel LIBM library. >>> >>> Could you please review and sponsor this patch. >>> >>> Bug-id: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>> webrev: >>> >>> >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/
>>> >>> Thanks and regards, >>> >>> Vivek >>> From christian.thalinger at oracle.com Wed Apr 6 17:14:56 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 6 Apr 2016 07:14:56 -1000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <57046A84.6040707@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <1F40DA68-E79D-4372-9234-B64CE85B662B@oracle.com> <57046A84.6040707@oracle.com> Message-ID: > On Apr 5, 2016, at 3:46 PM, Vladimir Kozlov wrote: > > Multiple files are not always good. Maybe in the future we can rewrite this code to use shared parts (code or data). I think the current split is enough for these changes. Alright. > > Thanks, > Vladimir > > On 4/5/16 6:37 PM, Christian Thalinger wrote: >> >>> On Apr 4, 2016, at 8:25 PM, Deshpande, Vivek R wrote: >>> >>> Hi Christian >>> We have updated the patch as per the suggested changes. >>> The webrev for the same is at this location for your review. >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >> >> There are: >> >> 73 #ifdef _LP64 >> >> 368 #else >> >> 655 #endif >> >> in the new files but I don't see them share any code. Maybe it would be >> better to have dedicated x86_32 and x86_64 files. Then the ifdefs are >> not required. >> >>> We will soon send another patch for CompilerDirectives changes.
>>> Regards, >>> Vivek >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Tuesday, March 29, 2016 11:29 AM >>> *To:*Rukmannagari, Shravya >>> *Cc:*Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya wrote: >>> Hi Christian, >>> We would add separate files for each intrinsic. By splitting the >>> CompilerDirectives, do you mean we have to add a separate file? >>> Sorry, I didn't exactly get it. >>> >>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>> the CompilerDirectives changes and integrate them separately. >>> >>> >>> Thanks, >>> Shravya Rukmannagari. >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Monday, March 28, 2016 5:18 PM >>> *To:*Deshpande, Vivek R >>> *Cc:*hotspot compiler; Vladimir Kozlov; >>> Rukmannagari, Shravya >>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> I left this comment in the bug: >>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files we >>> should put every intrinsic in its own file, like we did for >>> macroAssembler_x86_sha.cpp. They are already too big: >>> >>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>> Also, can we split out the CompilerDirectives changes? >>> >>> >>> >>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R wrote: >>> Hi all >>> We would like to contribute a patch which optimizes tan and log10 >>> on the x86 architecture using the Intel LIBM library. >>> Could you please review and sponsor this patch.
>>> Bug-id: >>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>> webrev: >>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>> Thanks and regards, >>> Vivek >> From vladimir.kozlov at oracle.com Wed Apr 6 17:17:47 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Apr 2016 10:17:47 -0700 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> <5704312C.9000605@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> Message-ID: <570544BB.6040509@oracle.com> I added #include myself and PIT testing passed. I need to know who is author or contributor. Thanks, Vladimir On 4/6/16 10:14 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please let me know, if I need to provide an updated patch with this change. > Thanks for all your help. > > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 2:42 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > Problem found during build. 
Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: > > hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: > error: use of undeclared identifier 'SharedRuntime' > __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); > > Note templateInterpreterGenerator_x86_32.cpp has that #include. > > It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. > > Vladimir > > > > On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> Sorry about that. >> Please check this webrev >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ >> I have updated it. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:47 PM >> To: Deshpande, Vivek R; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I again can't apply the changes because of CRs at the end of lines in the patch file. >> >> Vladimir >> >> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >>> >>> Hi Vladimir >>> >>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >>> Thank you for the review. >>> >>> Regards, >>> Vivek >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, April 05, 2016 1:34 PM >>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >>> Cc: hotspot compiler >>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> It looks good to me but I don't see the macroAssembler_libm_x86_*.cpp file changes in the webrev. Did you use 'hg remove' or simply remove them? >>> >>> I will start pre-integration testing. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>>> Hi Christian >>>> >>>> We have updated the patch as per the suggested changes.
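The recurring complaint about CRs at the end of lines in the patch file refers to CRLF line endings that keep a patch from applying. A minimal C++ sketch of normalizing such a file follows, assuming that dropping one trailing '\r' per line is the only cleanup needed (this would be wrong if the target files themselves used CRLF). The function name `strip_carriage_returns` is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out`, dropping one trailing '\r' from each line, i.e.
// converting CRLF line endings to LF. Every output line ends with '\n',
// including a final line that had no terminator. Returns how many lines
// were changed.
std::size_t strip_carriage_returns(std::istream& in, std::ostream& out) {
  std::size_t changed = 0;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();
      ++changed;
    }
    out << line << '\n';
  }
  return changed;
}

// Convenience overload for text already held in memory.
std::string strip_carriage_returns(const std::string& text) {
  std::istringstream in(text);
  std::ostringstream out;
  strip_carriage_returns(in, out);
  return out.str();
}
```

Called as `strip_carriage_returns(std::cin, std::cout)`, it filters a patch from standard input to standard output.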
>>>> >>>> The webrev for the same is at this location for your review. >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>>> >>>> We will soon send another patch for CompilerDirectives changes. >>>> >>>> Regards, >>>> >>>> Vivek >>>> >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>>> *To:* Rukmannagari, Shravya >>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya wrote: >>>> >>>> Hi Christian, >>>> >>>> We would add separate files for each intrinsic. By splitting the >>>> CompilerDirectives, do you mean we have to add a separate file? >>>> Sorry, I didn't exactly get it. >>>> >>>> Oh, sorry, I wasn't clear enough. Please file a new enhancement for >>>> the CompilerDirectives changes and integrate them separately. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Shravya Rukmannagari. >>>> >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Monday, March 28, 2016 5:18 PM *To:*Deshpande, Vivek R >>>> *Cc:*hotspot compiler; Vladimir Kozlov; >>>> Rukmannagari, Shravya >>>> *Subject:*Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> I left this comment in the bug: >>>> >>>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>>> we should put every intrinsic in its own file, like we did for >>>> macroAssembler_x86_sha.cpp. They are already too big: >>>> >>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>>> >>>> Also, can we split out the CompilerDirectives changes?
>>>> >>>> >>>> >>>> >>>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R wrote: >>>> >>>> Hi all >>>> >>>> We would like to contribute a patch which optimizes tan and log10 >>>> on the x86 architecture using the Intel LIBM library. >>>> >>>> Could you please review and sponsor this patch. >>>> >>>> Bug-id: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>>> webrev: >>>> >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>>> >>>> Thanks and regards, >>>> >>>> Vivek >>>> From vivek.r.deshpande at intel.com Wed Apr 6 17:24:01 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 6 Apr 2016 17:24:01 +0000 Subject: RFR (M): 8152907: Update for tan and log10 for x86 In-Reply-To: <570544BB.6040509@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A52043@ORSMSX106.amr.corp.intel.com> <98666E26-763E-40E9-838B-B612D4BAF468@oracle.com> <8D6F463991A1574A8A803B8DA605414F6098A1@ORSMSX111.amr.corp.intel.com> <97DECB4B-8B7D-4FD1-806B-43823C07F809@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A595CB@ORSMSX106.amr.corp.intel.com> <5704211E.5090007@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59DBD@ORSMSX106.amr.corp.intel.com> <57042460.5070306@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A59E3B@ORSMSX106.amr.corp.intel.com> <5704312C.9000605@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB4B@ORSMSX106.amr.corp.intel.com> <570544BB.6040509@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A5AB87@ORSMSX106.amr.corp.intel.com> Thanks, Vladimir.
Code contributed by: Shravya Rukmannagari (shravya.rukmannagari at intel.com) and Vivek Deshpande (vivek.r.deshpande at intel.com) Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 06, 2016 10:18 AM To: Deshpande, Vivek R; Rukmannagari, Shravya Cc: hotspot compiler Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 I added #include myself and PIT testing passed. I need to know who is author or contributor. Thanks, Vladimir On 4/6/16 10:14 AM, Deshpande, Vivek R wrote: > Hi Vladimir > > Please let me know, if I need to provide an updated patch with this change. > Thanks for all your help. > > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 05, 2016 2:42 PM > To: Deshpande, Vivek R; Rukmannagari, Shravya > Cc: hotspot compiler > Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 > > Problem found during build. Looks like we need #include "runtime/sharedRuntime.hpp" in templateInterpreterGenerator_x86_64.cpp: > > hotspot/src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp:379:56: > error: use of undeclared identifier 'SharedRuntime' > __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, > SharedRuntime::dexp))); > > Note templateInterpreterGenerator_x86_32.cpp has that #include. > > It was on macosx where -DDONT_USE_PRECOMPILED_HEADER is used. > > Vladimir > > > > On 4/5/16 2:27 PM, Deshpande, Vivek R wrote: >> Hi Vladimir >> >> Sorry about that. >> Please check this webrev >> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.02/ >> I have updated it.
>> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 05, 2016 1:47 PM >> To: Deshpande, Vivek R; Rukmannagari, Shravya >> Cc: hotspot compiler >> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >> >> I again can't apply the changes because of CR characters at the end of lines in the patch file. >> >> Vladimir >> >> On 4/5/16 1:41 PM, Deshpande, Vivek R wrote: >>> >>> Hi Vladimir >>> >>> I will send you the patch with macroAssembler_libm_x86_*.cpp files removed. >>> Thank you for the review. >>> >>> Regards, >>> Vivek >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Tuesday, April 05, 2016 1:34 PM >>> To: Deshpande, Vivek R; Christian Thalinger; Rukmannagari, Shravya >>> Cc: hotspot compiler >>> Subject: Re: RFR (M): 8152907: Update for tan and log10 for x86 >>> >>> It looks good to me but I don't see macroAssembler_libm_x86_*.cpp files changes in webrev. Did you use 'hg remove' or simply remove them? >>> >>> I will start pre-integration testing. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/4/16 11:25 PM, Deshpande, Vivek R wrote: >>>> Hi Christian >>>> >>>> We have updated the patch as per the suggested changes. >>>> >>>> The webrev for the same is at this location for your review. >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.01/ >>>> >>>> We will soon send another patch for CompilerDirectives changes.
>>>> >>>> Regards, >>>> >>>> Vivek >>>> >>>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:* Tuesday, March 29, 2016 11:29 AM >>>> *To:* Rukmannagari, Shravya >>>> *Cc:* Deshpande, Vivek R; Vladimir Kozlov; hotspot compiler >>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> On Mar 29, 2016, at 6:38 AM, Rukmannagari, Shravya >>>> >>> > wrote: >>>> >>>> Hi Christian, >>>> >>>> We would add separate files for each intrinsic. By splitting the >>>> CompilerDirectives, do you mean we have to add a separate file? >>>> Sorry I didn't exactly get it. >>>> >>>> Oh, sorry, I wasn't clear enough. Please file a new enhancement >>>> for the CompilerDirectives changes and integrate them separately. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Shravya Rukmannagari. >>>> >>>> *From:* Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:* Monday, March 28, 2016 5:18 PM *To:* Deshpande, Vivek R >>>> > >>>> *Cc:* hotspot compiler >>> >; Vladimir Kozlov >>>> >; >>>> Rukmannagari, Shravya >>> > >>>> *Subject:* Re: RFR (M): 8152907: Update for tan and log10 for x86 >>>> >>>> I left this comment in the bug: >>>> >>>> I think for the saneness of the macroAssembler_libm_x86_*.cpp files >>>> we should put every intrinsic in its own file, like we did for >>>> macroAssembler_x86_sha.cpp. They are already too big: >>>> >>>> $ wc -l hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_*.cpp >>>> 4571 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_32.cpp >>>> 3945 hotspot/src/cpu/x86/vm/macroAssembler_libm_x86_64.cpp >>>> >>>> Also, can we split out the CompilerDirectives changes? >>>> >>>> >>>> >>>> >>>> On Mar 28, 2016, at 1:52 PM, Deshpande, Vivek R >>>> > >>>> wrote: >>>> >>>> Hi all >>>> >>>> We would like to contribute a patch which optimizes tan and log10 >>>> on the x86 architecture using the Intel LIBM library. >>>> >>>> Could you please review and sponsor this patch.
>>>> >>>> Bug-id: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8152907 >>>> webrev: >>>> >>>> >>>> http://cr.openjdk.java.net/~vdeshpande/libm_tanlog10/8152907/webrev.00/ >>>> >>>> Thanks and regards, >>>> >>>> Vivek >>>> From igor.veresov at oracle.com Wed Apr 6 17:49:31 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 6 Apr 2016 10:49:31 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> Message-ID: <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately. Thanks, igor > On Apr 5, 2016, at 4:22 PM, Igor Veresov wrote: > > >> On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: >> >> Igor, >> >> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter >> for instance? > > Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed. > >> >> If so, I am ok with checking this in - further notes below. >> >>> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >>> >>> >>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>>> >>>> I am in agreement with Lois that the JVMS looks good with moving the exception. >>> >>> Thanks! >>>> >>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>>> meeting I will check one more time. It might be worth adding a comment.
>>> >>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. >>> >>>> >>>> My concern is specific to the code in jvmciCompilerToVM.cpp: there is code called resolveMethod that checks >>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>>> >>> >>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >>> In a nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). >> >> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match >> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. >> That is ok with me - I will add a note to the bug. > > Could you please explain what the problem is again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa? > >> >> Also: I see a ciMethod::check_call that has a comment - >> IT appears to fail when applied to an invoke interface call site. >> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. >> > > This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? > > igor > >> Could you possibly file a bug on this one?
What I am seeing is a conditional for invoke static vs. invoke virtual that does not take >> the subtleties of invoke interface and invoke special into account. >>> >>> igor >>> >>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>>> so that you get the correct behavior depending on the requesting byte code. >>>> >>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>>> I could use help studying this a bit more to understand if this really is resolution or is really selection. >>>> >>>> thanks, >>>> Karen >>>> >>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>>> >>>>> >>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>> Hi Lois, >>>>>> >>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>> >>>>>> igor >>>>> Hi Igor, >>>>> >>>>> Thanks for waiting on this. A couple of comments: >>>>> >>>>> - Section 6.5 (Instructions) for invokeinterface does indicate, under "Linking Exceptions", that the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>> >>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>> >>>>> Just curious: did you also run the testbase default methods tests?
>>>>> Lois >>>>> >>>>>> >>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>> >>>>>>> Hi Igor, >>>>>>> >>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>> >>>>>>> Thanks, >>>>>>> Lois >>>>>>> >>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>>> >>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>> >>>>>>>> Thanks, >>>>>>>> igor > From tobias.hartmann at oracle.com Thu Apr 7 06:13:06 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Apr 2016 08:13:06 +0200 Subject: [9] RFR(S): 8153514: Whitebox API should allow compilation of In-Reply-To: <57053E5A.6050705@oracle.com> References: <5704EA9A.7020202@oracle.com> <57053E5A.6050705@oracle.com> Message-ID: <5705FA72.4030304@oracle.com> Thanks, Vladimir. Best regards, Tobias On 06.04.2016 18:50, Vladimir Kozlov wrote: > Looks good. 
> > Thanks, > Vladimir > > On 4/6/16 3:53 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch that adds support to the Whitebox API to compile the static initializer of a class: >> >> https://bugs.openjdk.java.net/browse/JDK-8153514 >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.root.00/ >> http://cr.openjdk.java.net/~thartmann/8153514/webrev.00/ >> >> The static initializer is not accessible through reflection and can therefore not be compiled by using the existing WhiteBox::enqueueMethodForCompilation feature. >> >> I did not add tests for the new feature because that would require implementing additional methods to check for a successful compilation (or parsing the log compilation output). The feature will be exercised by JDK-8153515 which should be enough for testing. >> >> Thanks, >> Tobias >> From rahul.v.raghavan at oracle.com Thu Apr 7 06:43:08 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 6 Apr 2016 23:43:08 -0700 (PDT) Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> Message-ID: > -----Original Message----- > From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: Tuesday, April 05, 2016 1:35 AM > > FYI > > -----Original Message----- > From: Berg, Michael C > Sent: Monday, April 04, 2016 12:42 PM > To: 'Rahul Raghavan' > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > Looks ok Rahul. Thank you Michael. 
> > Thanks, > Michael > > -----Original Message----- > From: Berg, Michael C > Sent: Monday, April 04, 2016 12:42 PM > To: 'Rahul Raghavan' > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > Looks ok Rahul. > > Thanks, > Michael > > -----Original Message----- > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > Sent: Monday, April 04, 2016 1:09 AM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Dean Long ; Berg, Michael C ; Tobias Hartmann > ; Vladimir Ivanov > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > > Hi, > > Please review the revised fix for JDK-8149488. > > : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ > > Based on further checking and thanks to clarifications from Michael, it was verified that the 8149488 issue can be fixed by just correcting > the bitsInByte size to 256 in 'regmask.cpp' (and that the earlier mentioned case of extending the bitsInByte table size to 512 is not required). > > Points from Michael for the record - " > > I believe Dean is right, I have debugged this and analyzed the usage model, > > we never made use of the upper components > > and register allocation has been right for VecZ for a good deal of time. > > > > All we need for a change is, > > Regmask.cpp: > > > > uint RegMask::Size() const { > > extern uint8_t bitsInByte[256]; > > > > A one line change. > > > > -Michael. > > > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > > I tested SPECjvm2008 which has tons of EVEX code generated in it on SKX > > where we make use of VecZ and the upper bank of registers." > > So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. > > Confirmed no issues with 'JPRT -testset hotspot' run. > > Thanks, > Rahul > > > -----Original Message----- > > From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > > > > Michael, isn't the correct size for this table 256? 
I missed how VecZ > > relates to the table size. > > > > dl > > > > On 3/31/2016 9:58 AM, Berg, Michael C wrote: > > > Up until now we have gotten along with the size constraint only.
> > > Let us have both the size and the table though for completeness. > > > I think we can leave the name though. > > > > > > -Michael > > > > > > -----Original Message----- > > > From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > > > Sent: Thursday, March 31, 2016 9:18 AM > > > To: Dean Long ; > > > hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > > > > > > Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in > > > regmask.cpp > > > > > > Hi Michael, > > > > > > With respect to below thread, request help with some questions. > > > Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet > Size. > > > Also, the comment I got was about a requirement to extend the bitsInByte table to > > > 512 entries, for consistent mapping for VecZ register also, on > > targets that support it. > > > But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > > > I could not find any use of the extended set of values in the bitsInByte table. Is it for something to be used in the future? > > > > > > So for now, the original fix for 8149488 (only correcting the size at RegMask::Size()) seems to be okay? > > > (without extending current bitsInByte array contents) (Anyhow at > > > present values above 0xFF are never indexed for bitsInByte in > > RegMask::Size()) > > > > > > ----- src/share/vm/libadt/vectset.hpp > > > +#define BITS_IN_BYTE_ARRAY_SIZE 256 > > > + > > > > > > ----- src/share/vm/opto/regmask.cpp > > > - extern uint8_t bitsInByte[512]; > > > + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > > > > > > ----- src/share/vm/libadt/vectset.cpp > > > -uint8_t bitsInByte[256] = { > > > +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > > > > > > I can send a revised webrev for the above if all is okay. Please tell me if I am missing something.
> > > > > > > > > OR else if extending bitsInByte array size and entries indeed is the correct fix, the array name also should be changed, right ? > > > (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > > > > > > Thanks, > > > Rahul > > > > > >> -----Original Message----- > > >> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > > >> > > >>> -----Original Message----- > > >>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > > >>> > > >>> When do we access elements 256 .. 511? Wouldn't that mean we have > > >>> 9-bit bytes? > > >> Got your point Dean, Thanks. > > >> I too got some questions here now; will check and reply soon. > > >> > > >> -Rahul > > >> > > >>> dl > > >>> > > >>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > > >>>> Hi, > > >>>> > > >>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > > >>>> > > >>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > > >>>> : > > >>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > > >>>> > > >>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > > >>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > > >>>> Confirmed no issues with 'JPRT -testset hotspot' run. > > >>>> > > >>>> Thanks, > > >>>> Rahul > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > > >>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > > >>> dev at openjdk.java.net > > >>>>> Should we not extend: > > >>>>> > > >>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > > >>>>> uint8_t bitsInByte[256] = { // ... > > >>>>> > > >>>>> to 512 > > >>>>> > > >>>>> -----Original Message----- > > >>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > > >>>>> > > >>>>> So how do we intend to map a VecZ register without 512 bits? 
> > >>>>> > > >>>>> -Michael > > >>>>> > > >>>>> -----Original Message----- > > >>>>> From: hotspot-compiler-dev > > >>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > > >>>>> Of Vladimir Ivanov > > >>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > > >>>>> hotspot-compiler-dev at openjdk.java.net > > >>>>> > > >>>>> Rahul, > > >>>>> > > >>>>> Can we define a constant instead and use it in both places? > > >>>>> > > >>>>> Best regards, > > >>>>> Vladimir Ivanov > > >>>>> > > >>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > > >>>>>> Hi, > > >>>>>> > > >>>>>> Please review the patch for JDK- 8149488. > > >>>>>> > > >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > > >>>>>> Webrev: > > >>>>>> http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > > >>>>>> > > >>>>>> Corrected the bitsInByte array size in declaration. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Rahul > > >>>>>> > > From rahul.v.raghavan at oracle.com Thu Apr 7 06:43:42 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 6 Apr 2016 23:43:42 -0700 (PDT) Subject: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp In-Reply-To: <5702B3C5.8070507@oracle.com> References: <56FC2A4B.5030905@oracle.com> <56FD74F2.2080102@oracle.com> <02d43f0c-f8a1-4f8c-9242-c110d78c314f@default> <5702B3C5.8070507@oracle.com> Message-ID: <224945da-cb88-4653-af5d-2865ac42ee08@default> > -----Original Message----- > From: Dean Long > Sent: Tuesday, April 05, 2016 12:05 AM > > Looks OK. Thank you Dean. > > dl > > On 4/4/2016 1:09 AM, Rahul Raghavan wrote: > > Hi, > > > > Please review the revised fix for JDK- 8149488. 
> > > > : http://cr.openjdk.java.net/~rraghavan/8149488/webrev.02/ > > > > Based on further checking and thanks to clarifications from Michael, > > it was verified that the 8149488 issue can be fixed by just correcting the bitsInByte size to 256 in 'regmask.cpp', > > (and that the earlier mentioned case of extending the bitsInByte table size to 512 is not required). > > > > Points from Michael for the record - " > > > I believe Dean is right, I have debugged this and analyzed the usage model, > > > we never made use of the upper components > > > and register allocation has been right for VecZ for a good deal of time. > > > > > > All we need for a change is, > > > Regmask.cpp: > > > > > > uint RegMask::Size() const { > > > extern uint8_t bitsInByte[256]; > > > > > > A one line change. > > > > > > -Michael. > > > > > > p.s. I ran this on sde for the new hw and on the hw itself, found all is working with the above change. > > > I tested SPECjvm2008 which has tons of EVEX code generated in it on SKX > > > where we make use of VecZ and the upper bank of registers." > > > > So in this revised webrev.02, defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 256. > > > > Confirmed no issues with 'JPRT -testset hotspot' run. > > > > Thanks, > > Rahul > > > >> -----Original Message----- > >> From: Dean Long > Sent: Friday, April 01, 2016 12:35 AM > >> > >> Michael, isn't the correct size for this table 256? I missed how VecZ > >> relates to the table size. > >> > >> dl > >> > >> On 3/31/2016 9:58 AM, Berg, Michael C wrote: > >>> Up until now we have gotten along with the size constraint only. > >>> Let us have both the size and the table though for completeness. > >>> I think we can leave the name though.
> >>> > >>> -Michael > >>> > >>> -----Original Message----- > >>> From: Rahul Raghavan [mailto:rahul.v.raghavan at oracle.com] > >>> Sent: Thursday, March 31, 2016 9:18 AM > >>> To: Dean Long ; hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > >>> Subject: RE: RFR(S): 8149488: Incorrect declaration of bitsInByte in regmask.cpp > >>> > >>> Hi Michael, > >>> > >>> With respect to below thread, request help with some questions. > >>> Understood that the bitsInByte[] lookup table tells the number of bits set for the looked-up number and is useful in VectorSet > Size. > >>> Also, the comment I got was about a requirement to extend the bitsInByte table to 512 entries, for consistent mapping for VecZ register also, on > >> targets that support it. > >>> But currently for bitsInByte used in VectorSet::Size(), RegMask::Size(), only the original 256 size is sufficient or useful here. > >>> I could not find any use of the extended set of values in the bitsInByte table. Is it for something to be used in the future? > >>> > >>> So for now, the original fix for 8149488 (only correcting the size at RegMask::Size()) seems to be okay? > >>> (without extending current bitsInByte array contents) (Anyhow at present values above 0xFF are never indexed for bitsInByte in > >> RegMask::Size()) > >>> ----- src/share/vm/libadt/vectset.hpp > >>> +#define BITS_IN_BYTE_ARRAY_SIZE 256 > >>> + > >>> > >>> ----- src/share/vm/opto/regmask.cpp > >>> - extern uint8_t bitsInByte[512]; > >>> + extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE]; > >>> > >>> ----- src/share/vm/libadt/vectset.cpp > >>> -uint8_t bitsInByte[256] = { > >>> +uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = { > >>> > >>> I can send a revised webrev for the above if all is okay. Please tell me if I am missing something.
> >>> (to maybe 'bitCountArray[BIT_COUNT_ARRAY_SIZE]') > >>> > >>> Thanks, > >>> Rahul > >>> > >>>> -----Original Message----- > >>>> From: Rahul Raghavan > Sent: Thursday, March 31, 2016 6:49 PM > >>>> > >>>>> -----Original Message----- > >>>>> From: Dean Long > Sent: Thursday, March 31, 2016 1:05 AM > >>>>> > >>>>> When do we access elements 256 .. 511? Wouldn't that mean we have > >>>>> 9-bit bytes? > >>>> Got your point Dean, Thanks. > >>>> I too got some questions here now; will check and reply soon. > >>>> > >>>> -Rahul > >>>> > >>>>> dl > >>>>> > >>>>> On 3/30/2016 12:19 PM, Rahul Raghavan wrote: > >>>>>> Hi, > >>>>>> > >>>>>> With respect to below email thread, request help to review revised webrev.01 for 8149488. > >>>>>> > >>>>>> : https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>> : > >>>>>> http://cr.openjdk.java.net/~rraghavan/8149488/webrev.01/ > >>>>>> > >>>>>> Now as required, fixed the issue by extending bitsInByte array size from 256 to 512. > >>>>>> Defined and used BITS_IN_BYTE_ARRAY_SIZE for the array size as 512. > >>>>>> Confirmed no issues with 'JPRT -testset hotspot' run. > >>>>>> > >>>>>> Thanks, > >>>>>> Rahul > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Berg, Michael C [mailto:michael.c.berg at intel.com] > Sent: > >>>>>>> Thursday, February 11, 2016 11:32 PM > To: hotspot-compiler- > >>>>> dev at openjdk.java.net > >>>>>>> Should we not extend: > >>>>>>> > >>>>>>> This does not match the actual definition in share/vm/libadt/vectset.cpp: > >>>>>>> uint8_t bitsInByte[256] = { // ... > >>>>>>> > >>>>>>> to 512 > >>>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Berg, Michael C > Sent: Thursday, February 11, 2016 9:57 AM > To: 'Vladimir Ivanov' > >>>>>>> > >>>>>>> So how do we intend to map a VecZ register without 512 bits? 
> >>>>>>> > >>>>>>> -Michael > >>>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: hotspot-compiler-dev > >>>>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf > >>>>>>> Of Vladimir Ivanov > >>>>>>> Sent: Thursday, February 11, 2016 4:54 AM > To: Rahul Raghavan; > >>>>>>> hotspot-compiler-dev at openjdk.java.net > >>>>>>> > >>>>>>> Rahul, > >>>>>>> > >>>>>>> Can we define a constant instead and use it in both places? > >>>>>>> > >>>>>>> Best regards, > >>>>>>> Vladimir Ivanov > >>>>>>> > >>>>>>> On 2/11/16 10:42 AM, Rahul Raghavan wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> Please review the patch for JDK- 8149488. > >>>>>>>> > >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8149488 > >>>>>>>> Webrev: http://cr.openjdk.java.net/~thartmann/8149488/webrev.00/ > >>>>>>>> > >>>>>>>> Corrected the bitsInByte array size in declaration. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Rahul > >>>>>>>> > From aleksey.shipilev at oracle.com Thu Apr 7 07:30:36 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 10:30:36 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <56FE87C2.50002@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> Message-ID: <57060C9C.4000000@oracle.com> On 04/01/2016 05:37 PM, Aleksey Shipilev wrote: > On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >> I would like to solicit comments for C1 support for new >> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >> The rest of new Unsafe methods that are not intrinsified by C1 are >> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >> emulated with existing APIs. 
>> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8152753 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ > > Update: > http://cr.openjdk.java.net/~shade/8152753/webrev.01/ > > Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some > other cleanups. > > Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT > hs-comp testset (some unrelated timeouts on SPARC). Anyone? Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From jamsheed.c.m at oracle.com Thu Apr 7 08:41:34 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Thu, 7 Apr 2016 14:11:34 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> Message-ID: <57061D3E.8050408@oracle.com> Hi Martin, one comment: the count increment update should use release store + atomic update. ref: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming Best Regards Jamsheed On 4/6/2016 6:54 PM, Doerr, Martin wrote: > > Hi Jamsheed and all, > > thanks for your explanation. > > About Case 1: > > I basically agree that reading _next==NULL or _count==0 only > leads to false negatives and is not critical. > > Yes, we could live with a few more false negatives on weak memory > model platforms (even though this is not my preferred design). > > About Case 2: > > What I'm missing on the reader's side of the _count field is something > which prevents processors from speculatively loading the contents of > the ExceptionCache. > > In ExceptionCache::test_address, the _count only affects the control flow.
> > PPC and ARM processors can predict branches which depend on the _count > field and load speculatively from the pc and handler fields (which may > be stale data!). > > Due to out-of-order execution of the loads, it can actually happen > that the new _count value is observed, but stale data is read from pc > and handler fields. > > I guess it is highly unlikely that we will ever observe this, but > there's no guarantee. > > I think my concern about using non-volatile fields for the > _exception_cache is also still valid. > > Nothing prevents C++ compilers from loading the pointer twice from > memory. They may expect to get the pointer to the same instance both > times but actually get two different ones. > > For example, this may lead to the situation that > handler_for_exception_and_pc uses one ExceptionCache instance for > calling the match function and another one (due to reload of > non-volatile field) for calling next(). > > Would other people like to comment on this lengthy discussion as well? > > Best regards, > > Martin > > *From:* Jamsheed C m [mailto:jamsheed.c.m at oracle.com] > *Sent:* Mittwoch, 6. April 2016 13:54 > *To:* Doerr, Martin ; > hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > On 4/6/2016 2:49 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > here are the cases of add_handler_for_exception_and_pc we should > talk about: > > Case 1: A new ExceptionCache instance needs to get added. > > The storestore barrier you have added is used in the constructor > of the ExceptionCache and it releases the most critical fields of > it. I think this is what you explained in [1] in your email below. > > The new values of _count and _next fields are written afterwards > and hence not covered by this release barrier. Readers of the > _exception_cache may read _count==0 or _next==NULL.
> > One could argue that this is not critical, but I guess this was > not intended? > > At least the _exception_cache field needs to be volatile to > prevent optimizers from breaking anything. This is always needed > for fields which are accessed concurrently by multiple threads > without locks (as the readers do). > > I think releasing the completely initialized ExceptionCache > instance is a much cleaner design. > > Having count < actual entries, or having _next = null is OK (as there > is always a (locked) slow path to check again). > Quoting comment from read path. > > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen during > // the first few exception lookups for a given nmethod. > > Weak memory platforms may have a few more false negatives, but isn't > that OK? > This helps us, as we can remove volatile from the picture, and it is actually > good for read paths. > > > Case 2: An existing ExceptionCache instance gets a new entry. > > In this case your storestore barrier is good to release all > updated fields. However, we need to consider the readers, too. The > _count field needs to be volatile and the load must acquire. > Otherwise, stale data may get read by processors which perform > loads on speculative paths. > > The storestore mem barrier handles this, as count <= the number of real entries, > and there is always a locked slow path to check again. > As said before, there may be a few more false negatives on weak memory > platforms than on strong ones. > > Best Regards, > Jamsheed > > > I have added the acquire barrier for the _count field here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ > > > Does this answer your questions or is anything still unclear? > > Best regards, > > Martin > > *From:* Jamsheed C m [mailto:jamsheed.c.m at oracle.com] > *Sent:* Mittwoch, 6.
April 2016 10:11 > *To:* Doerr, Martin > ; > hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Thanks for the reply. Trying to understand things. > > > > void nmethod::add_handler_for_exception_and_pc(Handle > exception, address pc, address handler) { > // There are potential race conditions during exception > cache updates, so we > // must own the ExceptionCache_lock before doing ANY > modifications. Because > // we don't lock during reads, it is possible to have > several threads attempt > // to update the cache with the same data. We need to check > for already inserted > // copies of the current data before adding it. > > MutexLocker ml(ExceptionCache_lock); > ExceptionCache* target_entry = > exception_cache_entry_for_exception(exception); > > if (target_entry == NULL || > !target_entry->add_address_and_handler(pc,handler)) { > target_entry = new ExceptionCache(exception,pc,handler); > add_exception_cache_entry(target_entry); > } > } > > > [1] There is a storestore memory barrier before count is updated in > add_address_and_handler. > This ensures that the exception pc and handler address are updated before > count is incremented and before the ExceptionCache entry is updated at ( > nm->_exception_cache or in the list ec->_next ). > > > > address nmethod::handler_for_exception_and_pc(Handle > exception, address pc) { > // We never grab a lock to read the exception cache, so we may > // have false negatives. This is okay, as it can only happen > during > // the first few exception lookups for a given nmethod. > ExceptionCache* ec = exception_cache(); > while (ec != NULL) { > address ret_val; > if ((ret_val = ec->match(exception,pc)) != NULL) { > return ret_val; > } > ec = ec->next(); > } > return NULL; > } > > > And in the read logic, we first check that the ec entry is available (non-null > check) before proceeding further. > If ec is non-null, then ec_type, exception pc, and handler are > available by [1].
Though count can be reordered and not yet updated > with the new value. > > This fixes the issue. Why do you think it doesn't? > > Best Regards, > Jamsheed > > > On 4/5/2016 3:40 PM, Doerr, Martin wrote: > > Hi Jamsheed, > > thanks for pointing me to it. Interesting that you have found > such a problem so shortly before me :) > > My webrev addresses some aspects which are not covered by your > fix: > > - add_handler_for_exception_and_pc adds a new ExceptionCache > instance in the other case. They need to get released as well. > > - The readers of the _exception_cache field are not safe, yet. > As Andrew Haley pointed out, optimizers may modify load > accesses for non-volatile fields. > > So I think my change is still needed. > > And after taking a closer look at your change, I think the > _count field which is addressed by your fix needs to be > volatile as well. I can incorporate that in my change if you like. > > Would you agree? > > Best regards, > > Martin > > *From:* hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On > Behalf Of *Jamsheed C m > *Sent:* Monday, 4 April 2016 08:14 > *To:* hotspot-compiler-dev at openjdk.java.net > > *Subject:* Re: RFR(S): 8153267: nmethod's exception cache not > multi-thread safe > > Hi Martin, > > "nmethod's exception cache not multi-thread safe" bug is > fixed in b107 > bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 > fix changeset: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 > discussion link: > http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html > > Best Regards, > Jamsheed > > On 4/1/2016 6:07 PM, Doerr, Martin wrote: > > Hello everyone, > > we have found a concurrency problem with the nmethod's > exception cache. Readers of the cache may read stale data > on weak memory platforms.
> > The writers of the cache are synchronized by locks, but > there may be concurrent readers: The compiler runtimes use > nmethod::handler_for_exception_and_pc to access the cache > without locking. > > Therefore, the nmethod's field _exception_cache needs to > be volatile and adding new entries must be done by > releasing stores. (Loading seems to be fine without > acquire because there's an address dependency from the > load of the cache to the usage of its contents which is > sufficient to ensure ordering on all openjdk platforms.) > > I also added a minor cleanup: I changed nmethod::is_alive > to read the volatile field _state only once. It is > certainly undesired to force the compiler to load it from > memory twice. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ > > > Please review. I will also need a sponsor. > > Best regards, > > Martin > From aph at redhat.com Thu Apr 7 08:51:34 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 09:51:34 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57060C9C.4000000@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> Message-ID: <57061F96.6080001@redhat.com> I'm very tempted to review this, but there is a rather odd thing: the bug does not explain the motivation for this change. Andrew.
From aleksey.shipilev at oracle.com Thu Apr 7 08:55:32 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 11:55:32 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57061F96.6080001@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> Message-ID: <57062084.4020209@oracle.com> On 04/07/2016 11:51 AM, Andrew Haley wrote: > I'm very tempted to review this, but there is a rather odd thing: > the bug does not explain the motivation for this change. Not following you. What motivation do you need apart from "...the rest of new Unsafe methods that are not intrinsified by C1 are handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? Thanks, -Aleksey From martin.doerr at sap.com Thu Apr 7 09:08:08 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 7 Apr 2016 09:08:08 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57061D3E.8050408@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> Message-ID: Hi Jamsheed, atomic update for the _count would only be required if there were multiple threads which attempt to increment it concurrently. However, updates are under lock, so we only have concurrent readers which is ok. I still think "volatile" does what we need here. Especially the xlC compiler on AIX tends to reload variables from memory. Exactly this can be prevented by making the field volatile.
People who don't like volatile should come up with a different solution, please. Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Thursday, 7 April 2016 10:42 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, one comment: the count increment update should use release store + atomic update. ref: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming Best Regards Jamsheed On 4/6/2016 6:54 PM, Doerr, Martin wrote: Hi Jamsheed and all, thanks for your explanation. About Case 1: I basically agree that reading _next==NULL or _count==0 only leads to false negatives and is not critical. Yes, we could live with a few more false negatives on weak memory model platforms (even though this is not my preferred design). About Case 2: What I'm missing on the reader's side of the _count field is something which prevents processors from speculatively loading the contents of the ExceptionCache. In ExceptionCache::test_address, the _count only affects the control flow. PPC and ARM processors can predict branches which depend on the _count field and load speculatively from the pc and handler fields (which may be stale data!). Due to out-of-order execution of the loads, it can actually happen that the new _count value is observed, but stale data is read from pc and handler fields. I guess it is highly unlikely that we will ever observe this, but there's no guarantee. I think my concern about using non-volatile fields for the _exception_cache is also still valid. Nothing prevents C++ compilers from loading the pointer twice from memory. They may expect to get the pointer to the same instance both times but actually get two different ones.
For example, this may lead to the situation that handler_for_exception_and_pc uses one ExceptionCache instance for calling the match function and another one (due to a reload of the non-volatile field) for calling next(). Maybe other people would like to comment on this lengthy discussion as well? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Wednesday, 6 April 2016 13:54 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, On 4/6/2016 2:49 PM, Doerr, Martin wrote: Hi Jamsheed, here are the cases of add_handler_for_exception_and_pc we should talk about: Case 1: A new ExceptionCache instance needs to get added. The storestore barrier you have added is used in the constructor of the ExceptionCache and it releases the most critical fields of it. I think this is what you explained in [1] in your email below. The new values of _count and _next fields are written afterwards and hence not covered by this release barrier. Readers of the _exception_cache may read _count==0 or _next==NULL. One could argue that this is not critical, but I guess this was not intended? At least the _exception_cache field needs to be volatile to prevent optimizers from breaking anything. This is always needed for fields which are accessed concurrently by multiple threads without locks (as the readers do). I think releasing the completely initialized ExceptionCache instance is a much cleaner design. Having count < actual entries, or having _next = null, is OK (as there is always the (locked) slow path to check again). Quoting the comment from the read path: // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. Weak memory platforms may have a few more false negatives, but isn't that OK?
This helps us, as we can remove volatile from the picture, which is actually good for the read paths. Case 2: An existing ExceptionCache instance gets a new entry. In this case your storestore barrier is good to release all updated fields. However, we need to consider the readers, too. The _count field needs to be volatile and the load must acquire. Otherwise, stale data may get read by processors which perform loads on speculative paths. The storestore memory barrier handles this, as count <= number of real entries, and there is always the locked slow path to check again. As said before, there may be a few more false negatives on weak memory platforms than on strong ones. Best Regards, Jamsheed I have added the acquire barrier for the _count field here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.01/ Does this answer your questions or is anything still unclear? Best regards, Martin From: Jamsheed C m [mailto:jamsheed.c.m at oracle.com] Sent: Wednesday, 6 April 2016 10:11 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thanks for the reply. Trying to understand things. void nmethod::add_handler_for_exception_and_pc(Handle exception, address pc, address handler) { // There are potential race conditions during exception cache updates, so we // must own the ExceptionCache_lock before doing ANY modifications. Because // we don't lock during reads, it is possible to have several threads attempt // to update the cache with the same data. We need to check for already inserted // copies of the current data before adding it.
MutexLocker ml(ExceptionCache_lock); ExceptionCache* target_entry = exception_cache_entry_for_exception(exception); if (target_entry == NULL || !target_entry->add_address_and_handler(pc,handler)) { target_entry = new ExceptionCache(exception,pc,handler); add_exception_cache_entry(target_entry); } } [1] There is a storestore memory barrier before count is updated in add_address_and_handler. This ensures that the exception pc and handler address are updated before count is incremented and before the ExceptionCache entry is updated at (nm->_exception_cache or in the list ec->_next). address nmethod::handler_for_exception_and_pc(Handle exception, address pc) { // We never grab a lock to read the exception cache, so we may // have false negatives. This is okay, as it can only happen during // the first few exception lookups for a given nmethod. ExceptionCache* ec = exception_cache(); while (ec != NULL) { address ret_val; if ((ret_val = ec->match(exception,pc)) != NULL) { return ret_val; } ec = ec->next(); } return NULL; } And in the read logic, we first check that the ec entry is available (non-null check) before proceeding further. If ec is non-null, then ec_type, exception pc, and handler are available by [1], though count can be reordered and not yet updated with the new value. This fixes the issue. Why do you think it doesn't? Best Regards, Jamsheed On 4/5/2016 3:40 PM, Doerr, Martin wrote: Hi Jamsheed, thanks for pointing me to it. Interesting that you have found such a problem so shortly before me :) My webrev addresses some aspects which are not covered by your fix: - add_handler_for_exception_and_pc adds a new ExceptionCache instance in the other case. They need to get released as well. - The readers of the _exception_cache field are not safe, yet. As Andrew Haley pointed out, optimizers may modify load accesses for non-volatile fields. So I think my change is still needed. 
And after taking a closer look at your change, I think the _count field which is addressed by your fix needs to be volatile as well.
I can incorporate that in my change if you like. Would you agree? Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Jamsheed C m Sent: Monday, 4 April 2016 08:14 To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Martin, "nmethod's exception cache not multi-thread safe" bug is fixed in b107 bug id: https://bugs.openjdk.java.net/browse/JDK-8143897 fix changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/f918c20107d9 discussion link: http://openjdk.5641.n7.nabble.com/RFR-XS-8143897-Weblogic12medrec-assert-handler-address-SharedRuntime-compute-compiled-exc-handler-nme-td255611.html Best Regards, Jamsheed On 4/1/2016 6:07 PM, Doerr, Martin wrote: Hello everyone, we have found a concurrency problem with the nmethod's exception cache. Readers of the cache may read stale data on weak memory platforms. The writers of the cache are synchronized by locks, but there may be concurrent readers: The compiler runtimes use nmethod::handler_for_exception_and_pc to access the cache without locking. Therefore, the nmethod's field _exception_cache needs to be volatile and adding new entries must be done by releasing stores. (Loading seems to be fine without acquire because there's an address dependency from the load of the cache to the usage of its contents which is sufficient to ensure ordering on all openjdk platforms.) I also added a minor cleanup: I changed nmethod::is_alive to read the volatile field _state only once. It is certainly undesired to force the compiler to load it from memory twice. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.00/ Please review. I will also need a sponsor. 
Best regards, Martin
From aph at redhat.com Thu Apr 7 09:33:23 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 10:33:23 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57062084.4020209@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> Message-ID: <57062963.5070403@redhat.com> On 07/04/16 09:55, Aleksey Shipilev wrote: > On 04/07/2016 11:51 AM, Andrew Haley wrote: >> I'm very tempted to review this, but there is a rather odd thing: >> the bug does not explain the motivation for this change. > > Not following you. > > What motivation do you need apart from "...the rest of new Unsafe > methods that are not intrinsified by C1 are handled by Java fallbacks in > Unsafe.java.
compareAndExchange cannot be emulated with existing APIs"? > > As far as I know all of these are handled by native methods. > Is that not correct? They seem to be. Andrew. From aleksey.shipilev at oracle.com Thu Apr 7 09:48:48 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 12:48:48 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57062963.5070403@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> Message-ID: <57062D00.5000609@oracle.com> On 04/07/2016 12:33 PM, Andrew Haley wrote: > On 07/04/16 09:55, Aleksey Shipilev wrote: >> On 04/07/2016 11:51 AM, Andrew Haley wrote: >>> I'm very tempted to review this, but there is a rather odd thing: >>> the bug does not explain the motivation for this change. >> >> Not following you. >> >> What motivation do you need apart from "...the rest of new Unsafe >> methods that are not intrinsified by C1 are handled by Java fallbacks in >> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? > > As far as I know all of these are handled by native methods. > Is that not correct? They seem to be. Yes. This is not a question on implementing CompareAndExchange: it is handled by unsafe.cpp bits well, and this is why runtime can work even without getting compilers in picture. This is about getting CompareAndExchange support in C1 in sync with CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native calls, and should do the same for new CompareAndExchange. In other words, out of the realm of VarHandles accessor methods, we are either covered by Unsafe.java shortcuts, or C1/C2 intrinsics that replace unsafe.cpp calls. Except for CompareAndExchange, until this RFE. 
Thanks, -Aleksey From aph at redhat.com Thu Apr 7 10:14:23 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 11:14:23 +0100 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> Message-ID: <570632FF.7090103@redhat.com> On 07/04/16 10:08, Doerr, Martin wrote: > atomic update for the _count would only be required if there were > multiple threads which attempt to increment it > concurrently. However, updates are under lock, so we only have > concurrent readers which is ok. > > I still think "volatile" does what we need here. Especially the xlC > compiler on AIX tends to reload variables from memory. Exactly this > can be prevented by making the field volatile. I think your latest patch is OK. Whether volatile is really good enough, I don't know.
The new(ish) C++ memory model treats this as a race, and therefore undefined behaviour. Old C++ didn't have a memory model, so the best we can do with racy code is guess about what our compilers might do. I certainly much prefer a release_store to the storestore fence used in the fix for 8143897. Andrew. From aph at redhat.com Thu Apr 7 10:17:15 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 11:17:15 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57062D00.5000609@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> <57062D00.5000609@oracle.com> Message-ID: <570633AB.6040704@redhat.com> On 07/04/16 10:48, Aleksey Shipilev wrote: > On 04/07/2016 12:33 PM, Andrew Haley wrote: >> On 07/04/16 09:55, Aleksey Shipilev wrote: >>> On 04/07/2016 11:51 AM, Andrew Haley wrote: >>>> I'm very tempted to review this, but there is a rather odd thing: >>>> the bug does not explain the motivation for this change. >>> >>> Not following you. >>> >>> What motivation do you need apart from "...the rest of new Unsafe >>> methods that are not intrinsified by C1 are handled by Java fallbacks in >>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? >> >> As far as I know all of these are handled by native methods. >> Is that not correct? They seem to be. > > Yes. This is not a question on implementing CompareAndExchange: it is > handled by unsafe.cpp bits well, and this is why runtime can work even > without getting compilers in picture. I thought so. > This is about getting CompareAndExchange support in C1 in sync with > CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native > calls, and should do the same for new CompareAndExchange. So, this is entirely about making C1-generated code more efficient, by avoiding native calls. Right? Andrew. 
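To make the semantic difference discussed in this thread concrete: a compare-and-set reports only whether the swap happened, while a compare-and-exchange returns the witness value, i.e. the value the location held when the atomic comparison was made. The sketch below is a C++ analogue built on std::atomic::compare_exchange_strong; it is not the Unsafe API or the C1 intrinsic, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// compareAndSet-style analogue: only reports whether the swap happened.
bool compare_and_set(std::atomic<int>& cell, int expected, int desired) {
  return cell.compare_exchange_strong(expected, desired);
}

// compareAndExchange-style analogue: returns the witness value, the value
// the cell held at the moment of the atomic compare. The exchange succeeded
// exactly when the returned witness equals `expected`.
int compare_and_exchange(std::atomic<int>& cell, int expected, int desired) {
  int witness = expected;
  // On failure, compare_exchange_strong overwrites `witness` with the
  // current value; on success it leaves it equal to `expected`.
  cell.compare_exchange_strong(witness, desired);
  return witness;
}

// A naive emulation from compare-and-set plus a separate load. With other
// threads writing the cell, the value returned after a failed CAS may not
// be the value the CAS actually compared against.
int racy_emulation(std::atomic<int>& cell, int expected, int desired) {
  if (compare_and_set(cell, expected, desired)) return expected;
  return cell.load();  // separate, later read: not an atomic witness
}
```

In a single thread both variants agree; the difference only shows up under contention, which is one way to read Aleksey's remark that compareAndExchange cannot be emulated with the existing APIs.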
From aleksey.shipilev at oracle.com Thu Apr 7 10:26:28 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 13:26:28 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <570633AB.6040704@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> <57062D00.5000609@oracle.com> <570633AB.6040704@redhat.com> Message-ID: <570635D4.8020308@oracle.com> On 04/07/2016 01:17 PM, Andrew Haley wrote: > On 07/04/16 10:48, Aleksey Shipilev wrote: >> On 04/07/2016 12:33 PM, Andrew Haley wrote: >>> On 07/04/16 09:55, Aleksey Shipilev wrote: >>>> On 04/07/2016 11:51 AM, Andrew Haley wrote: >>>>> I'm very tempted to review this, but there is a rather odd thing: >>>>> the bug does not explain the motivation for this change. >>>> >>>> Not following you. >>>> >>>> What motivation do you need apart from "...the rest of new Unsafe >>>> methods that are not intrinsified by C1 are handled by Java fallbacks in >>>> Unsafe.java. compareAndExchange cannot be emulated with existing APIs"? >>> >>> As far as I know all of these are handled by native methods. >>> Is that not correct? They seem to be. >> >> Yes. This is not a question on implementing CompareAndExchange: it is >> handled by unsafe.cpp bits well, and this is why runtime can work even >> without getting compilers in picture. > > I thought so. > >> This is about getting CompareAndExchange support in C1 in sync with >> CompareAndSet support. C1 intrinsifies CompareAndSet to dodge native >> calls, and should do the same for new CompareAndExchange. > > So, this is entirely about making C1-generated code more efficient, > by avoiding native calls. Right? Yes. The same reason why CompareAndSwap and {get,put}* are intrinsified by C1. 
-Aleksey
From martin.doerr at sap.com Thu Apr 7 10:51:41 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 7 Apr 2016 10:51:41 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <570632FF.7090103@redhat.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> Message-ID: <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> Hi Andrew, Jamsheed and all, thank you very much for your input. As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct). My change still contains a releasing store for newly created ExceptionCache instances. As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms. I think having the release doesn't hurt too much and makes the design a little cleaner. I also added comments based on your input. The new webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ Please review. I will also need a sponsor from Oracle, please. Thanks again and best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Thursday, 7
April 2016 12:14 To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On 07/04/16 10:08, Doerr, Martin wrote: > atomic update for the _count would only be required if there were > multiple threads which attempt to increment it > concurrently. However, updates are under lock, so we only have > concurrent readers which is ok. > > I still think "volatile" does what we need here. Especially the xlC > compiler on AIX tends to reload variables from memory. Exactly this > can be prevented by making the field volatile. I think your latest patch is OK. Whether volatile is really good enough, I don't know. The new(ish) C++ memory model treats this as a race, and therefore undefined behaviour. Old C++ didn't have a memory model, so the best we can do with racy code is guess about what our compilers might do. I certainly much prefer a release_store to the storestore fence used in the fix for 8143897. Andrew. From aph at redhat.com Thu Apr 7 11:34:54 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 12:34:54 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <570635D4.8020308@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57061F96.6080001@redhat.com> <57062084.4020209@oracle.com> <57062963.5070403@redhat.com> <57062D00.5000609@oracle.com> <570633AB.6040704@redhat.com> <570635D4.8020308@oracle.com> Message-ID: <570645DE.4070708@redhat.com> On 04/07/2016 11:26 AM, Aleksey Shipilev wrote: >> So, this is entirely about making C1-generated code more efficient, >> > by avoiding native calls. Right? > Yes. The same reason why CompareAndSwap and {get,put}* are intrinsified > by C1. OK, I'll look at this today. Andrew.
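The publication scheme this exception-cache thread converges on can be sketched in portable C++: writers are serialized by a lock, there is a releasing store when an entry's count is incremented and when a new entry is linked in, and lock-free readers load the shared head pointer only once. This is a hypothetical model rather than HotSpot code. std::atomic with release/acquire ordering stands in for OrderAccess and the volatile fields, and CacheEntry, ExceptionCacheModel, add and lookup are illustrative names. A miss (0) may be a false negative, which a real caller would resolve on the locked slow path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical model of one cache entry: one exception type, a few pc/handler pairs.
struct CacheEntry {
  static constexpr int kCapacity = 4;
  const void* const exception_type;        // set before the entry is published
  std::uintptr_t pc[kCapacity];
  std::uintptr_t handler[kCapacity];
  std::atomic<int> count{0};               // released by the writer, acquired by readers
  std::atomic<CacheEntry*> next{nullptr};

  explicit CacheEntry(const void* type) : exception_type(type), pc(), handler() {}

  // Writer side: the caller must hold the cache mutex.
  bool add(std::uintptr_t p, std::uintptr_t h) {
    int n = count.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    pc[n] = p;
    handler[n] = h;
    // Releasing store: a reader that observes n + 1 also observes slot n.
    count.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side, no lock: only slots below the acquired count are read.
  std::uintptr_t match(const void* type, std::uintptr_t p) const {
    if (type != exception_type) return 0;
    int n = count.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pc[i] == p) return handler[i];
    }
    return 0;
  }
};

class ExceptionCacheModel {
 public:
  ~ExceptionCacheModel() {
    CacheEntry* e = head_.load(std::memory_order_relaxed);
    while (e != nullptr) {
      CacheEntry* following = e->next.load(std::memory_order_relaxed);
      delete e;
      e = following;
    }
  }

  // Writer: same shape as add_handler_for_exception_and_pc quoted above.
  void add(const void* type, std::uintptr_t p, std::uintptr_t h) {
    std::lock_guard<std::mutex> guard(lock_);
    CacheEntry* target = head_.load(std::memory_order_relaxed);
    while (target != nullptr && target->exception_type != type) {
      target = target->next.load(std::memory_order_relaxed);
    }
    if (target != nullptr && target->add(p, h)) return;
    CacheEntry* fresh = new CacheEntry(type);
    fresh->add(p, h);
    fresh->next.store(head_.load(std::memory_order_relaxed), std::memory_order_relaxed);
    // Publish the fully initialized entry with a single releasing store.
    head_.store(fresh, std::memory_order_release);
  }

  // Reader: returns 0 on a miss, which may be a false negative.
  std::uintptr_t lookup(const void* type, std::uintptr_t p) const {
    // Load the shared head exactly once; the loop then only uses locals.
    CacheEntry* e = head_.load(std::memory_order_acquire);
    while (e != nullptr) {
      std::uintptr_t h = e->match(type, p);
      if (h != 0) return h;
      e = e->next.load(std::memory_order_acquire);
    }
    return 0;
  }

 private:
  std::mutex lock_;
  std::atomic<CacheEntry*> head_{nullptr};
};
```

Because lookup() copies head_ into a local once, match() and the following next load always refer to the same entry, which avoids the double-load hazard raised in the thread. Under the C++11 memory model it is the release/acquire pairs, not volatile, that make these unlocked reads race-free.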
From pavel.punegov at oracle.com Thu Apr 7 13:32:53 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Thu, 7 Apr 2016 16:32:53 +0300 Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError Message-ID: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> Hi, please review this fix to unquarantine CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2]. The latter fixed the main issue that caused OOME in the tests: it disabled generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM. [1] https://bugs.openjdk.java.net/browse/JDK-8140354 [2] https://bugs.openjdk.java.net/browse/JDK-8144621 -- webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/ bug https://bugs.openjdk.java.net/browse/JDK-8153661 Pavel. From igor.ignatyev at oracle.com Thu Apr 7 13:43:01 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 7 Apr 2016 16:43:01 +0300 Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError In-Reply-To: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> References: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> Message-ID: <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com> Pavel, looks good to me. Igor > On Apr 7, 2016, at 4:32 PM, Pavel Punegov wrote: > > Hi, > > please review this fix to unquarantine CompilerControl tests after the JDK-8140354 [1] is closed as a duplicate of the JDK-8144621 [2] > The second one has fixed main issue that caused OOME in tests. It disabled generation of patterns *.* for compile commands like "print" that made a lot of output from the tests VM. > > [1] https://bugs.openjdk.java.net/browse/JDK-8140354 > [2] https://bugs.openjdk.java.net/browse/JDK-8144621 > -- > webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/ > bug https://bugs.openjdk.java.net/browse/JDK-8153661 > > Pavel.
> From aph at redhat.com Thu Apr 7 14:29:31 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 15:29:31 +0100 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57060C9C.4000000@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> Message-ID: <57066ECB.2010502@redhat.com> On 04/07/2016 08:30 AM, Aleksey Shipilev wrote: > On 04/01/2016 05:37 PM, Aleksey Shipilev wrote: >> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >>> I would like to solicit comments for C1 support for new >>> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >>> The rest of new Unsafe methods that are not intrinsified by C1 are >>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >>> emulated with existing APIs. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8152753 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ >> >> Update: >> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ >> >> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some >> other cleanups. >> >> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT >> hs-comp testset (some unrelated timeouts on SPARC). > > Anyone? Reviewed, OK. Andrew. From felix.yang at linaro.org Thu Apr 7 15:01:37 2016 From: felix.yang at linaro.org (Felix Yang) Date: Thu, 7 Apr 2016 23:01:37 +0800 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair Message-ID: Hi, Please review webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.00/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713 Currently, the C2 compiler generates independent stores to clear short arrays whose size is no bigger than the InitArrayShortSize parameter (refer to the ClearArrayNode::Ideal function).
For the aarch64 port, we have a store pair instruction which can zero two memory words at a time, and this will be good for code size and maybe for performance on some micro-architectures. For Spark Terasort, 550 stp (xzr, xzr) instructions are generated with the patch, which means about a 2KB reduction in code size. Tested with JTreg hotspot, langtools and jdk. Is it OK? Thanks, Felix From vladimir.x.ivanov at oracle.com Thu Apr 7 15:22:39 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 7 Apr 2016 18:22:39 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <56FE87C2.50002@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> Message-ID: <57067B3F.4060403@oracle.com> Aleksey, Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not Compiler::is_intrinsic_supported? vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you effectively completely disable the intrinsics on all non-x86 platforms. I suggest moving the InlineCompareAndExchange flag into c1_globals.hpp and checking it in Compiler::is_intrinsic_supported. Best regards, Vladimir Ivanov On 4/1/16 5:37 PM, Aleksey Shipilev wrote: > On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >> I would like to solicit comments for C1 support for new >> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >> The rest of new Unsafe methods that are not intrinsified by C1 are >> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >> emulated with existing APIs. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8152753 >> >> Webrev: >> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ > > Update: > http://cr.openjdk.java.net/~shade/8152753/webrev.01/ > > Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some > other cleanups.
> > Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT > hs-comp testset (some unrelated timeouts on SPARC). > > Thanks, > -Aleksey > From aleksey.shipilev at oracle.com Thu Apr 7 15:34:08 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 18:34:08 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57067B3F.4060403@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57067B3F.4060403@oracle.com> Message-ID: <57067DF0.5090707@oracle.com> On 04/07/2016 06:22 PM, Vladimir Ivanov wrote: > Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not > Compiler::is_intrinsic_supported? > > vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you > effectively completely disable the intrinsics on all non-x86 platforms. > > I suggest to move InlineCompareAndExchange flag into c1_globals.hpp and > check it in Compiler::is_intrinsic_supported. I actually did that originally, see: http://cr.openjdk.java.net/~shade/8152753/webrev.00/ ...but then moved that to vmIntrinsics::is_disabled_by_flags by John's suggestion -- all flag sensing is done there. It is a matter of approach, really, and I think current version aligns better with the existing intrinsic flags. Non-x86 platforms have not yet implemented CAE intrinsics, and this forces their hand to implement both C1 and C2 parts before flipping the platform-dependent flag. Which may or may not be a good thing, but I don't have preference either way. -Aleksey > On 4/1/16 5:37 PM, Aleksey Shipilev wrote: >> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >>> I would like to solicit comments for C1 support for new >>> Unsafe.compareAndExchange intrinsics (we have support for them in C2). >>> The rest of new Unsafe methods that are not intrinsified by C1 are >>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >>> emulated with existing APIs. 
>>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8152753 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ >> >> Update: >> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ >> >> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some >> other cleanups. >> >> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT >> hs-comp testset (some unrelated timeouts on SPARC). >> >> Thanks, >> -Aleksey >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From aph at redhat.com Thu Apr 7 15:35:14 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 7 Apr 2016 16:35:14 +0100 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair In-Reply-To: References: Message-ID: <57067E32.3010403@redhat.com> On 04/07/2016 04:01 PM, Felix Yang wrote: > Please review webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.00/ > JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713 > > Currently, C2 compiler generate independent stores to clear > short arrays whose size is no bigger than parameter > InitArrayShortSize (refer to ClearArrayNode::Ideal function). > For the aarch64 port, we have store pair instruction which can > zero two memory words at a time and this will be good for code > size and maybe performance for some micro-archs. > > For Spark Terasort, an extra of 550 stp (xzr, xzr) instructions > are generated with the patch, which mean about 2KB reduction in > codesize. > > Tested with JTreg hotspot, langtools and jdk. Is it OK? It looks reasonable. It's rather a big slab of code for aarch64.ad, and I think that it should be in MacroAssembler. Long Chen created MacroAssembler::zero_words, and you should create an overload of zero_words which takes a constant int as an argument. Andrew. 
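The 8153713 store-pair idea can be sketched in a few lines. Andrew asks for a zero_words overload that takes a constant count, and the sketch below shows what such an overload would emit for a compile-time word count. It is a minimal sketch under stated assumptions: words are 8 bytes, and all offsets fit the 64-bit stp immediate range. The helpers plan_zero_words and code_bytes are hypothetical. They are not HotSpot's MacroAssembler API; they only build the list of mnemonics an assembler would emit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not HotSpot's MacroAssembler::zero_words): lists the
// AArch64 instructions that would zero `words` 8-byte words starting at
// [base], using one stp per pair of words and a trailing str for an odd count.
// Assumes every offset fits the 64-bit stp immediate range (-512..504 bytes),
// which is expected to hold for the short arrays targeted here.
std::vector<std::string> plan_zero_words(const std::string& base, int words) {
  assert(words >= 0);
  std::vector<std::string> insns;
  int offset = 0;  // byte offset from base
  for (int i = 0; i + 1 < words; i += 2, offset += 16) {
    // One stp zeroes two 64-bit words (16 bytes) in a single instruction.
    insns.push_back("stp xzr, xzr, [" + base + ", #" + std::to_string(offset) + "]");
  }
  if (words % 2 != 0) {
    // Odd word count: finish with one single-word store.
    insns.push_back("str xzr, [" + base + ", #" + std::to_string(offset) + "]");
  }
  return insns;
}

// Every AArch64 instruction is 4 bytes, so code size is 4 * instruction count.
int code_bytes(int insn_count) { return 4 * insn_count; }
```

For 6 words this yields three stp instructions (12 bytes) instead of six single-word str instructions (24 bytes). Each pair saves 4 bytes, so 550 stp instructions save about 2200 bytes, which is in line with the ~2KB reported for Spark Terasort.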
From rwestrel at redhat.com Thu Apr 7 15:41:27 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 7 Apr 2016 17:41:27 +0200 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57060C9C.4000000@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> Message-ID: <57067FA7.7030907@redhat.com> >> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ That c1_LIR.cpp change: 931 if (!opCompareAndSwap->_exchange) do_input(opCompareAndSwap->_cmp_value); doesn't seem right. Why do you need it? Roland. From aleksey.shipilev at oracle.com Thu Apr 7 15:46:04 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 7 Apr 2016 18:46:04 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57067FA7.7030907@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> Message-ID: <570680BC.7030305@oracle.com> On 04/07/2016 06:41 PM, Roland Westrelin wrote: > >>> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ > > That c1_LIR.cpp change: > > 931 if (!opCompareAndSwap->_exchange) > do_input(opCompareAndSwap->_cmp_value); > > doesn't seem right. Why do you need it? CompareAndSwap produces boolean result, and kills cmp_value and new_value. CompareAndExchange produces the "old"/null value result, which is stored at the same position as cmp_value. So, if you omit that line, LinearScan asserts when you are trying to use the result of CompareAndExchange. AFAIU, removing the "input" property from cmp_value, but leaving "temp" makes things back in order for CompareAndExchange. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From vladimir.x.ivanov at oracle.com Thu Apr 7 16:01:31 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 7 Apr 2016 19:01:31 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <57067DF0.5090707@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57067B3F.4060403@oracle.com> <57067DF0.5090707@oracle.com> Message-ID: <5706845B.2080007@oracle.com> Ok, I thought C2 support is already there on non-x86 platforms. I'm fine with both approaches then. Best regards, Vladimir Ivanov On 4/7/16 6:34 PM, Aleksey Shipilev wrote: > On 04/07/2016 06:22 PM, Vladimir Ivanov wrote: >> Why did you put the guards in vmIntrinsics::is_disabled_by_flags and not >> Compiler::is_intrinsic_supported? >> >> vmIntrinsics::is_disabled_by_flags affects both C1 & C2, so you >> effectively completely disable the intrinsics on all non-x86 platforms. >> >> I suggest to move InlineCompareAndExchange flag into c1_globals.hpp and >> check it in Compiler::is_intrinsic_supported. > > I actually did that originally, see: > http://cr.openjdk.java.net/~shade/8152753/webrev.00/ > > ...but then moved that to vmIntrinsics::is_disabled_by_flags by John's > suggestion -- all flag sensing is done there. It is a matter of > approach, really, and I think current version aligns better with the > existing intrinsic flags. > > Non-x86 platforms have not yet implemented CAE intrinsics, and this > forces their hand to implement both C1 and C2 parts before flipping the > platform-dependent flag. Which may or may not be a good thing, but I > don't have preference either way. > > -Aleksey > >> On 4/1/16 5:37 PM, Aleksey Shipilev wrote: >>> On 03/25/2016 07:29 PM, Aleksey Shipilev wrote: >>>> I would like to solicit comments for C1 support for new >>>> Unsafe.compareAndExchange intrinsics (we have support for them in C2). 
>>>> The rest of new Unsafe methods that are not intrinsified by C1 are >>>> handled by Java fallbacks in Unsafe.java. compareAndExchange cannot be >>>> emulated with existing APIs. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8152753 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~shade/8152753/webrev.00/ >>> >>> Update: >>> http://cr.openjdk.java.net/~shade/8152753/webrev.01/ >>> >>> Moved flags sensing to vmIntrinsics::is_disabled_by_flags, and did some >>> other cleanups. >>> >>> Testing: compiler/unsafe regression tests; targeted microbenchmarks; RBT >>> hs-comp testset (some unrelated timeouts on SPARC). >>> >>> Thanks, >>> -Aleksey >>> > > From doug.simon at oracle.com Thu Apr 7 20:27:32 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 7 Apr 2016 22:27:32 +0200 Subject: RFR: 8153782: update JVMCI sources to Eclipse 4.5.2 format style Message-ID: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8153782 http://cr.openjdk.java.net/~dnsimon/8153782/ -Doug From christian.thalinger at oracle.com Thu Apr 7 20:51:04 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 7 Apr 2016 10:51:04 -1000 Subject: RFR: 8153782: update JVMCI sources to Eclipse 4.5.2 format style In-Reply-To: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com> References: <418A2221-7C57-4437-97A6-3EEE47B3D1CD@oracle.com> Message-ID: Looks good. > On Apr 7, 2016, at 10:27 AM, Doug Simon wrote: > > https://bugs.openjdk.java.net/browse/JDK-8153782 > http://cr.openjdk.java.net/~dnsimon/8153782/ > > -Doug From bharadwaj.yadavalli at oracle.com Thu Apr 7 23:36:25 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Thu, 7 Apr 2016 19:36:25 -0400 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options Message-ID: <5706EEF9.4030900@oracle.com> Backing out the change [1] that fixed [2]. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ Testing: Ran the tests in bug report successfully using product build. Thanks, Bharadwaj [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b [2] https://bugs.openjdk.java.net/browse/JDK-8145348 From bharadwaj.yadavalli at oracle.com Fri Apr 8 00:01:59 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Thu, 7 Apr 2016 20:01:59 -0400 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options In-Reply-To: <5706F338.9010301@oracle.com> References: <5706EEF9.4030900@oracle.com> <5706F338.9010301@oracle.com> Message-ID: <5706F4F7.2050602@oracle.com> Thanks, Jesper! Bharadwaj On 04/07/2016 07:54 PM, Jesper Wilhelmsson wrote: > Looks good! > /Jesper > > On 8/4/16 at 01:36, S. Bharadwaj Yadavalli wrote: >> Backing out the change [1] that fixed [2]. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 >> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ >> >> Testing: Ran the tests in bug report successfully using product build. >> >> Thanks, >> >> Bharadwaj >> >> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b >> [2] https://bugs.openjdk.java.net/browse/JDK-8145348 >> >> From vladimir.kozlov at oracle.com Fri Apr 8 00:20:30 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Apr 2016 17:20:30 -0700 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options In-Reply-To: <5706EEF9.4030900@oracle.com> References: <5706EEF9.4030900@oracle.com> Message-ID: <5706F94E.2070803@oracle.com> No. This is a very wrong change! The bug states that the -XX:+UnlockDiagnosticVMOptions flag should be added to tests that lack it, not that the 8145348 changes should be reverted. Vladimir On 4/7/16 4:36 PM, S. Bharadwaj Yadavalli wrote: > Backing out the change [1] that fixed [2].
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 > webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ > > Testing: Ran the tests in bug report successfully using product build. > > Thanks, > > Bharadwaj > > [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b > [2] https://bugs.openjdk.java.net/browse/JDK-8145348 > > From vladimir.kozlov at oracle.com Fri Apr 8 01:11:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Apr 2016 18:11:16 -0700 Subject: RFR: 8153655: TESTBUG: intrinsics tests must be updated to enable diagnostic options In-Reply-To: <5706F94E.2070803@oracle.com> References: <5706EEF9.4030900@oracle.com> <5706F94E.2070803@oracle.com> Message-ID: <57070534.80509@oracle.com> I talked with Bharadwaj, and we decided to push the backout under a different bug: 8153816: Backout changes for JDK-8145348 till 8153655 is fixed, and to use 8153655 for the real fix, as its synopsis says. Thanks, Vladimir On 4/7/16 5:20 PM, Vladimir Kozlov wrote: > No. This is very wrong change! The bug states that -XX:+UnlockDiagnosticVMOptions flag should be added to tests which miss it and not > revert 8145348 changes. > > Vladimir > > On 4/7/16 4:36 PM, S. Bharadwaj Yadavalli wrote: >> Backing out the change [1] that fixed [2]. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153655 >> webrev: http://cr.openjdk.java.net/~bharadwaj/8153655/webrev/ >> >> Testing: Ran the tests in bug report successfully using product build. >> >> Thanks, >> >> Bharadwaj >> >> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b >> [2] https://bugs.openjdk.java.net/browse/JDK-8145348 >> >> From bharadwaj.yadavalli at oracle.com Fri Apr 8 02:39:51 2016 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Thu, 7 Apr 2016 22:39:51 -0400 Subject: RFR: 8153816: [BACKOUT] Make intrinsics flags diagnostic Message-ID: <570719F7.7010404@oracle.com> Backing out the change [1] that fixed [2]. This is a sub-task of [3].
Bug: https://bugs.openjdk.java.net/browse/JDK-8153816 webrev: http://cr.openjdk.java.net/~bharadwaj/8153816/webrev/ Testing: Ran the tests in bug report successfully using product build. Thanks, Bharadwaj [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b [2] https://bugs.openjdk.java.net/browse/JDK-8145348 [3] https://bugs.openjdk.java.net/browse/JDK-8153655 From vladimir.kozlov at oracle.com Fri Apr 8 02:48:00 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Apr 2016 19:48:00 -0700 Subject: RFR: 8153816: [BACKOUT] Make intrinsics flags diagnostic In-Reply-To: <570719F7.7010404@oracle.com> References: <570719F7.7010404@oracle.com> Message-ID: <57071BE0.5000708@oracle.com> Looks good. Thanks, Vladimir On 4/7/16 7:39 PM, S. Bharadwaj Yadavalli wrote: > Backing out the change [1] that fixed [2]. This is a sub-task of [3]. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153816 > webrev: http://cr.openjdk.java.net/~bharadwaj/8153816/webrev/ > > Testing: Ran the tests in bug report successfully using product build. > > Thanks, > > Bharadwaj > > [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/12b38ff7ad9b > [2] https://bugs.openjdk.java.net/browse/JDK-8145348 > [3] https://bugs.openjdk.java.net/browse/JDK-8153655 > From HORII at jp.ibm.com Fri Apr 8 10:53:48 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Fri, 8 Apr 2016 10:53:48 +0000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 Message-ID: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Dear all: Can I please request reviews for the following change? This change was created for JDK 9 and ppc64. Description: This change adds compare-and-exchange options for the POWER architecture. As described in atomic_linux_ppc.inline.hpp, the current implementation of cmpxchg is fence_cmpxchg_acquire. This implementation is useful for general purposes because issuing sync both before and after cmpxchg keeps memory accesses consistent.
However, these calls sometimes cause overhead because sync instructions are very expensive in the current POWER chip design. With this change, callers can explicitly specify whether to run the fence and the acquire via two additional bool parameters. Because their default values are "true", existing cmpxchg calls do not need to be modified. In addition, using the new cmpxchg parameters, this change improves the performance of copy_to_survivor in the parallel GC. copy_to_survivor installs forwarding pointers by using cmpxchg. In my understanding, this operation doesn't require any sync instructions: a pointer is changed at most once during a GC, and when cmpxchg fails, the latest pointer is available to the caller. When I evaluated SPECjbb2013 (slightly customized because the obsolete grizzly doesn't support the new Java 9 version format), young GC pause time was reduced by 10% to 20%. Summary of source code changes: * src/share/vm/runtime/atomic.hpp * src/share/vm/runtime/atomic.cpp * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp - Add fence and acquire arguments to cmpxchg, for PPC64 only. Although cmpxchg in atomic_linux_ppc.inline.hpp has some branches, they are eliminated when it is inlined into callers. * src/share/vm/oops/oop.inline.hpp - Changed cas_set_mark to call cmpxchg without fence and acquire. cas_set_mark is called only by cas_forward_to, which is called only by copy_to_survivor_space and oop_promotion_failed in psPromotionManager. Code change: Please see the attached diff file, generated with "hg diff -g" in the latest hotspot directory. Passed test: SPECjbb2013 (customized) * I believe some other cmpxchg calls could be optimized by dropping the fence or the acquire, because two sync calls are more conservative than the Java memory model requires. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: ppc64_cmpxchg_opt.diff Type: application/octet-stream Size: 8837 bytes Desc: not available URL: From nils.eliasson at oracle.com Fri Apr 8 12:27:27 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 8 Apr 2016 14:27:27 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out Message-ID: <5707A3AF.3040807@oracle.com> Hi, Please review this small fix for the BlockingCompilation test. Summary: A method enqueued for compilation with the WB API may be removed from the compile queue as stale. Solution: Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets stale while the test is running. (Also added some extra checks that may spare us from waiting for the timeout when the test fails.) This is a workaround, but we should consider a permanent fix for WB API compiles, like tagging the compile task with info about the origin of the compile. The comment field has this information, but then it needs to be converted to an enum. Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ Best regards, Nils Eliasson From rwestrel at redhat.com Fri Apr 8 13:54:32 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 8 Apr 2016 15:54:32 +0200 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <570680BC.7030305@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> <570680BC.7030305@oracle.com> Message-ID: <5707B818.6070405@redhat.com> > CompareAndSwap produces boolean result, and kills cmp_value and > new_value. CompareAndExchange produces the "old"/null value result, > which is stored at the same position as cmp_value. > > So, if you omit that line, LinearScan asserts when you are trying to use > the result of CompareAndExchange. AFAIU, removing the "input" property > from cmp_value, but leaving "temp" makes things back in order for > CompareAndExchange.
That assert seems too restrictive, but making that change to c1_LIR.cpp is asking for trouble, I think. I would suggest moving the final move to the result register into the platform-dependent code (see attached patch). Also, I noticed you don't pass the result as the correct argument of cas_*. Roland. -------------- next part -------------- A non-text attachment was scrubbed... Name: cas.patch Type: text/x-patch Size: 2704 bytes Desc: not available URL: From nils.eliasson at oracle.com Fri Apr 8 13:47:22 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 8 Apr 2016 15:47:22 +0200 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56BB7E47.4000703@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> <56A13176.804@oracle.com> <56A230D7.9060606@oracle.com> <56A27B55.6050502@oracle.com> <56BB095A.2090500@oracle.com> <56BB7E47.4000703@oracle.com> Message-ID: <5707B66A.4020006@oracle.com> Hi, Picking up this thread again. On 2016-02-10 19:15, Vladimir Kozlov wrote: > This looks almost good. > > There is " at the end but there is no first ": > That was just the end marker of the excerpt from the hs_err file. > MaxNodeLimit:80000" > > "Compiling with directive:" --> "Compiling with directives:". "No > inline rules in directive.", again "directives". > > Also the list of values is difficult to navigate. To have one per line > is definitely overkill but organizing them in some kind of pattern > would be nice (3 per line with the same indent, for example). It could > be hard to do but at least order them alphabetically. The printout is generated by a macro for simplicity. Any sorting or formatting requires a hand-tuned print function, or a macro that builds a list that is sorted and printed by the print function. I don't think it is worth the effort for the hs_err file. > > I asked before and I again forgot what it means "Enable:true > Exclude:false".
This means you need to add more info "Enable > directives:true"? What is "Exclude" again? Enable: whether the directive may be used (otherwise it is disabled, i.e. not available). Exclude: this method cannot be compiled, as with CompileCommand=exclude. I'll add comments to the flag table:
#define compilerdirectives_common_flags(cflags) \
    cflags(Enable, bool, true, X) /* false -> directive disabled from use */ \
    cflags(Exclude, bool, false, X) /* true -> don't compile method */ \
> > DisableIntrinsic: does not have value so it should not be on list. > Similar for others when they don't have values. DisableIntrinsic appears to be empty because it contains the empty list. I can add an "" for clarity. All options have a value. > > Again what * means in "*PrintInlining:true"? Is it because it is > present on command line? A * shows that it was set with a directive. Regards, Nils > > Thanks, > Vladimir > > On 2/10/16 1:56 AM, Nils Eliasson wrote: >> Hi, >> >> New webrev including Vladimirs suggestions: >> >> http://cr.openjdk.java.net/~neliasso/8138756/webrev.04/ >> >> Now it will look like this printing the directive when there are no >> compile commands for inlining: >> >> "--------------- T H R E A D --------------- >> >> Current thread (0x00007f53f0468000): JavaThread "C1 >> CompilerThread10" daemon [_thread_in_native, id=8398, >> stack(0x00007f52e6163000,0x00007f52e6264000)] >> >> Current CompileTask: >> C1: 228 1 3 java.lang.String::isLatin1 (19 bytes) >> >> Compiling with directive: >> No inline rules in directive.
>> Enable:true Exclude:false BreakAtExecute:false >> BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >> PrintNMethods:false ReplayInline:false DumpReplay:false >> DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false >> PrintIntrinsics:false TraceOptoPipelining:false >> TraceOptoOutput:false TraceSpilling:false Vectorize:false >> VectorizeDebug:false CloneMapDebug:false >> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >> >> >> >> And like this when there are: >> >> >> "--------------- T H R E A D --------------- >> >> Current thread (0x00007feda4468800): JavaThread "C1 >> CompilerThread10" daemon [_thread_in_native, id=8314, >> stack(0x00007fec9a751000,0x00007fec9a852000)] >> >> Current CompileTask: >> C1: 227 1 3 java.lang.String::isLatin1 (19 bytes) >> >> Compiling with directive: >> No inline rules in directive. Following compile commands are in use: >> inline: b.b, a.a >> dontinline: c.c >> exclude: d.d >> compileonly: *.* >> Enable:true Exclude:false BreakAtExecute:false >> BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >> PrintNMethods:false ReplayInline:false DumpReplay:false >> DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false >> PrintIntrinsics:false TraceOptoPipelining:false >> TraceOptoOutput:false TraceSpilling:false Vectorize:false >> VectorizeDebug:false CloneMapDebug:false >> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >> >> Regards, >> Nils >> >> On 2016-01-22 19:56, Vladimir Kozlov wrote: >>> "no inline - compile commands may apply" is confusing to me (and for >>> others who not familiar with directives). What >>> does it mean? :) >>> Does it mean no 'inline' directives were used or opposite: >>> -XX:-Inline flag was specified (or corresponding directive). 
>>> >>> If it is switch off inlining then I think it should be "don't inline". >>> So what "compile commands may apply" means? >>> >>> > I updated the print output to mark all options in the directive >>> that are >>> > not default with a '*'. That makes it quicker to see if any special >>> >>> Yes, it is better but I still did not get this. I see that command >>> line has PrintInlining command and it is in the >>> list: *PrintInlining:true. >>> But I don't see PrintCompilation on the list but it is specified on >>> command line. On other hand PrintIntrinsics:false >>> is there. >>> >>> > It only prints the directive that is used for the current compile >>> task >>> > (that caused the crash). (That's why I put them together in the >>> hs_err file) >>> >>> What do you mean "is used"? >>> >>> "Print *which* directive (and options) were in use if compiler crash. >>> Print *if* directives were used at some point if other crash?" >>> >>> Should we replace "in use"/"were used" with "were set"? >>> >>> Thanks, >>> Vladimir >>> >>> On 1/22/16 5:38 AM, Nils Eliasson wrote: >>>> Hi, Vladimir >>>> >>>> On 2016-01-21 20:28, Vladimir Kozlov wrote: >>>>> Passing directives through ciEnv is fine. >>>>> My question is about output in hs_err file. How those directives were >>>>> selected in your example? >>>> >>>> It only prints the directive that is used for the current compile task >>>> (that caused the crash). (That's why I put them together in the >>>> hs_err file) >>>> >>>>> I found it strange to see mixed flags values and oracle commands. >>>>> "Enable:true Exclude:false" - which these correspond to, for example? >>>> >>>> These are all options from the directive - and they are set with >>>> directives (highest priority), compilecommand or vmflags (lowest >>>> priority). >>>> >>>>> >>>>> Should we not print directives/flags which are not set explicitly? >>>> >>>> I updated the print output to mark all options in the directive >>>> that are >>>> not default with a '*'.
That makes it quicker to see if any special >>>> options was applied. It will also print if the directive is the >>>> unmodified default directive. >>>> >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/ >>>> Example output: >>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt >>>> >>>> Regards, >>>> Nils >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/21/16 2:31 AM, Nils Eliasson wrote: >>>>>> This is how it looks: >>>>>> >>>>>> [...] >>>>>> >>>>>> --------------- T H R E A D --------------- >>>>>> >>>>>> Current thread (0x00007f071046a000): JavaThread "C1 >>>>>> CompilerThread10" daemon [_thread_in_native, id=20033, >>>>>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)] >>>>>> >>>>>> Current CompileTask: >>>>>> C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) >>>>>> >>>>>> Current compiler directive: >>>>>> inline: - >>>>>> Enable:true Exclude:false BreakAtExecute:false >>>>>> BreakAtCompile:false Log:false PrintAssembly:false >>>>>> PrintInlining:false PrintNMethods:false ReplayInline:false >>>>>> DumpReplay:false DumpInline:false >>>>>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: >>>>>> BlockLayoutByFrequency:true PrintOptoAssembly:false >>>>>> PrintIntrinsics:false TraceOptoPipelining:false >>>>>> TraceOptoOutput:false >>>>>> TraceSpilling:false Vectorize:false VectorizeDebug:false >>>>>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false >>>>>> IGVPrintLevel:0 MaxNodeLimit:80000 >>>>>> >>>>>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], >>>>>> sp=0x00007f05d7bfa5d0, free space=1021k >>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>>>>> C=native code) >>>>>> V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, >>>>>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >>>>>> char const*, int, unsigned long)+0x182 >>>>>> V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char >>>>>> const*, int, char const*, char const*, 
__va_list_tag*)+0x4a >>>>>> V [libjvm.so+0x908cca] report_vm_error(char const*, int, char >>>>>> const*, char const*, ...)+0xea >>>>>> V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, >>>>>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 >>>>>> V [libjvm.so+0x88ec5a] >>>>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a >>>>>> V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 >>>>>> V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 >>>>>> V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 >>>>>> V [libjvm.so+0x10189aa] java_start(Thread*)+0xca >>>>>> C [libpthread.so.0+0x8182] start_thread+0xc2 >>>>>> >>>>>> [...] >>>>>> >>>>>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt >>>>>> >>>>>> Regards, >>>>>> Nils >>>>>> >>>>>> On 2016-01-21 11:25, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small change. The diff looks big but most of the >>>>>>> change is just changing how the directive are >>>>>>> passed to the compilers. Directives are set in the ciEnv and then >>>>>>> passed to the compilers. The compilers can then >>>>>>> choose to add it to any internal compilation object for >>>>>>> convenience. >>>>>>> The hs_err printing routine in vmError.cpp loads >>>>>>> the directive from the ciEnv. 
>>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >>>>>>> >>>>>>> Regards, >>>>>>> Nils >>>>>> >>>> >> From felix.yang at linaro.org Fri Apr 8 14:36:02 2016 From: felix.yang at linaro.org (Felix Yang) Date: Fri, 8 Apr 2016 22:36:02 +0800 Subject: RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode Message-ID: Hi, Please review webrev: http://cr.openjdk.java.net/~fyang/8153837/webrev.00/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153837 The patch handles code generation for the special cases of MaxINode & MinINode where one argument is -1/0/1, eliminating one extra mov instruction. As shown in the JIRA issue, I saw some occurrences of these special cases in specJBB2005 and other benchmarks. The patch does something like this:
min(x, 1)  => cmp x, 0; csinc x, x, zr, le
min(x, -1) => cmp x, 0; csinv x, x, zr, lt
max(x, 1)  => cmp x, 0; csinc x, x, zr, gt
max(x, -1) => cmp x, 0; csinv x, x, zr, ge
Tested with JTreg hotspot, langtools. Is it OK? Thanks, Felix -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.punegov at oracle.com Fri Apr 8 14:40:12 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Fri, 8 Apr 2016 17:40:12 +0300 Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package Message-ID: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> Hi, please review the following change to JITtester: - rewrite TypeUtil and move to utils package - add javadoc for each method in the class bug: https://bugs.openjdk.java.net/browse/JDK-8153852 webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/ Thanks, Pavel Punegov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From aph at redhat.com Fri Apr 8 14:42:34 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 8 Apr 2016 15:42:34 +0100 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: References: Message-ID: <5707C35A.2060000@redhat.com> On 04/08/2016 03:36 PM, Felix Yang wrote: > Is it OK? It looks good, but I'm surely going to need a jtreg test case which exercises all these patterns in C2. Thanks, Andrew. From vladimir.x.ivanov at oracle.com Fri Apr 8 16:47:49 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 8 Apr 2016 19:47:49 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes Message-ID: <5707E0B5.5080501@oracle.com> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8153540 Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken. The proposed fix is to perform necessary checks in Java code before calling the intrinsic. I did some performance measurements [1] and reflection (non-constant class) case (non-constant class) regressed ~5-10% due to new guards added. I also experimented with a hotspot-only fix [2], but it looks uglier. So, if you consider the regression in reflective case non-critical, I'd prefer to go with JDK checks. Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), microbenchmarks. Thanks! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java Baseline: AllocInstance.testConstant avgt 25 3.736 ? 0.054 ns/op AllocInstance.testReflective avgt 25 5.880 ? 0.080 ns/op JDK fix: AllocInstance.testConstant avgt 25 3.959 ? 0.205 ns/op AllocInstance.testReflective avgt 25 6.274 ? 0.180 ns/op [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path AllocInstance.testConstant avgt 25 3.957 ? 
0.159 ns/op AllocInstance.testReflective avgt 25 5.901 ± 0.057 ns/op From vladimir.kozlov at oracle.com Fri Apr 8 17:04:29 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:04:29 -0700 Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package In-Reply-To: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> References: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> Message-ID: <5707E49D.4040400@oracle.com> Looks good. Thanks, Vladimir On 4/8/16 7:40 AM, Pavel Punegov wrote: > Hi, > > please review the following change to JITtester: > - rewrite TypeUtil and move to utils package > - add javadoc for each method in the class > > bug: https://bugs.openjdk.java.net/browse/JDK-8153852 > webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/ > > Thanks, > Pavel Punegov > From vladimir.kozlov at oracle.com Fri Apr 8 17:09:27 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:09:27 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5707A3AF.3040807@oracle.com> References: <5707A3AF.3040807@oracle.com> Message-ID: <5707E5C7.3000000@oracle.com> What do you mean by "stale"? I would prefer to see the real fix, as you suggested, to avoid removing WB compile tasks from the queue. Adding a timeout is not reliable. Thanks, Vladimir On 4/8/16 5:27 AM, Nils Eliasson wrote: > Hi, > > Please review this small fix of the BlockingCompilation test. > > Summary: > A method enqueued for compilation with the WB API may be removed from the compile queue as stale. > > Solution: > Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets stale while the test is running. (Also added some extra > checks that may spare us from waiting until timeout for failing.) > > This is a workaround, but we should consider a permanent fix for WB API compiles - like tagging the compile > task with info about the origin of the compile.
The comment field has this information - but then it needs to be > converted to an enum. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 > Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ > > Best regards, > Nils Eliasson > > > > From aleksey.shipilev at oracle.com Fri Apr 8 17:28:43 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 8 Apr 2016 20:28:43 +0300 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <5707B818.6070405@redhat.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> <570680BC.7030305@oracle.com> <5707B818.6070405@redhat.com> Message-ID: <5707EA4B.8030306@oracle.com> On 04/08/2016 04:54 PM, Roland Westrelin wrote: > >> CompareAndSwap produces boolean result, and kills cmp_value and >> new_value. CompareAndExchange produces the "old"/null value result, >> which is stored at the same position as cmp_value. >> >> So, if you omit that line, LinearScan asserts when you are trying to use >> the result of CompareAndExchange. AFAIU, removing the "input" property >> from cmp_value, but leaving "temp" makes things back in order for >> CompareAndExchange. > > That assert seems too restrictive but making that change to c1_LIR.cpp > is asking for trouble, I think. I would suggest moving the final move to > the result register to the platform dependent code (see attached patch). > Also, I noticed you don't pass the result as the correct argument of cas_*. D'oh. Thank you, Roland! Merged your changes here: http://cr.openjdk.java.net/~shade/8152753/webrev.02/ Re-spinning the RBT testing now. Thanks, -Aleksey From vladimir.kozlov at oracle.com Fri Apr 8 17:30:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:30:16 -0700 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707E0B5.5080501@oracle.com> References: <5707E0B5.5080501@oracle.com> Message-ID: <5707EAA8.5020005@oracle.com> > Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken. But it should not allocate arrays. Right? This is what your java changes do now. Should it be allocateInstance0 ?: + // public native Object Unsafe.allocateInstance(Class cls); You removed the next stop check. Is it because the java code will catch the NULL?: Node* cls = null_check(argument(1)); if (stopped()) return true; The test misses the bug number @bug Thanks, Vladimir K On 4/8/16 9:47 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ > > https://bugs.openjdk.java.net/browse/JDK-8153540 > > Unsafe.allocateInstance intrinsic can instantiate arrays, but the allocation logic is broken. > > The proposed fix is to perform necessary checks in Java code before calling the intrinsic. > > I did some performance measurements [1] and reflection (non-constant class) case (non-constant class) regressed ~5-10% > due to new guards added. > > I also experimented with a hotspot-only fix [2], but it looks uglier. So, if you consider the regression in reflective > case non-critical, I'd prefer to go with JDK checks. > > Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), microbenchmarks. > > Thanks! > > Best regards, > Vladimir Ivanov > > [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java > > Baseline: > AllocInstance.testConstant avgt 25 3.736 ±
0.054 ns/op > AllocInstance.testReflective avgt 25 5.880 ± 0.080 ns/op > > JDK fix: > AllocInstance.testConstant avgt 25 3.959 ± 0.205 ns/op > AllocInstance.testReflective avgt 25 6.274 ± 0.180 ns/op > > [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path > > AllocInstance.testConstant avgt 25 3.957 ± 0.159 ns/op > AllocInstance.testReflective avgt 25 5.901 ± 0.057 ns/op > From vladimir.kozlov at oracle.com Fri Apr 8 17:39:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 10:39:20 -0700 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <5707B66A.4020006@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> <56A13176.804@oracle.com> <56A230D7.9060606@oracle.com> <56A27B55.6050502@oracle.com> <56BB095A.2090500@oracle.com> <56BB7E47.4000703@oracle.com> <5707B66A.4020006@oracle.com> Message-ID: <5707ECC8.7000107@oracle.com> On 4/8/16 6:47 AM, Nils Eliasson wrote: > Hi, > > Picking up this thread again. > > On 2016-02-10 19:15, Vladimir Kozlov wrote: >> This looks almost good. >> >> There is " at the end but there is no first ": >> > That was just the end mark of the cut-out of the hs_err file. > >> MaxNodeLimit:80000" >> >> "Compiling with directive:" --> "Compiling with directives:". "No inline rules in directive.", again "directives". >> >> Also the list of values is difficult to navigate. To have one per line is definitely overkill but organizing them in >> some kind of pattern would be nice (3 per line with the same indent, for example). It could be hard to do but at least >> order them alphabetically. > > The printout is generated by a macro for simplicity. Any sorting or formatting requires a hand-tuned print function or a > macro that builds a list that is sorted and printed by the print function. I don't think it is worth the effort to have > in the hs_err file. > This is really unfortunate.
:( >> >> I asked before and I again forgot what it means "Enable:true Exclude:false". This means you need to add more info >> "Enable directives:true"? What is "Exclude" again? > > Enable - Is the directive ok to use (otherwise disabled as in not available) > Exclude - this method can not be compiled, as in CompileCommand=exclude > > I'll add comments to the flag table: > > #define compilerdirectives_common_flags(cflags) \ > cflags(Enable, bool, true, X) /* false -> directive disabled from use */ \ > cflags(Exclude, bool, false, X) /* true -> don't compile method */ \ Should we rename these flags to make them clearer: EnableDirective, ExcludeCompile? > >> >> DisableIntrinsic: does not have value so it should not be on list. Similar for others when they don't have values. > DisableIntrinsic appears to be empty because it contains the empty list. I can add an "" for clarity. All options have a > value. Don't create more noise when it is not needed. It is the hs_err file - it is used to understand what happened. Useless information does not help. Even if DisableIntrinsic is specified in a directive, it should not be listed if it does not have a value. Actually I think it is a bug - we should not accept a directive or flag with an empty string value. > >> >> Again what * means in "*PrintInlining:true"? Is it because it is present on command line? > > A * shows that it was set with a directive. Okay.
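The flag table Nils quotes is an X-macro: one list of cflags(...) entries expands into fields, defaults and printing code. Below is a minimal standalone sketch of that pattern, producing hs_err-style "Name:value" output with a '*' prefix like the "*PrintInlining:true" quoted above. It is not HotSpot's actual DirectiveSet code. `SKETCH_FLAGS`, `DirectiveSketch`, `set_<name>` and `print()` are hypothetical names, and the fourth macro argument is ignored here. Following Nils's answer, an option gets a '*' when it was explicitly set, not merely when it differs from its default.

```cpp
#include <string>

// Hypothetical option table in the same X-macro shape as the quoted
// compilerdirectives_common_flags: cflags(name, type, default, unused).
#define SKETCH_FLAGS(cflags)                                        \
  cflags(Enable,        bool, true,  X) /* false -> directive off */ \
  cflags(Exclude,       bool, false, X) /* true -> don't compile */  \
  cflags(PrintInlining, bool, false, X)                             \
  cflags(IGVPrintLevel, int,  0,     X)

class DirectiveSketch {
 public:
  // Per option: a value field, an "explicitly set" bit, and a setter.
#define DECLARE_OPTION(name, type, dflt, cc)                        \
  type name##Option = dflt;                                        \
  bool name##Set = false;                                          \
  void set_##name(type v) { name##Option = v; name##Set = true; }
  SKETCH_FLAGS(DECLARE_OPTION)
#undef DECLARE_OPTION

  // Space-separated "Name:value" pairs; explicitly set options get a '*'.
  std::string print() const {
    std::string out;
#define PRINT_OPTION(name, type, dflt, cc)                          \
    if (!out.empty()) out += ' ';                                  \
    if (name##Set) out += '*';                                     \
    out += #name ":" + to_text(name##Option);
    SKETCH_FLAGS(PRINT_OPTION)
#undef PRINT_OPTION
    return out;
  }

 private:
  static std::string to_text(bool b) { return b ? "true" : "false"; }
  static std::string to_text(int i) { return std::to_string(i); }
};
```

A default-constructed `DirectiveSketch` prints `Enable:true Exclude:false PrintInlining:false IGVPrintLevel:0`. After `set_PrintInlining(true)` the third entry becomes `*PrintInlining:true`. Adding or reordering options only touches the table, which is also why the printed order follows the table rather than the alphabet.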
Thanks, Vladimir > > Regards, > Nils > >> >> Thanks, >> Vladimir >> >> On 2/10/16 1:56 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev including Vladimirs suggestions: >>> >>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.04/ >>> >>> Now it will look like this printing the directive when there are no compile commands for inlining: >>> >>> "--------------- T H R E A D --------------- >>> >>> Current thread (0x00007f53f0468000): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=8398, >>> stack(0x00007f52e6163000,0x00007f52e6264000)] >>> >>> Current CompileTask: >>> C1: 228 1 3 java.lang.String::isLatin1 (19 bytes) >>> >>> Compiling with directive: >>> No inline rules in directive. >>> Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >>> PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false >>> TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false >>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >>> >>> >>> >>> And like this when there are: >>> >>> >>> "--------------- T H R E A D --------------- >>> >>> Current thread (0x00007feda4468800): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=8314, >>> stack(0x00007fec9a751000,0x00007fec9a852000)] >>> >>> Current CompileTask: >>> C1: 227 1 3 java.lang.String::isLatin1 (19 bytes) >>> >>> Compiling with directive: >>> No inline rules in directive. 
Following compile commands are in use: >>> inline: b.b, a.a >>> dontinline: c.c >>> exclude: d.d >>> compileonly: *.* >>> Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false *PrintInlining:true >>> PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false >>> DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false >>> TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false >>> DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000" >>> >>> Regards, >>> Nils >>> >>> On 2016-01-22 19:56, Vladimir Kozlov wrote: >>>> "no inline - compile commands may apply" is confusing to me (and for others who not familiar with directives). What >>>> does it mean? :) >>>> Does it mean no 'inline' directives were used or opposite: -XX:-Inline flag was specified (or corresponding directive). >>>> >>>> If it is switch off inlining then I think it should be "don't inline". >>>> So what "compile commands may apply" means? >>>> >>>> > I updated the print output to mark all options in the directive that are >>>> > not default with a '*'. That makes it quicker to see if any special >>>> >>>> Yes, it is better but I still did not get this. I see that command line has PrintInlining command and it is in the >>>> list: *PrintInlining:true. >>>> But I don't see PrintCompilation on the list but it is specified on command line. On other hand PrintIntrinsics:false >>>> is there. >>>> >>>> > It only prints the directive that is used for the current compile task >>>> > (that caused the crash). (Thats why I put them together in the hs_err file) >>>> >>>> What do you mean "is used"? >>>> >>>> "Print *which* directive (and options) were in use if compiler crash. >>>> Print *if* directives were used at some point if other crash?" 
>>>> >>>> Should we replace "in use"/"were used" with "were set"? >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/22/16 5:38 AM, Nils Eliasson wrote: >>>>> Hi, Vladimir >>>>> >>>>> On 2016-01-21 20:28, Vladimir Kozlov wrote: >>>>>> Passing directives through ciEnv is fine. >>>>>> My question is about output in hs_err file. How those directives were >>>>>> selected in your example? >>>>> >>>>> It only prints the directive that is used for the current compile task >>>>> (that caused the crash). (Thats why I put them together in the hs_err file) >>>>> >>>>>> I found it strange to see mixed flags values and oracle commands. >>>>>> "Enable:true Exclude:false" - which these correspond to, for example? >>>>> >>>>> These are all options from the directive - and they are set with >>>>> directives (highest priority), compilecommmand or vmflags (lowest >>>>> priority). >>>>> >>>>>> >>>>>> Should we not print directives/flags which are not set explicitly? >>>>> >>>>> I updated the print output to mark all options in the directive that are >>>>> not default with a '*'. That makes it quicker to see if any special >>>>> options was applied. It will also print if the directive is the >>>>> unmodified default directive. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/ >>>>> Example output: >>>>> http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 1/21/16 2:31 AM, Nils Eliasson wrote: >>>>>>> This is how it looks: >>>>>>> >>>>>>> [...] 
>>>>>>> >>>>>>> --------------- T H R E A D --------------- >>>>>>> >>>>>>> Current thread (0x00007f071046a000): JavaThread "C1 >>>>>>> CompilerThread10" daemon [_thread_in_native, id=20033, >>>>>>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)] >>>>>>> >>>>>>> Current CompileTask: >>>>>>> C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) >>>>>>> >>>>>>> Current compiler directive: >>>>>>> inline: - >>>>>>> Enable:true Exclude:false BreakAtExecute:false >>>>>>> BreakAtCompile:false Log:false PrintAssembly:false >>>>>>> PrintInlining:false PrintNMethods:false ReplayInline:false >>>>>>> DumpReplay:false DumpInline:false >>>>>>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: >>>>>>> BlockLayoutByFrequency:true PrintOptoAssembly:false >>>>>>> PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false >>>>>>> TraceSpilling:false Vectorize:false VectorizeDebug:false >>>>>>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false >>>>>>> IGVPrintLevel:0 MaxNodeLimit:80000 >>>>>>> >>>>>>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], >>>>>>> sp=0x00007f05d7bfa5d0, free space=1021k >>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>>>>>> C=native code) >>>>>>> V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, >>>>>>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >>>>>>> char const*, int, unsigned long)+0x182 >>>>>>> V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char >>>>>>> const*, int, char const*, char const*, __va_list_tag*)+0x4a >>>>>>> V [libjvm.so+0x908cca] report_vm_error(char const*, int, char >>>>>>> const*, char const*, ...)+0xea >>>>>>> V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, >>>>>>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 >>>>>>> V [libjvm.so+0x88ec5a] >>>>>>> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a >>>>>>> V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 >>>>>>> V 
[libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 >>>>>>> V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 >>>>>>> V [libjvm.so+0x10189aa] java_start(Thread*)+0xca >>>>>>> C [libpthread.so.0+0x8182] start_thread+0xc2 >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt >>>>>>> >>>>>>> Regards, >>>>>>> Nils >>>>>>> >>>>>>> On 2016-01-21 11:25, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review this small change. The diff looks big but most of the >>>>>>>> change is just changing how the directive are >>>>>>>> passed to the compilers. Directives are set in the ciEnv and then >>>>>>>> passed to the compilers. The compilers can then >>>>>>>> choose to add it to any internal compilation object for convenience. >>>>>>>> The hs_err printing routine in vmError.cpp loads >>>>>>>> the directive from the ciEnv. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils >>>>>>> >>>>> >>> > From vladimir.kozlov at oracle.com Fri Apr 8 18:37:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 11:37:40 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. Message-ID: <5707FA74.5060207@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8153818 webrev: http://cr.openjdk.java.net/~kvn/8153818/ Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. Regular testing. 
Thanks, Vladimir From aleksey.shipilev at oracle.com Fri Apr 8 18:42:14 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 8 Apr 2016 21:42:14 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707E0B5.5080501@oracle.com> References: <5707E0B5.5080501@oracle.com> Message-ID: <5707FB86.1020408@oracle.com> On 04/08/2016 07:47 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ > > https://bugs.openjdk.java.net/browse/JDK-8153540 > > Unsafe.allocateInstance intrinsic can instantiate arrays, but the > allocation logic is broken. > > The proposed fix is to perform necessary checks in Java code before > calling the intrinsic. I like the Java-side fix better. > I did some performance measurements [1] and reflection (non-constant > class) case (non-constant class) regressed ~5-10% due to new guards added. My quick perfasm runs seem to show this is because of a subtle difference: http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm http://cr.openjdk.java.net/~shade/8153540/patched.perfasm If you compare these, then the difference seems to be the instruction scheduling and a branch in the guards code.
Baseline:
  0.60%   0.53%  0x00007fafafd34dac: mov    0xc(%rsi),%r10d
  3.43%   4.47%  0x00007fafafd34db0: mov    0x50(%r10),%rsi
  0.30%   0.24%  0x00007fafafd34db4: movzbl 0x172(%rsi),%r8d
  1.40%   2.09%  0x00007fafafd34dbc: mov    0x8(%rsi),%r10d
  0.98%   1.48%  0x00007fafafd34dc0: add    $0xfffffffc,%r8d
  2.71%   4.28%  0x00007fafafd34dc4: mov    %r10d,%r11d
  0.03%   0.04%  0x00007fafafd34dc7: and    $0x1,%r11d
  1.02%   1.41%  0x00007fafafd34dcb: or     %r11d,%r8d
  2.51%   2.49%  0x00007fafafd34dce: test   %r8d,%r8d
Patched:
  0.59%   0.76%  0x00007fd1c4de1c2c: mov    0xc(%rsi),%r11d
  3.48%   3.83%  0x00007fd1c4de1c30: mov    0x50(%r11),%rsi
  0.35%   0.36%  0x00007fd1c4de1c34: mov    0x8(%rsi),%r10d
  1.47%   1.69%  0x00007fd1c4de1c38: test   %r10d,%r10d
                 0x00007fd1c4de1c3b: jl     0x00007fd1c4de1ce5
  1.18%   1.48%  0x00007fd1c4de1c41: movzbl 0x172(%rsi),%r11d
  2.82%   3.93%  0x00007fd1c4de1c49: mov    %r10d,%r9d
  0.01%   0.03%  0x00007fd1c4de1c4c: and    $0x1,%r9d
  0.33%   0.59%  0x00007fd1c4de1c50: add    $0xfffffffc,%r11d
  0.93%   0.92%  0x00007fd1c4de1c54: or     %r9d,%r11d
  2.65%   2.47%  0x00007fd1c4de1c57: test   %r11d,%r11d
Unfortunately, a simple fix of replacing "||" with "|" explodes the generated code. Maybe something else is doable there. > [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java Suggestions to improve fidelity:
 * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance
 * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on @Benchmarks if you want to use -prof perfasm
Thanks, -Aleksey From igor.veresov at oracle.com Fri Apr 8 19:18:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 8 Apr 2016 12:18:48 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code.
In-Reply-To: <5707FA74.5060207@oracle.com> References: <5707FA74.5060207@oracle.com> Message-ID: <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com> Looks good. igor > On Apr 8, 2016, at 11:37 AM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8153818 > webrev: > http://cr.openjdk.java.net/~kvn/8153818/ > > Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. > > Regular testing. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Fri Apr 8 19:21:31 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 12:21:31 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. In-Reply-To: <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com> References: <5707FA74.5060207@oracle.com> <36E0E502-6BAD-4AAE-9097-04EB180A2168@oracle.com> Message-ID: <570804BB.5080001@oracle.com> Thank you Igor for reviews. Vladimir On 4/8/16 12:18 PM, Igor Veresov wrote: > Looks good. > > igor > >> On Apr 8, 2016, at 11:37 AM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8153818 >> webrev: >> http://cr.openjdk.java.net/~kvn/8153818/ >> >> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. >> >> Regular testing. >> >> Thanks, >> Vladimir > From christian.thalinger at oracle.com Fri Apr 8 20:14:30 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 8 Apr 2016 10:14:30 -1000 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. In-Reply-To: <5707FA74.5060207@oracle.com> References: <5707FA74.5060207@oracle.com> Message-ID: <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com> Looks good. 
> On Apr 8, 2016, at 8:37 AM, Vladimir Kozlov wrote: > > https://bugs.openjdk.java.net/browse/JDK-8153818 > webrev: > http://cr.openjdk.java.net/~kvn/8153818/ > > Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. > > Regular testing. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Fri Apr 8 20:17:46 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Apr 2016 13:17:46 -0700 Subject: [9] RFR(S) 8153818: Move similar CompiledIC platform specific code to shared code. In-Reply-To: <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com> References: <5707FA74.5060207@oracle.com> <348A9280-30FF-48FB-8474-84C5E7DCA967@oracle.com> Message-ID: <570811EA.3040203@oracle.com> Thank you, Chris, for reviews Vladimir On 4/8/16 1:14 PM, Christian Thalinger wrote: > Looks good. > >> On Apr 8, 2016, at 8:37 AM, Vladimir Kozlov wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8153818 >> webrev: >> http://cr.openjdk.java.net/~kvn/8153818/ >> >> Two CompiledIC methods cleanup_call_site() and is_icholder_call_site() have the same code on all platforms. Move them to shared code in compiledIC.cpp. >> >> Regular testing. >> >> Thanks, >> Vladimir > From rwestrel at redhat.com Mon Apr 11 07:57:10 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 11 Apr 2016 09:57:10 +0200 Subject: RFR (M) 8152753: C1 intrinsics for CompareAndExchange (x86) In-Reply-To: <5707EA4B.8030306@oracle.com> References: <56F5676D.7020401@oracle.com> <56FE87C2.50002@oracle.com> <57060C9C.4000000@oracle.com> <57067FA7.7030907@redhat.com> <570680BC.7030305@oracle.com> <5707B818.6070405@redhat.com> <5707EA4B.8030306@oracle.com> Message-ID: <570B58D6.90408@redhat.com> > Merged your changes here: > http://cr.openjdk.java.net/~shade/8152753/webrev.02/ That looks good to me. Roland. 
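Aleksey's summary earlier in this thread distinguishes the two operations. CompareAndSwap yields only a boolean. CompareAndExchange yields the old value, which then needs a result location of its own. The difference can be shown outside HotSpot with C++ `std::atomic`. This is an analogy only: the sketch does not model C1's LIR or LinearScan, and `cas_int` and `cae_int` are hypothetical helper names.

```cpp
#include <atomic>

// CompareAndSwap-style: report only whether the update happened.
bool cas_int(std::atomic<int>& cell, int expected, int desired) {
  return cell.compare_exchange_strong(expected, desired);
}

// CompareAndExchange-style: return the value found in the cell (the
// "witness"). It equals `expected` exactly when the update happened.
int cae_int(std::atomic<int>& cell, int expected, int desired) {
  // On failure compare_exchange_strong stores the current value into
  // `expected`; on success `expected` already holds the old value.
  cell.compare_exchange_strong(expected, desired);
  return expected;
}
```

Take a cell holding 5. `cae_int(cell, 5, 7)` returns 5 and stores 7. A second `cae_int(cell, 5, 9)` fails and returns 7, the value a CAS caller would have to re-read separately. Returning that value instead of a flag is roughly why the exchange form needs a real result operand.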
From vladimir.x.ivanov at oracle.com Mon Apr 11 11:07:26 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 11 Apr 2016 14:07:26 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707EAA8.5020005@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707EAA8.5020005@oracle.com> Message-ID: <570B856E.3000206@oracle.com> Thanks for the feedback, Vladimir. > > Unsafe.allocateInstance intrinsic can instantiate arrays, but the > allocation logic is broken. > > But it should not allocate arrays. Right? This is what your java changes > do now. Yes, instance allocation logic doesn't work for arrays. It allocates broken array instances. That's why I decided to filter out arrays. > > Should it be allocateInstance0 ?: > > + // public native Object Unsafe.allocateInstance(Class cls); It is: @HotSpotIntrinsicCandidate - public native Object allocateInstance(Class cls) - throws InstantiationException; + private native Object allocateInstance0(Class cls) throws InstantiationException; > > You removed next stop check. Is it because java code will cat the NULL?: > Node* cls = null_check(argument(1)); > if (stopped()) return true; Yes, cls.isPrimitive() does null checks on both cls and klass. > > The test misses bug number @bug Ok, will add. Best regards, Vladimir Ivanov > > Thanks, > Vladimir K > > On 4/8/16 9:47 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00/jdk/ >> >> https://bugs.openjdk.java.net/browse/JDK-8153540 >> >> Unsafe.allocateInstance intrinsic can instantiate arrays, but the >> allocation logic is broken. >> >> The proposed fix is to perform necessary checks in Java code before >> calling the intrinsic. >> >> I did some performance measurements [1] and reflection (non-constant >> class) case (non-constant class) regressed ~5-10% >> due to new guards added. 
>> >> I also experimented with a hotspot-only fix [2], but it looks uglier. >> So, if you consider the regression in reflective >> case non-critical, I'd prefer to go with JDK checks. >> >> Testing: regression test, JPRT, RBT (pit-hs-comp; in progress), >> microbenchmarks. >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >> >> Baseline: >> AllocInstance.testConstant avgt 25 3.736 ± 0.054 ns/op >> AllocInstance.testReflective avgt 25 5.880 ± 0.080 ns/op >> >> JDK fix: >> AllocInstance.testConstant avgt 25 3.959 ± 0.205 ns/op >> AllocInstance.testReflective avgt 25 6.274 ± 0.180 ns/op >> >> [2] http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path >> >> AllocInstance.testConstant avgt 25 3.957 ± 0.159 ns/op >> AllocInstance.testReflective avgt 25 5.901 ± 0.057 ns/op >> From felix.yang at linaro.org Mon Apr 11 11:47:32 2016 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 11 Apr 2016 19:47:32 +0800 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: <5707C35A.2060000@redhat.com> References: <5707C35A.2060000@redhat.com> Message-ID: Hi, Thanks for reviewing the patch. I find that the cases the patch tries to catch here are the result of loop transformations, and it's hard to produce a test case for them simply using the Math.min/max API (it seems C2 will not create a MaxINode/MinINode for calls to these APIs). But I did notice some JTReg hotspot test cases that already generate the pattern.
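Alongside the JTReg coverage, the four mappings from the original RFR can be checked for value equivalence by modelling the instructions in C++ and comparing against std::min/std::max. This is a sketch, not the jtreg test Andrew asked for. `csinc_zr`, `csinv_zr`, the `min_`/`max_` helpers and `check_special_cases` are hypothetical names. The model assumes the ARMv8 definitions: csinc Rd, Rn, Rm, cond gives cond ? Rn : Rm + 1, and csinv gives cond ? Rn : NOT Rm, with Rm = zr.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models of the AArch64 conditional-select forms from the patch, with zr
// (always zero) as the second source register:
//   csinc d, n, zr, cond  ->  d = cond ? n : zr + 1   (that is, 1)
//   csinv d, n, zr, cond  ->  d = cond ? n : ~zr      (that is, -1)
static int32_t csinc_zr(int32_t n, bool cond) { return cond ? n : 1; }
static int32_t csinv_zr(int32_t n, bool cond) { return cond ? n : ~0; }

// Each helper mirrors one "cmp x, 0; cs* x, x, zr, <cond>" pair; after
// "cmp x, 0" the signed conditions le/lt/gt/ge reduce to comparisons with 0.
int32_t min_1(int32_t x)  { return csinc_zr(x, x <= 0); }  // le
int32_t min_m1(int32_t x) { return csinv_zr(x, x < 0); }   // lt
int32_t max_1(int32_t x)  { return csinc_zr(x, x > 0); }   // gt
int32_t max_m1(int32_t x) { return csinv_zr(x, x >= 0); }  // ge

// Compares the models against std::min/std::max for one input.
void check_one(int32_t x) {
  assert(min_1(x) == std::min<int32_t>(x, 1));
  assert(min_m1(x) == std::min<int32_t>(x, -1));
  assert(max_1(x) == std::max<int32_t>(x, 1));
  assert(max_m1(x) == std::max<int32_t>(x, -1));
}

// Checks a window around zero plus the ends of the 32-bit range.
void check_special_cases() {
  for (int32_t x = -1000; x <= 1000; ++x) check_one(x);
  const int32_t extremes[] = {INT32_MIN, INT32_MIN + 1, INT32_MAX - 1, INT32_MAX};
  for (int32_t x : extremes) check_one(x);
}
```

The mappings only work because the inputs are integers. Once x > 0 it is at least 1, so min(x, 1) is the constant 1 that csinc materialises from zr. The same argument applies symmetrically to the -1 cases with csinv.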
Example JTReg test cases:
hotspot/test/compiler/rangechecks/TestExplicitRangeChecks.java
hotspot/test/compiler/rangechecks/TestBadFoldCompare.java
hotspot/test/compiler/rangechecks/PowerOf2SizedArraysChecks.java
hotspot/test/compiler/rangechecks/TestRangeCheckSmearing.java
For the first test, I saw the following instructions in C1 JIT code:
$ grep "csel" JTwork/compiler/rangechecks/TestExplicitRangeChecks.jtr | grep zr | grep gt
0x0000007f9e127e8c: csel w14, w10, wzr, gt
0x0000007f9e129568: csel w11, w12, wzr, gt ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
0x0000007f9e147c78: csel w12, w11, wzr, gt
0x0000007f9e15c75c: csel w12, w13, wzr, gt
0x0000007f9e1689e4: csel w16, w22, wzr, gt
0x0000007f9e18a570: csel w13, w11, wzr, gt
0x0000007f7e13e278: csel w12, w11, wzr, gt
0x0000007f7e14f55c: csel w13, w13, wzr, gt
$ grep "csinc" JTwork/compiler/rangechecks/TestExplicitRangeChecks.jtr
0x0000007f9e112860: csinc w12, w12, wzr, gt
0x0000007f9e120e40: csinc w11, w11, wzr, gt
0x0000007f7e114de0: csinc w12, w12, wzr, gt
0x0000007f7e123440: csinc w11, w11, wzr, gt
I also searched the C2 JIT code of specJBB2005 & Spark Terasort and I saw the following csel/csinc/csinv generated with the patch:
1. Spark Terasort:
0x0000007f990ec898: csinc w14, w0, wzr, le
0x0000007f990f3a40: csinv w13, w11, wzr, ge
0x0000007f990f3a94: csinv w11, w13, wzr, ge
0x0000007f9912c1a8: csinc w14, w13, wzr, le
0x0000007f9912afe8: csinc w12, w10, wzr, le ;*aload_1
0x0000007f99137f90: csinv w12, w12, wzr, ge
0x0000007f99137fe4: csinv w13, w12, wzr, ge
0x0000007f99132ff8: csinc w11, w10, wzr, le ;*aload_1
0x0000007f9917fdfc: csinc w12, w15, wzr, le
0x0000007f991a5e3c: csinv w11, w11, wzr, ge
0x0000007f991a5e90: csinv w12, w11, wzr, ge
0x0000007f991133bc: csinc w0, w12, wzr, le
0x0000007f9918e548: csinc w12, w18, wzr, le ;*aload_0
0x0000007f991639f8: csinc w16, w12, wzr, le
0x0000007f99115508: csinc w3, w13, wzr, le
0x0000007f991d7e38: csinc w13, w14, wzr, le ;*aload_1
0x0000007f992f7e48: csinc w12, w13, wzr, le ;*aload_0
0x0000007f992dd578: csinv w13, w10, wzr, ge
0x0000007f992e7370: csinv w17, w14, wzr, ge
0x0000007f99222ec8: csinc w12, w13, wzr, le ;*aload_0
0x0000007f993ae208: csinc w10, w15, wzr, le
0x0000007f99405604: csinc w10, w13, wzr, le
0x0000007f9931da84: csinc w11, w13, wzr, le
0x0000007f9941eb04: csinv w15, w11, wzr, ge ;*lload_0
0x0000007f9941ec4c: csinv w15, w14, wzr, ge ;*iload
0x0000007f994a3110: csinc w11, w13, wzr, le
0x0000007f990e78d8: csel w11, w11, wzr, gt
0x0000007f990dc8e0: csel w11, w12, wzr, gt
0x0000007f990dc8f0: csel w11, w11, wzr, gt
0x0000007f990f5f00: csel w12, w12, wzr, gt
0x0000007f990ff83c: csel w11, w10, wzr, gt
0x0000007f9914e0dc: csel w13, w12, wzr, gt
0x0000007f991504f0: csel w10, w11, wzr, gt
0x0000007f9918cbac: csel w20, w20, wzr, gt
0x0000007f991a32dc: csel w10, w10, wzr, gt
0x0000007f990f3a64: csel w13, w13, wzr, gt
0x0000007f990f3ad0: csel w10, w10, wzr, gt
0x0000007f99114d98: csel w16, w14, wzr, gt
0x0000007f991f0434: csel w0, w10, wzr, gt
0x0000007f9920753c: csel w10, w11, wzr, gt
0x0000007f9920754c: csel w10, w10, wzr, gt
0x0000007f99213270: csel w12, w12, wzr, gt
0x0000007f9923a9f8: csel w18, w15, wzr, gt
0x0000007f99210ad4: csel w11, w11, wzr, gt
0x0000007f9926a524: csel w12, w11, wzr, gt
0x0000007f9929e3d0: csel w10, w11, wzr, gt
0x0000007f9929e3ec: csel w11, w11, wzr, gt
0x0000007f992a6c90: csel w11, w12, wzr, gt
0x0000007f99214044: csel w13, w11, wzr, gt
0x0000007f99242d04: csel w11, w12, wzr, gt
0x0000007f99420260: csel w10, w11, wzr, gt ;*checkcast
0x0000007f993b1d14: csel w12, w12, wzr, gt
0x0000007f993fe15c: csel w11, w11, wzr, gt ;*checkcast
0x0000007f99460688: csel w10, w11, wzr, gt
2. specJBB2005:
0x0000007f7e1233e0: csinc w12, w12, wzr, gt
0x0000007f7e1632f8: csinc w10, w10, wzr, gt
0x0000007f7e1c8a1c: csinv w10, w2, wzr, ge
0x0000007f7e14d2cc: csel w0, w12, wzr, gt
0x0000007f7e19982c: csel w10, w10, wzr, gt ;*baload {reexecute=0 rethrow=0 return_oop=0}
0x0000007f7e1bad4c: csel w12, w3, wzr, gt
0x0000007f7e27e25c: csel w10, w10, wzr, gt ;*baload {reexecute=0 rethrow=0 return_oop=0}
So the patch got tested for the most part and is not causing us trouble. On 8 April 2016 at 22:42, Andrew Haley wrote: > On 04/08/2016 03:36 PM, Felix Yang wrote: > > Is it OK? > > It looks good, but I'm surely going to need a jtreg test case > which exercises all these patterns in C2. > > Thanks, > > Andrew. > > From vladimir.x.ivanov at oracle.com Mon Apr 11 11:50:03 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 11 Apr 2016 14:50:03 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5707FB86.1020408@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> Message-ID: <570B8F6B.2030206@oracle.com> Thanks for the feedback, Aleksey. >> I did some performance measurements [1] and reflection (non-constant >> class) case (non-constant class) regressed ~5-10% due to new guards added.
> > My quick perfasm runs seems to show this is because a subtle difference: > http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm > http://cr.openjdk.java.net/~shade/8153540/patched.perfasm > > If you compare these, then the difference seems to be the instruction > scheduling and a branch in the guards code. > > Baseline: ... > Patched: ... > > Unfortunately, a simple fix of replacing "||" with "|" explodes the > generated code. Maybe something else is doable there. Yes, C2 can't fuse Class.isArray with the slow bit check from Klass::layout_helper (partly because they dispatch to different places: the !Class.isArray() case dispatches to explicit exception instantiation, while the slow path calls into the runtime). An additional flag in the mirror (j.l.Class) that marks instance klasses could help here, but I'm still not sure it's worth the effort. Ideally, something like [1] (which requires 2 new intrinsics):
* isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers;
* allocateInstanceSlow() handles all cases the intrinsic doesn't handle: it either throws IE or does the necessary operations (e.g., initializes the class or registers a finalizer) when instantiating an object.
Best regards, Vladimir Ivanov
[1]
@ForceInline
public Object allocateInstance(Class cls) throws InstantiationException {
    // Interfaces and abstract classes are handled by the intrinsic.
    if (isFastAllocatable(cls)) {
        return allocateInstance0(cls);
    } else {
        return allocateInstanceSlow(cls);
    }
}

@HotSpotIntrinsicCandidate
private native boolean isFastAllocatable(Class cls);

@HotSpotIntrinsicCandidate
private native Object allocateInstance0(Class cls) throws InstantiationException;

// Calls into modified OptoRuntime::new_instance_C
@HotSpotIntrinsicCandidate
private native Object allocateInstanceSlow(Class cls) throws InstantiationException;

> >> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java > > Suggestions to improve fidelity: > * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance > * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on > @Benchmarks if you want to use -prof perfasm > > Thanks, > -Aleksey > > From aph at redhat.com Mon Apr 11 11:56:02 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Apr 2016 12:56:02 +0100 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: References: <5707C35A.2060000@redhat.com> Message-ID: <570B90D2.5020900@redhat.com> Hi, On 04/11/2016 12:47 PM, Felix Yang wrote: > > Thanks for reviewing the patch. > I find that the cases the patch tries to catch here are the > result of loop transformations. > And it's hard to produce a test case for it simply using the > Math.min/max API. (Seems C2 will not create a MaxINode/MinINode > for a call for these APIs) But I did notice some JTReg hotspot > test cases that already generate the pattern. > > So the patch got tested for the most part and is not causing us trouble. Sure, but that doesn't necessarily give us test coverage of the changes you've made. Is the problem that you don't know how to write test cases to cover your changes? Andrew.
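The listings above mix three AArch64 conditional-select forms whose second source is the zero register. The following is a minimal Java reference model of their semantics, checked against Math.min/Math.max. It assumes that each instruction follows a signed compare of the source register against the constant named in its comment; the compares do not appear in the listings. The class and method names are hypothetical. This model does not show whether C2 actually emits these instructions for Math.min/Math.max calls, and it is not the jtreg test Andrew asked for.

```java
// Hypothetical reference model of the zero-register conditional selects seen
// in the listings. Assumes each instruction follows a signed compare of x
// against the constant named in the comment.
final class CondSelectModel {
    private static final int WZR = 0; // AArch64 zero register

    // csel  d, n, m, cond  ->  d = cond ? n : m
    static int csel(boolean cond, int n, int m)  { return cond ? n : m; }
    // csinc d, n, m, cond  ->  d = cond ? n : m + 1
    static int csinc(boolean cond, int n, int m) { return cond ? n : m + 1; }
    // csinv d, n, m, cond  ->  d = cond ? n : ~m
    static int csinv(boolean cond, int n, int m) { return cond ? n : ~m; }

    // cmp x, #0 ; csel  d, x, wzr, gt  ->  max(x, 0)
    static int maxZero(int x)     { return csel(x > 0, x, WZR); }
    // cmp x, #1 ; csinc d, x, wzr, le  ->  min(x, 1)
    static int minOne(int x)      { return csinc(x <= 1, x, WZR); }
    // cmp x, #1 ; csinc d, x, wzr, gt  ->  max(x, 1)
    static int maxOne(int x)      { return csinc(x > 1, x, WZR); }
    // cmn x, #1 ; csinv d, x, wzr, ge  ->  max(x, -1)
    static int maxMinusOne(int x) { return csinv(x >= -1, x, WZR); }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -2, -1, 0, 1, 2, Integer.MAX_VALUE};
        for (int x : samples) {
            check(maxZero(x) == Math.max(x, 0), "max(x, 0)", x);
            check(minOne(x) == Math.min(x, 1), "min(x, 1)", x);
            check(maxOne(x) == Math.max(x, 1), "max(x, 1)", x);
            check(maxMinusOne(x) == Math.max(x, -1), "max(x, -1)", x);
        }
        System.out.println("all patterns match Math.min/Math.max");
    }

    private static void check(boolean ok, String what, int x) {
        if (!ok) {
            throw new AssertionError(what + " mismatch for x = " + x);
        }
    }
}
```

The boundary samples matter: for Integer.MIN_VALUE and Integer.MAX_VALUE the "else" arm produces the constants 0, 1 and -1 (wzr + 1 and ~wzr), which is exactly what makes these forms equivalent to a min/max against a small constant.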
From vladimir.kempik at oracle.com Mon Apr 11 12:00:28 2016 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Mon, 11 Apr 2016 15:00:28 +0300 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space Message-ID: <570B91DC.2040904@oracle.com> Hello Please review this backport of 8130309 to jdk8u. Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope. Testing: jprt, failing test. Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ Thanks -Vladimir From nils.eliasson at oracle.com Mon Apr 11 12:09:29 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 11 Apr 2016 14:09:29 +0200 Subject: RFR(S): 8153885: [TESTBUG] few regression tests failed after 8151880 changes Message-ID: <570B93F9.5030005@oracle.com> Hi, Please review this fix. Summary: 1) Add -XX:-UseCounterDecay to three tests that were using the compile()-method through the getBCI method. 2) Fix CompileCommand patterns and comments still using the old SimpleTestCase$Helper pattern. Testing: Running all compiler JTREG tests on linux-x64. Bug: https://bugs.openjdk.java.net/browse/JDK-8153885 Webrev: http://cr.openjdk.java.net/~neliasso/8153885/webrev.01/ Regards, Nils Eliasson From tobias.hartmann at oracle.com Mon Apr 11 12:26:43 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Apr 2016 14:26:43 +0200 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <570B91DC.2040904@oracle.com> References: <570B91DC.2040904@oracle.com> Message-ID: <570B9803.2030509@oracle.com> Hi Vladimir, On 11.04.2016 14:00, Vladimir Kempik wrote: > Hello > > Please review this backport of 8130309 to jdk8u. > > Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope. > > Testing: jprt, failing test.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 > Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ Looks good to me. Thanks for backporting this! Best regards, Tobias > > Thanks > -Vladimir > From felix.yang at linaro.org Mon Apr 11 12:48:02 2016 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 11 Apr 2016 20:48:02 +0800 Subject: [aarch64-port-dev ] RFR: 8153837 : aarch64: handle special cases for MaxINode & MinINode In-Reply-To: <570B90D2.5020900@redhat.com> References: <5707C35A.2060000@redhat.com> <570B90D2.5020900@redhat.com> Message-ID: Hi, Yes, I haven't looked into the details of the loop transformation code and find it hard to produce a test case, at least for now. It seems necessary to me to do so in order to produce a good JTReg test case for the patch. But I am not quite sure I can come up with a test case which covers all the templates added. Any suggestions? Thanks, Felix On 11 April 2016 at 19:56, Andrew Haley wrote: > Hi, > > On 04/11/2016 12:47 PM, Felix Yang wrote: > > > > Thanks for reviewing the patch. > > > I find that the cases the patch tries to catch here are the > > result of loop transformations. > > And it's hard to produce a test case for it simply using the > > Math.min/max API. (Seems C2 will not create a MaxINode/MinINode > > for a call for these APIs) But I did notice some JTReg hotspot > > test cases that already generate the pattern. > > > > So the patch got tested for the most part and is not causing us > trouble. > > Sure, but that doesn't necessarily give us test coverage of the > changes you've made. Is the problem that you don't know how to write > test cases to cover your changes? > > Andrew. > > > From pavel.punegov at oracle.com Mon Apr 11 14:02:06 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Mon, 11 Apr 2016 17:02:06 +0300 Subject: RFR (S): 8153852: [jittester] move TypeUtil to utils package In-Reply-To: <5707E49D.4040400@oracle.com> References: <4DA46F96-62BA-463A-8DCA-933B6A343B3D@oracle.com> <5707E49D.4040400@oracle.com> Message-ID: <35C50286-0D4C-4F3E-8862-FFDA32334390@oracle.com> Thanks for the review, Vladimir > On 08 Apr 2016, at 20:04, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 4/8/16 7:40 AM, Pavel Punegov wrote: >> Hi, >> >> please review the following change to JITtester: >> - rewrite TypeUtil and move to utils package >> - add javadoc for each method in the class >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8153852 >> webrev: http://cr.openjdk.java.net/~ppunegov/8153852/webrev.00/ >> >> Thanks, >> Pavel Punegov >> From pavel.punegov at oracle.com Mon Apr 11 14:02:57 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Mon, 11 Apr 2016 17:02:57 +0300 Subject: RFR(XS): 8140354: Unquarantine tests that failed with OutOfMemoryError In-Reply-To: <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com> References: <711E7E99-59F9-4012-B273-336AC86E3A6E@oracle.com> <2DA4203F-E141-4A5E-B41A-1B0DD06629A3@oracle.com> Message-ID: <1940774F-D052-4995-97A7-3E2485229CA3@oracle.com> Thanks for the review, Igor > On 07 Apr 2016, at 16:43, Igor Ignatyev wrote: > > Pavel, > > looks good to me > > Igor >> On Apr 7, 2016, at 4:32 PM, Pavel Punegov wrote: >> >> Hi, >> >> please review this fix to unquarantine CompilerControl tests now that JDK-8140354 [1] has been closed as a duplicate of JDK-8144621 [2]. >> The latter fixed the main issue that caused OOME in the tests: it disabled generation of *.* patterns for compile commands like "print", which produced a lot of output from the test VM.
>> >> [1] https://bugs.openjdk.java.net/browse/JDK-8140354 >> [2] https://bugs.openjdk.java.net/browse/JDK-8144621 >> -- >> webrev http://cr.openjdk.java.net/~ppunegov/8153661/webrev.00/ >> bug https://bugs.openjdk.java.net/browse/JDK-8153661 >> >> Pavel. >> > From vladimir.kozlov at oracle.com Mon Apr 11 19:13:42 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Apr 2016 12:13:42 -0700 Subject: RFR(S): 8153885: [TESTBUG] few regression tests failed after 8151880 changes In-Reply-To: <570B93F9.5030005@oracle.com> References: <570B93F9.5030005@oracle.com> Message-ID: <570BF766.9020300@oracle.com> Looks good. thanks, Vladimir On 4/11/16 5:09 AM, Nils Eliasson wrote: > Hi, > > Please review this fix. > > Summary: > 1) Add -XX:-UseCounterDecay to three tests that were using the > compile()-method through the getBCI method. > 2) Fix CompileCommand patterns and comments still using the old > SimpleTestCase$Helper pattern. > > Testing: > Running all compiler JTREG tests on linux-x64. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153885 > Webrev: http://cr.openjdk.java.net/~neliasso/8153885/webrev.01/ > > Regards, > Nils Eliasson From christian.thalinger at oracle.com Mon Apr 11 22:09:24 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Apr 2016 12:09:24 -1000 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570B8F6B.2030206@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> Message-ID: <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> > On Apr 11, 2016, at 1:50 AM, Vladimir Ivanov wrote: > > Thanks for the feedback, Aleksey. > >>> I did some performance measurements [1] and reflection (non-constant >>> class) case (non-constant class) regressed ~5-10% due to new guards added.
>> >> My quick perfasm runs seems to show this is because a subtle difference: >> http://cr.openjdk.java.net/~shade/8153540/baseline.perfasm >> http://cr.openjdk.java.net/~shade/8153540/patched.perfasm >> >> If you compare these, then the difference seems to be the instruction >> scheduling and a branch in the guards code. >> >> Baseline: > ... >> Patched: > ... >> >> Unfortunately, a simple fix of replacing "||" with "|" explodes the >> generated code. Maybe something else is doable there. > > Yes, C2 can't fuse Class.isArray with slow bit check from Klass::layout_helper. (Partly, because they dispatch to different places: !Class.isArray() case dispatches to explicit exception instantiation and slow path calls into runtime). > > Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort. > > Ideally, something like [1] (which requires 2 new intrinsics): I would advise against that. We are fixing a long-standing bug here and although we see a regression the code we produced before was just wrong. Comparing against something that was wrong in the first place is moot. Take the hit; I doubt it will show up at customer applications. > > * isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers; > > * allocateInstanceSlow() handles all cases the intrisic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object. > > Best regards, > Vladimir Ivanov > > [1] > @ForceInline > public Object allocateInstance(Class cls) throws InstantiationException { > // Interfaces and abstract classes are handled by the intrinsic. 
> if (isFastAllocatable(cls)) { > return allocateInstance0(cls); > } else { > return allocateInstanceSlow(cls); > } > } > > @HotSpotIntrinsicCandidate > private native boolean isFastAllocatable(Class cls); > > @HotSpotIntrinsicCandidate > private native Object allocateInstance0(Class cls) throws InstantiationException; > > // Calls into modified OptoRuntime::new_instance_C > @HotSpotIntrinsicCandidate > private native Object allocateInstanceSlow(Class cls) throws InstantiationException; > > >> >>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >> >> Suggestions to improve fidelity: >> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance >> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >> @Benchmarks if you want to use -prof perfasm >> >> Thanks, >> -Aleksey >> >> From igor.ignatyev at oracle.com Tue Apr 12 00:03:30 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 11 Apr 2016 17:03:30 -0700 Subject: RFR(XS) : 8152376 : [TESTBUG] compiler/floatingpoint/Test15FloatJNIArgs should use run main/othervm/native Message-ID: <53A08B91-81FD-4C36-9FF2-780AAE3D99CB@oracle.com> http://cr.openjdk.java.net/~iignatyev/8152376/webrev.00/ > 3 lines changed: 0 ins; 0 del; 3 mod; Hi all, could you please review this small fix, which adds the '/native' option to all main actions: the test uses a native library, and all such tests should be marked with the '/native' option so that jtreg is able to handle them accordingly. JBS: https://bugs.openjdk.java.net/browse/JDK-8152376 webrev: http://cr.openjdk.java.net/~iignatyev/8152376/webrev.00/ Thanks,
Igor From martin.doerr at sap.com Tue Apr 12 09:45:54 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 12 Apr 2016 09:45:54 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> Message-ID: <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> Hi, I think we have come to a common understanding and there was no complaint about my latest webrev: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ Can I consider it reviewed? Can somebody sponsor, please? Thanks and best regards, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Thursday, 7 April 2016 12:52 To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Hi Andrew, Jamsheed and all, thank you very much for your input. As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct). My change still contains a releasing store for newly created ExceptionCache instances. As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms. I think having the release doesn't hurt too much and makes the design a little cleaner. I also added comments based on your input.
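The scheme discussed here has three parts: entries are written under a lock, they are published by a releasing store of the count, and lock-free readers only scan entries below the count they loaded. The sketch below shows that scheme in Java. It is an analogy, not HotSpot's C++ ExceptionCache, and the class, fields and fixed capacity are hypothetical. AtomicInteger.lazySet, whose documented memory effects are those of a release store, stands in for the releasing store in increment_count(). AtomicInteger.get, a volatile read and therefore at least an acquire, stands in for the readers' load of the count.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analogy of an append-only cache block with
// lock-protected writers and lock-free readers (not HotSpot code).
final class PublishedCache {
    private static final int CAPACITY = 4;
    private final int[] pcs = new int[CAPACITY];
    private final int[] handlers = new int[CAPACITY];
    private final AtomicInteger count = new AtomicInteger();

    // Writers are serialized by the monitor, so the increment needs no
    // atomic read-modify-write; only the ordering of the stores matters.
    synchronized boolean add(int pc, int handler) {
        int n = count.get();
        if (n == CAPACITY) {
            return false; // full: a caller would allocate a new block
        }
        pcs[n] = pc;          // plain stores of the new entry...
        handlers[n] = handler;
        count.lazySet(n + 1); // ...published by a release store of the count
        return true;
    }

    // Readers take no lock: every entry below the count they observe
    // was written before the release store that published that count.
    int lookup(int pc) {
        int n = count.get();
        for (int i = 0; i < n; i++) {
            if (pcs[i] == pc) {
                return handlers[i];
            }
        }
        return -1; // not cached
    }
}
```

With a plain store of the count instead, a reader on a weakly ordered machine could observe the new count before the entry it covers. That reordering is what the storestore barrier from JDK-8143897 prevented, and what the releasing store in Martin's webrev now prevents.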
The new webrev is here: http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ Please review. I will also need a sponsor from Oracle, please. Thanks again and best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Thursday, 7 April 2016 12:14 To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe On 07/04/16 10:08, Doerr, Martin wrote: > atomic update for the _count would only be required if there were > multiple threads which attempt to increment it > concurrently. However, updates are under lock, so we only have > concurrent readers which is ok. > > I still think "volatile" does what we need here. Especially the xlC > compiler on AIX tends to reload variables from memory. Exactly this > can be prevented by making the field volatile. I think your latest patch is OK. Whether volatile is really good enough, I don't know. The new(ish) C++ memory model treats this as a race, and therefore undefined behaviour. Old C++ didn't have a memory model, so the best we can do with racy code is guess about what our compilers might do. I certainly much prefer a release_store to the storestore fence used in the fix for 8143897. Andrew. From felix.yang at linaro.org Tue Apr 12 10:49:33 2016 From: felix.yang at linaro.org (Felix Yang) Date: Tue, 12 Apr 2016 18:49:33 +0800 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair In-Reply-To: <57067E32.3010403@redhat.com> References: <57067E32.3010403@redhat.com> Message-ID: Done. New webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.01 Tested with JTreg hotspot, langtools and jdk.
Thanks, Felix On 7 April 2016 at 23:35, Andrew Haley wrote: > On 04/07/2016 04:01 PM, Felix Yang wrote: > > Please review webrev: > http://cr.openjdk.java.net/~fyang/8153713/webrev.00/ > > JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8153713 > > > > Currently, the C2 compiler generates independent stores to clear > > short arrays whose size is no bigger than the parameter > > InitArrayShortSize (refer to the ClearArrayNode::Ideal function). > > For the aarch64 port, we have a store pair instruction which can > > zero two memory words at a time, and this will be good for code > > size and maybe performance for some micro-archs. > > > > For Spark Terasort, 550 additional stp (xzr, xzr) instructions > > are generated with the patch, which means about a 2KB reduction in > > code size. > > > > Tested with JTreg hotspot, langtools and jdk. Is it OK? > > It looks reasonable. It's rather a big slab of code for aarch64.ad, > and I think that it should be in MacroAssembler. Long Chen created > MacroAssembler::zero_words, and you should create an overload of > zero_words which takes a constant int as an argument. > > Andrew. > > > From vladimir.x.ivanov at oracle.com Tue Apr 12 11:07:46 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Apr 2016 14:07:46 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> Message-ID: <570CD702.4070909@oracle.com> >> Additional flag in a mirror (j.l.Class) which marks instance klasses could help here, but I'm still not sure it's worth the effort. >> >> Ideally, something like [1] (which requires 2 new intrinsics): > > I would advise against that.
We are fixing a long-standing bug here and although we see a regression the code we produced before was just wrong. Comparing against something that was wrong in the first place is moot. It wasn't intended as a call for action, but more like a backup plan if there's a need to speed up the reflection case. I'd like to keep the fix simple and current version looks good enough: http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00 Any Reviews, please? Best regards, Vladimir Ivanov > > Take the hit; I doubt it will show up at customer applications. > >> >> * isFastAllocatable() performs all necessary checks: null checks on cls, not primitive, not array, not interface, not abstract, fully initialized, no finalizers; >> >> * allocateInstanceSlow() handles all cases the intrisic doesn't handle: either throws IE or does necessary operations (e.g., initialize the class or register a finalizer) when instantiating an object. >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> @ForceInline >> public Object allocateInstance(Class cls) throws InstantiationException { >> // Interfaces and abstract classes are handled by the intrinsic. 
>> if (isFastAllocatable(cls)) { >> return allocateInstance0(cls); >> } else { >> return allocateInstanceSlow(cls); >> } >> } >> >> @HotSpotIntrinsicCandidate >> private native boolean isFastAllocatable(Class cls); >> >> @HotSpotIntrinsicCandidate >> private native Object allocateInstance0(Class cls) throws InstantiationException; >> >> // Calls into modified OptoRuntime::new_instance_C >> @HotSpotIntrinsicCandidate >> private native Object allocateInstanceSlow(Class cls) throws InstantiationException; >> >> >>> >>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >>> >>> Suggestions to improve fidelity: >>> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves variance >>> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >>> @Benchmarks if you want to use -prof perfasm >>> >>> Thanks, >>> -Aleksey >>> >>> > From aph at redhat.com Tue Apr 12 12:30:22 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 12 Apr 2016 13:30:22 +0100 Subject: RFR: 8153713 : aarch64: improve short array clearing using store pair In-Reply-To: References: <57067E32.3010403@redhat.com> Message-ID: <570CEA5E.2040208@redhat.com> On 04/12/2016 11:49 AM, Felix Yang wrote: > Done. New webrev: http://cr.openjdk.java.net/~fyang/8153713/webrev.01 That looks fine. Thanks. Andrew. From nils.eliasson at oracle.com Tue Apr 12 13:30:19 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 12 Apr 2016 15:30:19 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5707E5C7.3000000@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> Message-ID: <570CF86B.3050804@oracle.com> Tasks get evicted from the compile_queue if their invocation counter hasn't increased during TieredCompileTaskTimeout. (AdvancedThresholdPolicy::is_stale(...)). I'll do a proper fix, it is the right thing to do and should be pretty quick. 
I'll change the comment to an enum that represents who submitted the compile, and add a table for the comments. This could be useful in other settings too. Regards, Nils On 2016-04-08 19:09, Vladimir Kozlov wrote: > What do you mean "stale"? > I would prefer to see the real fix as you suggested to avoid removing > WB comp tasks from queue. Adding timeout is not reliable. > > Thanks, > Vladimir > > On 4/8/16 5:27 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this small fix of the BlockingCompilation test. >> >> Summary: >> A method enqueued for compilation with the WB API may be removed from >> the compile queue as stale. >> >> Solution: >> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >> stale while the test is running. (Also added some extra >> checks that may spare us from waiting until timeout for failing.) >> >> This is a workaround but we should consider fixing something >> permanent for WB API compiles - like tagging the compile >> task with info about the origin of the compile. The comment field has >> this information - but then it needs to be >> converted to an enum.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >> >> Best regards, >> Nils Eliasson >> >> >> >> From vladimir.kozlov at oracle.com Tue Apr 12 16:33:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Apr 2016 09:33:12 -0700 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570CD702.4070909@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> Message-ID: <570D2348.5000204@oracle.com> You did not fix the comment:
+ // public native Object Unsafe.allocateInstance(Class cls);
should be:
+ // private native Object allocateInstance0(Class cls) throws InstantiationException;
Another question: does it really throw InstantiationException? Thanks, Vladimir On 4/12/16 4:07 AM, Vladimir Ivanov wrote: > >>> Additional flag in a mirror (j.l.Class) which marks instance klasses >>> could help here, but I'm still not sure it's worth the effort. >>> >>> Ideally, something like [1] (which requires 2 new intrinsics): >> >> I would advise against that. We are fixing a long-standing bug here >> and although we see a regression the code we produced before was just >> wrong. Comparing against something that was wrong in the first place >> is moot. > > It wasn't intended as a call for action, but more like a backup plan if > there's a need to speed up the reflection case. > > I'd like to keep the fix simple and current version looks good enough: > http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00 > > Any Reviews, please? > > Best regards, > Vladimir Ivanov > >> >> Take the hit; I doubt it will show up at customer applications.
>> >>> >>> * isFastAllocatable() performs all necessary checks: null checks on >>> cls, not primitive, not array, not interface, not abstract, fully >>> initialized, no finalizers; >>> >>> * allocateInstanceSlow() handles all cases the intrisic doesn't >>> handle: either throws IE or does necessary operations (e.g., >>> initialize the class or register a finalizer) when instantiating an >>> object. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] >>> @ForceInline >>> public Object allocateInstance(Class cls) throws >>> InstantiationException { >>> // Interfaces and abstract classes are handled by the intrinsic. >>> if (isFastAllocatable(cls)) { >>> return allocateInstance0(cls); >>> } else { >>> return allocateInstanceSlow(cls); >>> } >>> } >>> >>> @HotSpotIntrinsicCandidate >>> private native boolean isFastAllocatable(Class cls); >>> >>> @HotSpotIntrinsicCandidate >>> private native Object allocateInstance0(Class cls) throws >>> InstantiationException; >>> >>> // Calls into modified OptoRuntime::new_instance_C >>> @HotSpotIntrinsicCandidate >>> private native Object allocateInstanceSlow(Class cls) throws >>> InstantiationException; >>> >>> >>>> >>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >>>> >>>> Suggestions to improve fidelity: >>>> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves >>>> variance >>>> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >>>> @Benchmarks if you want to use -prof perfasm >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>> >> From vladimir.kozlov at oracle.com Tue Apr 12 16:55:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Apr 2016 09:55:05 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570CF86B.3050804@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> Message-ID: <570D2869.5030206@oracle.com> On 4/12/16 6:30 AM, Nils Eliasson wrote: > Tasks get evicted from the 
compile_queue if their invocation counter > hasn't increased during TieredCompileTaskTimeout. > (AdvancedThresholdPolicy::is_stale(...)). > > I'll do a proper fix, it is the right thing to do and should be pretty > quick. I'll change the comment to an enum that represents who submitted > the compile, and add a table for the comments. This could be useful in > other settings too. Sounds good. Thanks, Vladimir > > Regards, > Nils > > On 2016-04-08 19:09, Vladimir Kozlov wrote: >> What do you mean "stale"? >> I would prefer to see the real fix as you suggested to avoid removing >> WB comp tasks from queue. Adding timeout is not reliable. >> >> Thanks, >> Vladimir >> >> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this small fix of the BlockingCompilation test. >>> >>> Summary: >>> A method enqueued for compilation with the WB API may be removed from >>> the compile queue as stale. >>> >>> Solution: >>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>> stale while the test is running. (Also added some extra >>> checks that may spare us from waiting until timeout for failing.) >>> >>> This is a workaround but we should consider fixing something >>> permanent for WB API compiles - like tagging the compile >>> task with info about the origin of the compile. The comment field has >>> this information - but then it needs to be >>> converted to an enum.
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>> >>> Best regards, >>> Nils Eliasson >>> >>> >>> >>> > From vladimir.x.ivanov at oracle.com Tue Apr 12 16:55:41 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Apr 2016 19:55:41 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570D2348.5000204@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> Message-ID: <570D288D.8020106@oracle.com> On 4/12/16 7:33 PM, Vladimir Kozlov wrote: > You did not fix comment: > > + // public native Object Unsafe.allocateInstance(Class cls); > > should be: > > + // private native Object allocateInstance0(Class cls) throws > InstantiationException; Ok, finally found where it is :-) Incorporated (will update the webrev shortly). > An other question: does it really throw InstantiationException? Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper). I didn't move the check into Java, because I didn't want to add yet another guard on fast path. Best regards, Vladimir Ivanov > > On 4/12/16 4:07 AM, Vladimir Ivanov wrote: >> >>>> Additional flag in a mirror (j.l.Class) which marks instance klasses >>>> could help here, but I'm still not sure it's worth the effort. >>>> >>>> Ideally, something like [1] (which requires 2 new intrinsics): >>> >>> I would advise against that. We are fixing a long-standing bug here >>> and although we see a regression the code we produced before was just >>> wrong. Comparing against something that was wrong in the first place >>> is moot. 
>> >> It wasn't intended as a call for action, but more like a backup plan if >> there's a need to speed up the reflection case. >> >> I'd like to keep the fix simple and current version looks good enough: >> http://cr.openjdk.java.net/~vlivanov/8153540/webrev.00 >> >> Any Reviews, please? >> >> Best regards, >> Vladimir Ivanov >> >>> >>> Take the hit; I doubt it will show up at customer applications. >>> >>>> >>>> * isFastAllocatable() performs all necessary checks: null checks on >>>> cls, not primitive, not array, not interface, not abstract, fully >>>> initialized, no finalizers; >>>> >>>> * allocateInstanceSlow() handles all cases the intrisic doesn't >>>> handle: either throws IE or does necessary operations (e.g., >>>> initialize the class or register a finalizer) when instantiating an >>>> object. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] >>>> @ForceInline >>>> public Object allocateInstance(Class cls) throws >>>> InstantiationException { >>>> // Interfaces and abstract classes are handled by the >>>> intrinsic. 
if (isFastAllocatable(cls)) { >>>> return allocateInstance0(cls); >>>> } else { >>>> return allocateInstanceSlow(cls); >>>> } >>>> } >>>> >>>> @HotSpotIntrinsicCandidate >>>> private native boolean isFastAllocatable(Class cls); >>>> >>>> @HotSpotIntrinsicCandidate >>>> private native Object allocateInstance0(Class cls) throws >>>> InstantiationException; >>>> >>>> // Calls into modified OptoRuntime::new_instance_C >>>> @HotSpotIntrinsicCandidate >>>> private native Object allocateInstanceSlow(Class cls) throws >>>> InstantiationException; >>>> >>>> >>>>> >>>>>> [1] http://cr.openjdk.java.net/~vlivanov/8153540/AllocInstance.java >>>>> >>>>> Suggestions to improve fidelity: >>>>> * Run allocation benchmarks with -Xmx1g -Xms1g; this improves >>>>> variance >>>>> * Add @CompilerControl(CompilerControl.Mode.DONT_INLINE) on >>>>> @Benchmarks if you want to use -prof perfasm >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>>>> >>> From karen.kinnear at oracle.com Tue Apr 12 17:18:32 2016 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Tue, 12 Apr 2016 13:18:32 -0400 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> Message-ID: <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com> Igor, My apologies, I thought you had already decided to push. Yes, I am good with the changes. Sorry to keep you waiting. thanks, Karen > On Apr 6, 2016, at 1:49 PM, Igor Veresov wrote: > > Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it.
We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately. > > Thanks, > igor > >> On Apr 5, 2016, at 4:22 PM, Igor Veresov wrote: >> >> >>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: >>> >>> Igor, >>> >>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter >>> for instance? >> >> Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed. >> >>> >>> If so, I am ok with checking this in - further notes below. >>> >>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >>>> >>>> >>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>>>> >>>>> I am in agreement with Lois that the JVMS looks good with moving the exception. >>>> >>>> Thanks! >>>>> >>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>>>> meeting I will check one more time. It might be worth adding a comment. >>>> >>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. >>>> >>>>> >>>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks >>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>>>> >>>> >>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >>>> In the nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). 
>>> >>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match >>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. >>> That is ok with me - I will add a note to the bug. >> >> Could you please explain what is the problem again? Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa? >> >>> >>> Also: I see a ciMethod::check_call that has a comment - >>> IT appears to fail when applied to an invoke interface call site. >>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. >>> >> >> This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? >> >> igor >> >>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take >>> the subtleties of invoke interface and invoke special into account. >>>> >>>> igor >>>> >>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>>>> so that you get the correct behavior depending on the requesting byte code. >>>>> >>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection.
>>>>> >>>>> thanks, >>>>> Karen >>>>> >>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>>>> >>>>>> >>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>> Hi Lois, >>>>>>> >>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>> >>>>>>> igor >>>>>> Hi Igor, >>>>>> >>>>>> Thanks for waiting on this. A couple of comments: >>>>>> >>>>>> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>> >>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>> >>>>>> Just curious did you also run the testbase default methods tests? >>>>>> Lois >>>>>> >>>>>>> >>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>> >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Lois >>>>>>>> >>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. 
It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). >>>>>>>>> >>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> igor >> > From igor.veresov at oracle.com Tue Apr 12 18:54:33 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 12 Apr 2016 11:54:33 -0700 Subject: RFR(S) 8153115: Move private interface check to linktime In-Reply-To: <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com> References: <7C1EDB2F-3203-4699-861F-0C57BAF16E9D@oracle.com> <5703E823.8050400@oracle.com> <86D93F15-9D23-4BD1-A2C7-141548096024@oracle.com> <5703F717.702@oracle.com> <1EAD5675-6903-4C48-A778-05A99F591D1C@oracle.com> <4C106849-365D-42E8-B4A5-ADC7297BEA1B@oracle.com> <9F99FE47-E84D-4C4A-B854-F6432D10BC6C@oracle.com> <713E9C18-274A-41F5-AA40-56A78A608763@oracle.com> <2CAC294F-89AF-441B-B9BE-9CCD76C0B5B4@oracle.com> Message-ID: Thanks, Karen! igor > On Apr 12, 2016, at 10:18 AM, Karen Kinnear wrote: > > Igor, > > My apologies, I thought you had already decided to push. Yes, I am good with the changes. > Sorry to keep you waiting. > > thanks, > Karen > >> On Apr 6, 2016, at 1:49 PM, Igor Veresov wrote: >> >> Karen, am I correct to assume I can consider the current change reviewed? I'd like to push it. We can discuss how to harden/refactor other dimensions of the use of LinkResolver by compilers separately. >> >> Thanks, >> igor >> >>> On Apr 5, 2016, at 4:22 PM, Igor Veresov wrote: >>> >>> >>>> On Apr 5, 2016, at 3:33 PM, Karen Kinnear wrote: >>>> >>>> Igor, >>>> >>>> Do you run all the tests with -Xcomp or whatever flag ensures you test JVMCI vs. interpreter >>>> for instance? >>> >>> Yes, I ran our RBT round of testing that does that -Xcomp and -Xmixed.
>>> >>>> >>>> If so, I am ok with checking this in - further notes below. >>>> >>>>> On Apr 5, 2016, at 3:43 PM, Igor Veresov > wrote: >>>>> >>>>> >>>>>> On Apr 5, 2016, at 12:04 PM, Karen Kinnear > wrote: >>>>>> >>>>>> I am in agreement with Lois that the JVMS looks good with moving the exception. >>>>> >>>>> Thanks! >>>>>> >>>>>> With the checking I have done so far, I believe that linktime_resolve_static_method is only called with an invoke static - but after my next >>>>>> meeting I will check one more time. It might be worth adding a comment. >>>>> >>>>> Ok, I added a comment to resolve_interface_method(): http://cr.openjdk.java.net/~iveresov/8153115/webrev.01/ >>>>> Again, the bytecode argument here indicates the context, not the actual bytecode. Of course, for example, the method may be invoked with a method handle. >>>>> >>>>>> >>>>>> My concern is specific to the code in jvmciCompilerToVM.cpp there is code called resolveMethod that checks >>>>>> if holder_klass->is_interface -> LR::linktime_resolve_interface_method_or_null. >>>>>> >>>>> >>>>> That code needs fixing as well. We have the following issue filed for that: https://bugs.openjdk.java.net/browse/JDK-8152903 >>>>> In the nutshell, resolveMethod() sort of emulates what happens during a virtual call (it is only called for invokevirtual and invokeinterface). It should do both linktime and runtime resolutions. The JVMCI version should work the same way as the CI version (see ciMethod::resolve_invoke() in ciMethod.cpp). >>>> >>>> Hmmm - I see the ciMethod::resolve_invoke has the same problem that I just called out, so making the JVMCI one match >>>> the CI version makes me wonder if we have sufficient test cases. But I hear that you would like to address that as a followup. >>>> That is ok with me - I will add a note to the bug. >>> >>> Could you please explain what is the problem again? 
Are you concerned that the bytecode is not passed to resolve_invoke, so we may call linktime_resolve_interface_or_null, for an interface holder when in reality it was an invokevirtual instruction and vice versa? >>> >>>> >>>> Also: I see a ciMethod::check_call that has a comment - >>>> IT appears to fail when applied to an invoke interface call site. >>>> FIXME: Remove this method and resolve_method_statically; refactor to use the other LinkResolver entry points. >>>> >>> >>> This comment is odd. I don't see why it would fail for invokeinterface. The code certainly seems to account for it. Maybe the comment is wrong? Any ideas? >>> >>> igor >>> >>>> Could you possibly file a bug on this one? What I am seeing is a conditional for invoke static vs. invoke virtual that does not take >>>> the subtleties of invoke interface and invoke special into account. >>>>> >>>>> igor >>>>> >>>>>> I think we need to study this one more closely - I suspect that you need a set of detailed tests that cover the >>>>>> corner cases here. I don't know the code paths, but I would suggest following this approach of passing in the byte code, >>>>>> so that you get the correct behavior depending on the requesting byte code. >>>>>> >>>>>> I am not sure what resolveMethod is doing here - since you seem to already have the resolved method and the receiver - so >>>>>> I could use help studying this a bit more to understand if this really is resolution or is really selection. >>>>>> >>>>>> thanks, >>>>>> Karen >>>>>> >>>>>>> On Apr 5, 2016, at 1:34 PM, Lois Foltan > wrote: >>>>>>> >>>>>>> >>>>>>> On 4/5/2016 12:50 PM, Igor Veresov wrote: >>>>>>>> Hi Lois, >>>>>>>> >>>>>>>> Thanks for looking at it. Yes, it passes all hotspot jtreg tests. >>>>>>>> >>>>>>>> igor >>>>>>> Hi Igor, >>>>>>> >>>>>>> Thanks for waiting on this.
A couple of comments: >>>>>>> >>>>>>> - Section 6.5 Instructions for invokeinterface does indicate that a "Linking Exceptions" the VM can throw an ICCE if the resolved method is static or private. So I think moving this exception from runtime to linktime is okay. >>>>>>> >>>>>>> - I'm concerned about the change on line #998, #1030, #1316. I don't think you are necessarily guaranteed to have the bytecodes that you are now passing to resolve_interface_method. For example, line #998 within linktime_resolve_static_method, you may not have an invokestatic here, you may have another invoke* bytecode trying to invoke a static interface method. Passing in Bytecodes::_invokestatic seems wrong, because even if the resolved method is static, "nostatics" was set to false. >>>>>>> >>>>>>> Just curious did you also run the testbase default methods tests? >>>>>>> Lois >>>>>>> >>>>>>>> >>>>>>>>> On Apr 5, 2016, at 9:30 AM, Lois Foltan > wrote: >>>>>>>>> >>>>>>>>> Hi Igor, >>>>>>>>> >>>>>>>>> I know you have two reviews for this but could you hold off committing until I or Karen Kinnear have a chance to review. We both worked in this area a lot to support default methods in JDK 8. Also, have you run the hotspot/test/runtime/SelectionResolution tests on this? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Lois >>>>>>>>> >>>>>>>>> On 4/1/2016 2:28 PM, Igor Veresov wrote: >>>>>>>>>> When invoking private interface methods with invokeinterface we throw ICCE. The check for that happens in the runtime part of the resolution, however, doing it at linktime seems like a better place, since the check doesn't depend on the receiver type. It also allows compiler interfaces that rely on linktime resolution to detect inconsistencies during parsing (see ciEnv::lookup_method() (CI) and ConstantPool.lookupMethod() (JVMCI) that are affected). 
>>>>>>>>>> >>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8153115 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8153115/webrev.00/ >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> igor >>> >> > From john.r.rose at oracle.com Tue Apr 12 20:09:23 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 12 Apr 2016 13:09:23 -0700 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <570D288D.8020106@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> <570D288D.8020106@oracle.com> Message-ID: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov wrote: > > On 4/12/16 7:33 PM, Vladimir Kozlov wrote: >> You did not fix comment: >> >> + // public native Object Unsafe.allocateInstance(Class cls); >> >> should be: >> >> + // private native Object allocateInstance0(Class cls) throws >> InstantiationException; > > Ok, finally found where it is :-) > Incorporated (will update the webrev shortly). > >> An other question: does it really throw InstantiationException? > > Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper). > > I didn't move the check into Java, because I didn't want to add yet another guard on fast path. A fix is necessary, but I'm not comfortable with the shape of the checking logic. The C-coded JNI function (not the intrinsic) just surfaces the function JNIEnv::AllocObject. This function calls some complicated C++ logic in Klass::check_valid_for_instantiation to check for various things, including arrays and abstracts. (There's also a primitive check.) So the problem is that the JIT intrinsic doesn't mimic all these checks. 
And a good tactic is to lift such checks into Java code, since Java is maintainable. But, there is still a maintenance problem: The checks in the proposed change overlap with, but do not cover, the checks performed by JNIEnv::AllocObject. Thus, it is difficult to prove that they are correct. Some additional checks are performed (in an ad hoc manner) by the JIT intrinsic. Thus, the checking for a valid class is now in three places: 1. JNIEnv::AllocObject (when the intrinsic is not used), 2. the new Java code (whether the intrinsic is used or not), and 3. the partial checks in the intrinsic code (library_call.cpp). The unit test will prevent regressions, but the code is still messy and hard to work with. Can we make it better at this point? Maybe not; maybe this is the least-bad point fix. But it seems to me that a less-bad fix would put the required logic in two places rather than three. Two ways to do that are 1. push the prim and array checks from Java down into the JIT intrinsic, next to the pre-existing checks, or 2. pull the pre-existing JIT intrinsic tests up into Java. Option 2 seems to require a new intrinsic to capture the pre-existing intrinsic tests. On the whole, since this Unsafe API point simply exposes JNIEnv::AllocObject, I suggest doing the necessary work in library_call.cpp to make the intrinsic accurately reflect that JNI function. That will make the checks easier to verify and maintain. I don't think (AM I REALLY SAYING THIS?) the Java-based checks help much in this particular case. -- John
From christian.thalinger at oracle.com Tue Apr 12 20:40:14 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 12 Apr 2016 10:40:14 -1000 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> <570D288D.8020106@oracle.com> <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> Message-ID: <4D7414C9-9D41-4939-A7FE-95CDAC9E865C@oracle.com> > On Apr 12, 2016, at 10:09 AM, John Rose wrote: > > On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov > wrote: >> >> On 4/12/16 7:33 PM, Vladimir Kozlov wrote: >>> You did not fix comment: >>> >>> + // public native Object Unsafe.allocateInstance(Class cls); >>> >>> should be: >>> >>> + // private native Object allocateInstance0(Class cls) throws >>> InstantiationException; >> >> Ok, finally found where it is :-) >> Incorporated (will update the webrev shortly). >> >>> An other question: does it really throw InstantiationException? >> >> Yes, it does throw IE from runtime call on slow path for abstract classes & interfaces (they have slow bit set in layout_helper). >> >> I didn't move the check into Java, because I didn't want to add yet another guard on fast path. > > A fix is necessary, but I'm not comfortable with the shape of the checking logic. > The C-coded JNI function (not the intrinsic) just surfaces the function JNIEnv::AllocObject. > This function calls some complicated C++ logic in Klass::check_valid_for_instantiation > to check for various things, including arrays and abstracts. (There's also a primitive check.) > > So the problem is that the JIT intrinsic doesn't mimic all these checks.
> And a good tactic is to lift such checks into Java code, since Java is maintainable. > But, there is still a maintenance problem: The checks in the proposed change > overlap with, but do not cover, the checks performed by JNIEnv::AllocObject. > Thus, it is difficult to prove that they are correct. Some additional checks are > performed (in an ad hoc manner) by the JIT intrinsic. > > Thus, the checking for a valid class is now in three places: 1. JNIEnv::AllocObject > (when the intrinsic is not used), 2. the new Java code (whether the intrinsic is used > or not), and 3. the partial checks in the intrinsic code (library_call.cpp). > > The unit test will prevent regressions, but the code is still messy and hard to work with. > > Can we make it better at this point? Maybe not; maybe this is the least-bad point fix. > But it seems to me that a less-bad fix would put the required logic in two places > rather than three. Two ways to do that are 1. push the prim and array checks from > Java down into the JIT intrinsic, next to the pre-existing checks, or 2. pull the > pre-existing JIT intrinsic tests up into Java. Option 2 seems to require a new > intrinsic to capture the pre-existing intrinsic tests. > > On the whole, since this Unsafe API point simply exposes JNIEnv::AllocObject, > I suggest doing the necessary work in library_call.cpp to make the intrinsic > accurately reflect that JNI function. That will make the checks easier to verify > and maintain. I don't think (AM I REALLY SAYING THIS?) the Java-based checks > help much in this particular case. -1 (for obvious reasons) From michael.c.berg at intel.com Wed Apr 13 06:26:24 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 13 Apr 2016 06:26:24 +0000 Subject: CR for RFR 8153998 Message-ID: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.
See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ Thanks, Michael From tobias.hartmann at oracle.com Wed Apr 13 08:53:01 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 13 Apr 2016 10:53:01 +0200 Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed with C1 only Message-ID: <570E08ED.4010207@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8154073 http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/ TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests. TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message. Tested with RBT (running).
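The message does not show what the new check in WhiteBox::compile_method() looks like. Purely as an illustration, here is a C++ sketch that assumes the guard refuses requests for tiers above -XX:TieredStopAtLevel. `is_compile_level_allowed` is a hypothetical name, the real webrev may test code-heap availability directly instead, and the CompLevel values follow HotSpot's tier numbering.

```cpp
// Tier numbers as used by HotSpot's tiered compilation:
// 0 = interpreter, 1-3 = C1 variants, 4 = C2.
enum CompLevel {
  CompLevel_none              = 0,
  CompLevel_simple            = 1,
  CompLevel_limited_profile   = 2,
  CompLevel_full_profile      = 3,
  CompLevel_full_optimization = 4
};

// Hypothetical guard, not the code from the webrev: an explicit
// (e.g. WhiteBox-driven) compile request is honored only if it names a
// compiled tier that the VM was configured to use, so an unsupported
// request is refused before any code-heap allocation is attempted.
constexpr bool is_compile_level_allowed(int requested_level,
                                        int tiered_stop_at_level) {
  return requested_level >= CompLevel_simple &&
         requested_level <= CompLevel_full_optimization &&
         requested_level <= tiered_stop_at_level;
}

// With -XX:TieredStopAtLevel=1 (C1 only), a level-2 request is refused.
static_assert(!is_compile_level_allowed(CompLevel_limited_profile, 1),
              "level 2 must be refused when only level 1 is enabled");
```

The helper is constexpr so the checks can be evaluated at compile time. Inside the VM, the stop level would come from the runtime flag value.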
Thanks, Tobias From nils.eliasson at oracle.com Wed Apr 13 12:59:30 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 13 Apr 2016 14:59:30 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570D2869.5030206@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> Message-ID: <570E42B2.2090306@oracle.com> Hi, New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ Summary: Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change. The only new logic is the CompileTask::can_become_stale() method. Testing: Running Testset hotspot on all platforms and hotspot_all on one platform Regards, Nils Eliasson On 2016-04-12 18:55, Vladimir Kozlov wrote: > On 4/12/16 6:30 AM, Nils Eliasson wrote: >> Tasks get evicted from the compile_queue if their invocation counter >> hasn't increased during TieredCompileTaskTimeout. >> (AdvancedThresholdPolicy::is_stale(...)). >> >> I'll do a proper fix, it is the right thing to do and should be pretty >> quick. I'll change the comment to an enum that represents who submitted >> the compile, and add a table for the comments. This could be useful in >> other settings too. > > Sounds good. > > Thanks, > Vladimir > >> >> Regards, >> Nils >> >> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>> What do you mean "stale"? >>> I would prefer to see the real fix as you suggested to avoid removing >>> WB comp tasks from queue. Adding timeout is not reliable. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this small fix of the BlockingCompilation test.
>>>> >>>> Summary: >>>> Add method enqueued for compilation with WB API may be removed from >>>> the compile queue as stale. >>>> >>>> Solution: >>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>> stale while the test is running. (Also added some extra >>>> checks that may spare us from waiting until timeout for failing.) >>>> >>>> This is an workaround but we should consider fixing something >>>> permanent for WB API compiles - like tagging the compile >>>> task with info about the origin of the compile. The comment field has >>>> this information - but then it needs to be >>>> converted to an enum. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>> >>>> Best regards, >>>> Nils Eliasson >>>> >>>> >>>> >>>> >> From vladimir.x.ivanov at oracle.com Wed Apr 13 16:01:52 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Apr 2016 19:01:52 +0300 Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when invoking nonexistent method Message-ID: <570E6D70.40904@oracle.com> http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8154172 C1 unconditionally inserts null check before doing a call, even if it throws an error during linkage. It contradicts JVMS which requires that linking errors precede run-time errors. The fix is to detect non-resolvable cases and avoid null checks / profiling altogether letting the runtime to throw a linkage error. Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck). Some clarifications: - klass->is_loaded() && !target->is_loaded() is true when method resolution fails; - static vs non-static checks aren't needed because stream()->get_method already returns unloaded method in such case; Thanks! 
Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Apr 13 16:47:19 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Apr 2016 19:47:19 +0300 Subject: [9] RFR (S): 8153540: C2 intrinsic for Unsafe.allocateInstance doesn't properly filter out array classes In-Reply-To: <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> References: <5707E0B5.5080501@oracle.com> <5707FB86.1020408@oracle.com> <570B8F6B.2030206@oracle.com> <06E637DC-33DD-48E6-9A92-1ACB848DCB19@oracle.com> <570CD702.4070909@oracle.com> <570D2348.5000204@oracle.com> <570D288D.8020106@oracle.com> <5FED50BA-0C4E-490E-8489-685151F57BB8@oracle.com> Message-ID: <570E7817.1040106@oracle.com> Thanks for the feedback, John. I see your point. Actually, after looking at check_valid_for_instantiation more carefully, I found one more missing check in the intrinsic: Class instantiation is forbidden in InstanceKlass::check_valid_for_instantiation, but the intrinsic allows it. So I agree it would be desirable to minimize duplication in the code. Am I right that you are in favor of the following approach? http://cr.openjdk.java.net/~vlivanov/8153540/webrev.slow_path/ I'll experiment to see how does it shape out in both cases. Best regards, Vladimir Ivanov On 4/12/16 11:09 PM, John Rose wrote: > On Apr 12, 2016, at 9:55 AM, Vladimir Ivanov > > wrote: >> >> On 4/12/16 7:33 PM, Vladimir Kozlov wrote: >>> You did not fix comment: >>> >>> + // public native Object Unsafe.allocateInstance(Class cls); >>> >>> should be: >>> >>> + // private native Object allocateInstance0(Class cls) throws >>> InstantiationException; >> >> Ok, finally found where it is :-) >> Incorporated (will update the webrev shortly). >> >>> An other question: does it really throw InstantiationException? >> >> Yes, it does throw IE from runtime call on slow path for abstract >> classes & interfaces (they have slow bit set in layout_helper). 
>> >> I didn't move the check into Java, because I didn't want to add yet >> another guard on fast path. > > A fix is necessary, but I'm not comfortable with the shape of the > checking logic. > The C-coded JNI function (not the intrinsic) just surfaces the function > JNIEnv::AllocObject. > This function calls some complicated C++ logic > in Klass::check_valid_for_instantiation > to check for various things, including arrays and abstracts. (There's > also a primitive check.) > > So the problem is that the JIT intrinsic doesn't mimic all these checks. > And a good tactic is to lift such checks into Java code, since Java is > maintainable. > But, there is still a maintenance problem: The checks in the proposed > chance > overlap with, but do not cover, the checks performed by JNIEnv::AllocObject. > Thus, it is difficult to prove that they are correct. Some additional > checks are > performed (in an ad hoc manner) by the JIT intrinsic. > > Thus, the checking for a valid class is now in three places: 1. > JNIEnv::AllocObject > (when the intrinsic is not used), 2. the new Java code (whether the > intrinsic is used > or not), and 3. the partial checks in the intrinsic code (library_call.cpp). > > The unit test will prevent regressions, but the code is still messy and > hard to work with. > > Can we make it better at this point? Maybe not; maybe this is the > least-bad point fix. > But it seems to me that a less-bad fix would put the required logic in > two places > rather than three. Two ways to do that are 1. push the prim and array > checks from > Java down into the JIT intrinsic, next to the pre-existing checks, or 2. > pull the > pre-existing JIT intrinsic tests up into Java. Option 2 seems to > require a new > intrinsic to capture the pre-existing intrinsic tests. > > On the whole, since this Unsafe API point simply exposes > JNIEnv::AllocObject, > I suggest doing the necessary work in library_call.cpp to make the intrinsic > accurately reflect that JNI function. 
That will make the checks easier > to verify > and maintain. I don't think (AM I REALLY SAYING THIS?) the Java-based > checks > help much in this particular case. > > -- John From vladimir.kozlov at oracle.com Wed Apr 13 21:02:07 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 14:02:07 -0700 Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed with C1 only In-Reply-To: <570E08ED.4010207@oracle.com> References: <570E08ED.4010207@oracle.com> Message-ID: <570EB3CF.5010408@oracle.com> Looks good. Thanks, Vladimir On 4/13/16 1:53 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8154073 > http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/ > > TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests. > > TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message. > > Tested with RBT (running). > > Thanks, > Tobias > From christian.thalinger at oracle.com Wed Apr 13 21:08:02 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 13 Apr 2016 11:08:02 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: > On Apr 12, 2016, at 8:26 PM, Berg, Michael C wrote: > > Hi Folks, > > I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. > Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. > > This code was tested as follows (see jbs entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } > > Thanks, > Michael From vladimir.kozlov at oracle.com Wed Apr 13 21:34:49 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 14:34:49 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570E42B2.2090306@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> Message-ID: <570EBB79.7060805@oracle.com> Very nice, I like it. One note. CompileReason (and its names) should be in the CompileTask class, where it is recorded.
Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms. Thanks, Vladimir On 4/13/16 5:59 AM, Nils Eliasson wrote: > Hi, > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ > > Summary > Introduced an enum CompileReason with members matching all the old > variants, and a table containing all the unchanged strings. I see the > possibility of removing/changing/simplifying some CompileReasons but > have chosen not to do so in this change. > > The only new logic is the CompileTask::can_become_stale() method. > > Testing: > Running Testset hotspot on all platforms and hotspot_all on one platform > > Regards, > Nils Eliasson > > On 2016-04-12 18:55, Vladimir Kozlov wrote: >> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>> Tasks get evicted from the compile_queue if their invocation counter >>> hasn't increased during TieredCompileTaskTimeout. >>> (AdvancedThresholdPolicy::is_stale(...)). >>> >>> I'll do a proper fix, it is the right thing to do and should be pretty >>> quick. I'll change the comment to an enum that represents who submitted >>> the compile, and add a table for the comments. This could be useful in >>> other settings too. >> >> Sounds good. >> >> Thanks, >> Vladimir >> >>> >>> Regards, >>> Nils >>> >>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>> What do you mean "stale"? >>>> I would prefer to see the real fix as you suggested to avoid removing >>>> WB comp tasks from queue. Adding timeout is not reliable. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this small fix of the BlockingCompilation test. >>>>> >>>>> Summary: >>>>> A method enqueued for compilation with the WB API may be removed from >>>>> the compile queue as stale. >>>>> >>>>> Solution: >>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>> stale while the test is running.
(Also added some extra >>>>> checks that may spare us from waiting until timeout for failing.) >>>>> >>>>> This is a workaround but we should consider fixing something >>>>> permanent for WB API compiles - like tagging the compile >>>>> task with info about the origin of the compile. The comment field has >>>>> this information - but then it needs to be >>>>> converted to an enum. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>> >>>>> Best regards, >>>>> Nils Eliasson >>>>> >>>>> >>>>> >>>>> >>> > From michael.c.berg at intel.com Wed Apr 13 21:35:39 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 13 Apr 2016 21:35:39 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: See below for context. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Wednesday, April 13, 2016 2:08 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops.
This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. Ok, that's easy enough. Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } The reason: Currently k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects. The set_mask rule takes the remaining iterations as an index and provides a mask in k1 that is applicable to the post loop. The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side-effect value will survive optimization. The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions. Thanks, Michael From vladimir.kozlov at oracle.com Wed Apr 13 21:52:36 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 14:52:36 -0700 Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when invoking nonexistent method In-Reply-To: <570E6D70.40904@oracle.com> References: <570E6D70.40904@oracle.com> Message-ID: <570EBFA4.4060005@oracle.com> Looks good to me. The ciEnv.cpp changes look empty. If they are only spacing changes, we don't need to include them in the bug fix.
thanks, Vladimir K On 4/13/16 9:01 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8154172 > > C1 unconditionally inserts a null check before doing a call, even if it > throws an error during linkage. This contradicts the JVMS, which requires that > linking errors precede run-time errors. > > The fix is to detect non-resolvable cases and avoid null checks / > profiling altogether, letting the runtime throw a linkage error. > > Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck). > > Some clarifications: > > - klass->is_loaded() && !target->is_loaded() is true when method > resolution fails; > > - static vs non-static checks aren't needed because > stream()->get_method already returns an unloaded method in such case; > > Thanks! > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Thu Apr 14 00:40:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Apr 2016 17:40:44 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: <570EE70C.8000906@oracle.com> Hi Michael, Please split the changes. _rex_vex_w_reverted (and other assembler) changes can be pushed first. evmovdqul -> evmovdquq and Vectors element_size() changes could be pushed separately too. You don't need MachMskNode placeholder methods in the other platforms' .ad files. I think Matcher::has_predicated_vectors() will be enough, since MachMskNode is generated only when has_predicated_vectors() is true. This is how we usually do it. macroAssembler_x86.cpp: Why do you use a table and not instructions to generate the mask value? Looking at the table, it is very easy to generate (you would need an additional instruction, but I think it is better than a load from memory): (1 << src) - 1; src == 0 could be treated specially. You can leave the table as a comment to show which values are expected. x86.ad names should be consistent: MaskCreateINode -> CreateMaskINode, set_mask -> createMask.
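Vladimir's suggestion to compute the mask arithmetically rather than load it from a table can be sketched in C++ as below. This illustrates only the arithmetic, not the x86 assembler in the webrev, and both function names are hypothetical. The mask has one bit per active lane, so the low `remaining` bits are set. In this C++ version `remaining == 0` already yields an empty mask; the case that needs care here is a shift count of 64 or more, which is undefined behavior for a 64-bit operand.

```cpp
#include <cstdint>

// Mask with the low `remaining` bits set, i.e. (1 << remaining) - 1,
// computed instead of loaded from a precomputed table.
uint64_t lane_mask(unsigned remaining) {
  if (remaining >= 64) {
    return ~uint64_t{0};  // all lanes active; avoids an undefined 64-bit shift
  }
  return (uint64_t{1} << remaining) - 1;  // remaining == 0 gives an empty mask
}

// A small table of the kind the arithmetic replaces, for comparison:
// entry i equals lane_mask(i). Valid only for remaining <= 8 here.
uint64_t lane_mask_from_table(unsigned remaining) {
  static const uint64_t table[] = {0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF};
  return table[remaining];
}
```

For example, three remaining iterations give `lane_mask(3) == 0x7`, enabling exactly the three lowest lanes.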
You can also use Matcher::has_predicated_vectors() in the predicate: +instruct createMask(rRegI dst, rRegI src) %{ + predicate(Matcher::has_predicated_vectors()); + match(Set dst (CreateMaskI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} Maybe it should be setMask, as the reverse of restoreMask. Or, more precisely, setvectmask/restorevectmask. MaskCreateINode or SetVectMaskINode should be defined in vector.hpp and not in subnode.hpp. block.cpp: Matcher::has_predicated_vectors() should be checked together with if (found_fixup_loops) to avoid useless looping. I don't like how you inject MachMskNode. It should be generated on exit from the loop where you created MaskCreateINode. It will need additional review after you address the comments above. Thanks, Vladimir On 4/12/16 11:26 PM, Berg, Michael C wrote: > Hi Folks, > > I would like to contribute Programmable SIMD as implemented on > multi-versioned post loops. See: > https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of > the implementation. > > This component delivers mask programmed post loops which execute in a > single iteration in place of fixup scalar loops which used to take many > iterations to complete work for user loops. > > Currently I have enabled this optimization for x86 only, specifically > for machines with masked data predication implemented as per fully > enabled EVEX targets. It delivers up to 2x performance and has been > modeled over a large number of loop lengths and forms of loops.
> > This code was tested as follows (see jbs entry below): > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ > > Thanks, > > Michael > From tobias.hartmann at oracle.com Thu Apr 14 06:31:10 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Apr 2016 08:31:10 +0200 Subject: [9] RFR(S): 8154073: Several compiler tests fail when are executed with C1 only In-Reply-To: <570EB3CF.5010408@oracle.com> References: <570E08ED.4010207@oracle.com> <570EB3CF.5010408@oracle.com> Message-ID: <570F392E.40608@oracle.com> Thanks, Vladimir! Best regards, Tobias On 13.04.2016 23:02, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 4/13/16 1:53 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8154073 >> http://cr.openjdk.java.net/~thartmann/8154073/webrev.00/ >> >> TestArrayCopyNoInitDeopt and TestExplicitRangeChecks fail with -XX:TieredStopAtLevel=1 because they expect methods to be compiled with C2. I added the corresponding checks to the tests. >> >> TieredLevelsTest causes the VM to crash with assert "heap is null" in CodeCache::allocate() because a compilation at level 2 is triggered via the Whitebox API and the corresponding code heap for profiled nmethods is not available with -XX:TieredStopAtLevel=2. I added a check to WhiteBox::compile_method() and also added an assert to CompileBroker::compile_method() to get a more meaningful error message. >> >> Tested with RBT (running).
>> >> Thanks, >> Tobias >> From rwestrel at redhat.com Thu Apr 14 06:46:49 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 14 Apr 2016 08:46:49 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body Message-ID: When running scimark on aarch64: ;; B16: # B17 <- B21 top-of-loop Freq: 2305.21 0x000003ffa126f710: add w17, w11, w12 ;*iadd {reexecute=0 rethrow=0 return_oop=0} ; - jnt.scimark2.FFT::transform_internal at 243 (line 129) 0x000003ffa126f714: nop 0x000003ffa126f718: nop 0x000003ffa126f71c: nop ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0} ; - jnt.scimark2.FFT::transform_internal at 238 (line 129) ;; B17: # B32 B18 <- B25 B16 Loop: B17-B16 inner Freq: 3056.06 0x000003ffa126f720: lsl w16, w17, #1 ;*imul {reexecute=0 rethrow=0 return_oop=0} ; - jnt.scimark2.FFT::transform_internal at 244 (line 129) The 3 nops are added by the code that aligns loop entries: the top-of-loop block is encountered first and its alignment is set; the loop head is later encountered through the back branch of an outer loop, and its alignment is set as well. I propose that the code that aligns loop entries verify that a loop top doesn't exist before it sets the alignment: http://cr.openjdk.java.net/~roland/8154135/webrev.00/ Roland. From igor.veresov at oracle.com Thu Apr 14 07:01:28 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 14 Apr 2016 00:01:28 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570E42B2.2090306@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> Message-ID: Sorry for nitpicking, but can't the compile_reason argument be of type CompileReason instead of int everywhere? It'd also be nice to place reason_name close to the enum.
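Roland's proposed rule for 8154135 (do not align a loop head when the loop already has an aligned top block) can be modeled with a simplified, hypothetical block list. The struct and function below do not correspond to C2's Block or PhaseCFG classes. They only show the idea: each loop receives alignment at most once, at the first candidate in layout order, so no padding nops land between the loop's top block and its head.

```cpp
#include <vector>

// Simplified, hypothetical model of blocks in final layout order.
struct LayoutBlock {
  int  loop_id;     // innermost loop containing the block, or -1 if none
  bool align_hint;  // block was chosen as a loop entry (top or head) candidate
  bool aligned;     // result: padding is emitted before this block
};

// Give each loop at most one aligned block: the first candidate in layout order.
// A later candidate of the same loop (e.g. the head, reached again through an
// outer loop's back branch) would otherwise put padding inside the loop body.
void assign_loop_alignment(std::vector<LayoutBlock>& layout, int num_loops) {
  std::vector<bool> loop_has_top(num_loops, false);
  for (LayoutBlock& b : layout) {
    if (!b.align_hint || b.loop_id < 0) {
      continue;
    }
    if (loop_has_top[b.loop_id]) {
      continue;  // a loop top already exists: do not align again
    }
    b.aligned = true;
    loop_has_top[b.loop_id] = true;
  }
}
```

In the listing above, B16 (top-of-loop) would be the first candidate for its loop and keep its alignment, while B17 would be skipped, so the three nops would not appear between them.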
igor > On Apr 13, 2016, at 5:59 AM, Nils Eliasson wrote: > > Hi, > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ > > Summary > Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change. > > The only new logic is the CompileTask::can_become_stale() method. > > Testing: > Running Testset hotspot on all platforms and hotspot_all on one platform > > Regards, > Nils Eliasson > > On 2016-04-12 18:55, Vladimir Kozlov wrote: >> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>> Tasks get evicted from the compile_queue if their invocation counter >>> hasn't increased during TieredCompileTaskTimeout. >>> (AdvancedThresholdPolicy::is_stale(...)). >>> >>> I'll do a proper fix, it is the right thing to do and should be pretty >>> quick. I'll change the comment to an enum that represents who submitted >>> the compile, and add a table for the comments. This could be useful in >>> other settings too. >> >> Sounds good. >> >> Thanks, >> Vladimir >> >>> >>> Regards, >>> Nils >>> >>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>> What do you mean "stale"? >>>> I would prefer to see the real fix as you suggested to avoid removing >>>> WB comp tasks from queue. Adding timeout is not reliable. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this small fix of the BlockingCompilation test. >>>>> >>>>> Summary: >>>>> A method enqueued for compilation with the WB API may be removed from >>>>> the compile queue as stale. >>>>> >>>>> Solution: >>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>> stale while the test is running. (Also added some extra >>>>> checks that may spare us from waiting until timeout for failing.)
>>>>> >>>>> This is a workaround but we should consider fixing something >>>>> permanent for WB API compiles - like tagging the compile >>>>> task with info about the origin of the compile. The comment field has >>>>> this information - but then it needs to be >>>>> converted to an enum. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>> >>>>> Best regards, >>>>> Nils Eliasson >>>>> >>>>> >>>>> >>>>> >>> > From zoltan.majo at oracle.com Thu Apr 14 11:47:28 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 13:47:28 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap Message-ID: <570F8350.5080209@oracle.com> Hi, please review the patch for 8151708. https://bugs.openjdk.java.net/browse/JDK-8151708 Problem: On solaris_sparc, the VM can set the TLAB's top pointer to a value past the end of the Java heap. The problem appears with large values of MinTLABSize. The reason for the problem is that the 'brcs' instruction at http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3260 http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3265 checks the condition codes in 'icc' (32-bit), but not in 'xcc' (64-bit). Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. Webrev: http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ Testing: - JPRT - reproducer on solaris_sparc. Thank you!
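A rough C++ model (not SPARC code) of why the width of the condition codes matters when comparing 64-bit addresses: an unsigned comparison that sees only the low 32 bits, which is effectively what a branch on 'icc' does, can disagree with the full 64-bit comparison that a branch on 'xcc' performs. The function names are hypothetical, and the addresses in the usage note are made up for illustration.

```cpp
#include <cstdint>

// Unsigned "a < b" when only the low 32 bits take part,
// loosely modeling a branch that tests the 32-bit condition codes (icc).
bool below_icc(uint64_t a, uint64_t b) {
  return static_cast<uint32_t>(a) < static_cast<uint32_t>(b);
}

// The same comparison over the full 64 bits, modeling a branch on xcc.
bool below_xcc(uint64_t a, uint64_t b) {
  return a < b;
}
```

For example, with a hypothetical new TLAB top of `0x100000010` and a heap end of `0xFFFFFF00`, `below_icc` reports that the top is below the end (the low halves compare as `0x10 < 0xFFFFFF00`), while `below_xcc` correctly reports that it is past the end.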
Best regards, Zoltan From tobias.hartmann at oracle.com Thu Apr 14 12:15:19 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Apr 2016 14:15:19 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8350.5080209@oracle.com> References: <570F8350.5080209@oracle.com> Message-ID: <570F89D7.7080809@oracle.com> Hi Zoltan, On 14.04.2016 13:47, Zoltán Majó wrote: > Hi, > > > please review the patch for 8151708. > > https://bugs.openjdk.java.net/browse/JDK-8151708 > > Problem: On solaris_sparc, the VM can set the TLAB's top pointer to a value past the end of the Java heap. The problem appears with large values of MinTLABSize. The reason for the problem is that the 'brcs' instruction at > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3260 > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/cpu/sparc/vm/macroAssembler_sparc.cpp#l3265 > > checks the condition codes in 'icc' (32-bit), but not in 'xcc' (64-bit). I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. Best regards, Tobias > Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ > > Testing: > - JPRT > - reproducer on solaris_sparc. > > Thank you!
> > Best regards, > > > Zoltan > From zoltan.majo at oracle.com Thu Apr 14 12:26:55 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 14:26:55 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F89D7.7080809@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> Message-ID: <570F8C8F.4080003@oracle.com> Hi Tobias, thank you for the feedback! On 04/14/2016 02:15 PM, Tobias Hartmann wrote: > [...] > I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. Yes, that simplifies the code a bit. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ Tests are running. Thank you! Best regards, Zoltan > > Best regards, > Tobias > >> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >> >> Testing: >> - JPRT >> - reproducer on solaris_sparc. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> From nils.eliasson at oracle.com Thu Apr 14 12:32:47 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Apr 2016 14:32:47 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> Message-ID: <570F8DEF.7000504@oracle.com> Yes, good feedback - New webrev including your and Vladimir's suggestions: http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ Thanks for having a look! Nils On 2016-04-14 09:01, Igor Veresov wrote: > Sorry for nitpicking, but can't the compile_reason argument be of type CompileReason instead of int everywhere?
It'd also be nice to place reason_name close to the enum. > > igor > > >> On Apr 13, 2016, at 5:59 AM, Nils Eliasson wrote: >> >> Hi, >> >> New webrev: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >> >> Summary >> Introduced an enum CompileReason with members matching all the old variants, and a table containing all the unchanged strings. I see the possibility of removing/changing/simplifying some CompileReasons but have chosen not to do so in this change. >> >> The only new logic is the CompileTask::can_become_stale() method. >> >> Testing: >> Running Testset hotspot on all platforms and hotspot_all on one platform >> >> Regards, >> Nils Eliasson >> >> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>> Tasks get evicted from the compile_queue if their invocation counter >>>> hasn't increased during TieredCompileTaskTimeout. >>>> (AdvancedThresholdPolicy::is_stale(...)). >>>> >>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>> quick. I'll change the comment to an enum that represents who submitted >>>> the compile, and add a table for the comments. This could be useful in >>>> other settings too. >>> Sounds good. >>> >>> Thanks, >>> Vladimir >>> >>>> Regards, >>>> Nils >>>> >>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>> What do you mean "stale"? >>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this small fix of the BlockingCompilation test. >>>>>> >>>>>> Summary: >>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>> the compile queue as stale. >>>>>> >>>>>> Solution: >>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>> stale while the test is running.
(Also added some extra >>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>> >>>>>> This is a workaround but we should consider fixing something >>>>>> permanent for WB API compiles - like tagging the compile >>>>>> task with info about the origin of the compile. The comment field has >>>>>> this information - but then it needs to be >>>>>> converted to an enum. >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>> >>>>>> Best regards, >>>>>> Nils Eliasson >>>>>> >>>>>> >>>>>> >>>>>> From tobias.hartmann at oracle.com Thu Apr 14 12:33:48 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Apr 2016 14:33:48 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8C8F.4080003@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> Message-ID: <570F8E2C.20605@oracle.com> Hi Zoltan, On 14.04.2016 14:26, Zoltán Majó wrote: > Hi Tobias, > > > thank you for the feedback! > > On 04/14/2016 02:15 PM, Tobias Hartmann wrote: >> [...] >> I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. > > Yes, that simplifies the code a bit. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ Looks good! Best regards, Tobias > > Tests are running. > > Thank you! > > Best regards, > > > Zoltan > >> >> Best regards, >> Tobias >> >>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>> >>> Testing: >>> - JPRT >>> - reproducer on solaris_sparc. >>> >>> Thank you!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> > From nils.eliasson at oracle.com Thu Apr 14 12:43:06 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Apr 2016 14:43:06 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570EBB79.7060805@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> Message-ID: <570F905A.4050202@oracle.com> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. It gets verbose in the method declarations in compileBroker and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ Thanks! Nils On 2016-04-13 23:34, Vladimir Kozlov wrote: > Very nice, I like it. > > One note. CompileReason (and its names) should be CompileTask class > where it is recorded. Then CompileTask::can_become_stale() can be in > header file so it is inlinined on all platforms. > > Thanks, > Vladimir > > On 4/13/16 5:59 AM, Nils Eliasson wrote: >> Hi, >> >> New webrev: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >> >> Summary >> Introduced an enum CompileReason with members matching all the old >> variants, and a table containing all the unchanged strings. I see the >> possibility of removing/changing/simplifying some CompileReasons but >> have choosen not to do so in this change. >> >> Only new logic is the CompileTask::can_become_stale() method. 
>> >> Testing: >> Running Testset hotspot on all platforms and hotspot_all on one platform >> >> Regards, >> Nils Eliasson >> >> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>> Tasks get evicted from the compile_queue if their invocation counter >>>> hasn't increased during TieredCompileTaskTimeout. >>>> (AdvancedThresholdPolicy::is_stale(...)). >>>> >>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>> quick. I'll change the comment to an enum that represents who submitted >>>> the compile, and add a table for the comments. This could be useful in >>>> other settings too. >>> >>> Sounds good. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Regards, >>>> Nils >>>> >>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>> What do you mean "stale"? >>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this small fix of the BlockingCompilation test. >>>>>> >>>>>> Summary: >>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>> the compile queue as stale. >>>>>> >>>>>> Solution: >>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>> stale while the test is running. (Also added some extra >>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>> >>>>>> This is a workaround but we should consider fixing something >>>>>> permanent for WB API compiles - like tagging the compile >>>>>> task with info about the origin of the compile. The comment field >>>>>> has >>>>>> this information - but then it needs to be >>>>>> converted to an enum.
>>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>> >>>>>> Best regards, >>>>>> Nils Eliasson >>>>>> >>>>>> >>>>>> >>>>>> >>>> >> From nils.eliasson at oracle.com Thu Apr 14 13:17:47 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Apr 2016 15:17:47 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" Message-ID: <570F987B.2070202@oracle.com> Hi, Please review this fix. Summary: In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Previously there was a return there that just ignored the compile. Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available) and then some essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. Solution: We could discuss whether it should be allowed to submit compiles on level 0, a change that would be a bit larger. This time I chose to extend the _initalized check in compile_method. I didn't add any logging or warning because this is really a corner case. Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ (Ignore the extra tags in the webrev) Best regards, Nils Eliasson From zoltan.majo at oracle.com Thu Apr 14 13:21:00 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 15:21:00 +0200 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input Message-ID: <570F993C.3040509@oracle.com> Hi, please review the patch for 8153357.
https://bugs.openjdk.java.net/browse/JDK-8153357 Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its unique input. http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 Then, if the phi indeed has a unique input, the C2 compiler attempts to replace the phi with a cast node. The new cast node feeds from the unique input. To be able to remove the phi node, the C2 compiler must determine the type of cast to add in place of the phi node (CastII, CastPP, or CheckCastPP). http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 The failure in the bug report appears because the C2 compiler adds a cast node of an unexpected type to the graph (a CheckCastPP instead of a CastPP when casting between two klass pointers). Please find more details about the cause of the failure in the bug description: https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 Solution: Refine C2's logic for determining which type of cast node to add. Webrev: http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ Testing: - JPRT; - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); - 500 non-failing runs with the reproducer (without the fix, the problem reproduces in fewer than 100 runs). Thank you and best regards, Zoltan From zoltan.majo at oracle.com Thu Apr 14 13:24:41 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Apr 2016 15:24:41 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8E2C.20605@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> <570F8E2C.20605@oracle.com> Message-ID: <570F9A19.4090500@oracle.com> Hi Tobias, On 04/14/2016 02:33 PM, Tobias Hartmann wrote: > [...] > Looks good! Thank you!
For the record: Testing with the reproducer was successful. Best regards, Zoltan > > Best regards, > Tobias > >> Tests are running. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> Best regards, >>> Tobias >>> >>>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>>> >>>> Testing: >>>> - JPRT >>>> - reproducer on solaris_sparc. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> From anton.ivanov at oracle.com Thu Apr 14 13:30:48 2016 From: anton.ivanov at oracle.com (Anton Ivanov) Date: Thu, 14 Apr 2016 16:30:48 +0300 Subject: RFR(XS): 8154174: improve JitTester performance Message-ID: <570F9B88.2040102@oracle.com> Hi, Please review a small patch that improves JitTester performance. In the current implementation JitTester uses exception-based control flow, which is not good by itself, but changing this is quite expensive, and there is a simple way to decrease the exception overhead: turn off stack trace capture in the ProductionFailedException constructor (this exception is created very often and its stack trace is never needed, as it is only used to control program flow). A small improvement was also made in the code that deep-copies SymbolTable elements (the Map iteration was rewritten to get rid of multiple redundant Map.get() calls, which cost O(1) only in the average case and could potentially be worse). Testing: local webrev: http://cr.openjdk.java.net/~aaivanov/8154174/webrev bug: https://bugs.openjdk.java.net/browse/JDK-8154174 -- Best regards, Anton Ivanov From michael.c.berg at intel.com Thu Apr 14 15:23:31 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 14 Apr 2016 15:23:31 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: The code has been updated with the change from below: webrev: 
http://cr.openjdk.java.net/~mcberg/8153998/webrev.02a/ Regards, Michael From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Wednesday, April 13, 2016 2:36 PM To: Christian Thalinger Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8153998 See below for context. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Wednesday, April 13, 2016 2:08 PM To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does "Msk" mean mask? Then we should call it MachMaskNode. Ok, that's easy enough.
Also, I don't quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } The reason: Currently, k registers (mask registers) are not allocated, meaning we have to treat their usage as side effects. set_mask takes our remaining iterations as an index and provides a mask in k1 that is applicable to its post loop. The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures the side-effect value will survive optimization. The restore is purely a side effect with no produced definition in rule space, as mask registers are not formal definitions. Thanks, Michael From vladimir.x.ivanov at oracle.com Thu Apr 14 16:53:22 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 14 Apr 2016 19:53:22 +0300 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses Message-ID: <570FCB02.6000507@oracle.com> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8134918 Type speculation can produce mismatched unsafe accesses. It injects a guard based on profile data and then propagates type info down to the users. If there's an unsafe access, it can become mismatched w.r.t. the profile data being used. This happens even for valid usages: if an unsafe access always matches the memory location at runtime, the code produced by type speculation in that case is effectively dead. What causes problems are unsafe OOP accesses (U.putObject()/getObject() on non-OOP locations).
The fix is to avoid intrinsification of problematic accesses. Type speculation injects precise type information, which is available during intrinsification. We could try to support mismatched unsafe object accesses instead, but I don't see any value in that. Testing: JPRT, pit-hs-comp (in progress). Thanks! Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Thu Apr 14 16:54:09 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 14 Apr 2016 19:54:09 +0300 Subject: [9] RFR (S): 8154172: C1: NPE is thrown instead of linkage error when invoking nonexistent method In-Reply-To: <570EBFA4.4060005@oracle.com> References: <570E6D70.40904@oracle.com> <570EBFA4.4060005@oracle.com> Message-ID: <570FCB31.5080709@oracle.com> Thanks, Vladimir. Best regards, Vladimir Ivanov On 4/14/16 12:52 AM, Vladimir Kozlov wrote: > Looks good to me. ciEnv.cpp cahnges looks empty. If it is only spacing > changes we don't need to include them into bug fix. > > thanks, > Vladimir K > > On 4/13/16 9:01 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8154172/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8154172 >> >> C1 unconditionally inserts null check before doing a call, even if it >> throws an error during linkage. It contradicts JVMS which requires that >> linking errors precede run-time errors. >> >> The fix is to detect non-resolvable cases and avoid null checks / >> profiling altogether letting the runtime to throw a linkage error. >> >> Testing: regression test, JPRT, RBT (pit-hs-comp.js + jck). >> >> Some clarifications: >> >> - klass->is_loaded() && !target->is_loaded() is true when method >> resolution fails; >> >> - static vs non-static checks aren't needed because >> stream()->get_method already returns unloaded method in such case; >> >> Thanks! 
>> >> Best regards, >> Vladimir Ivanov From christian.thalinger at oracle.com Mon Apr 11 17:59:42 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Apr 2016 07:59:42 -1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Message-ID: [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: > > Dear all: > > Can I please request reviews for the following change? > This change was created for JDK 9 and ppc64. > > Description: > This change adds options to compare-and-exchange for the POWER architecture. > As described in atomic_linux_ppc.inline.hpp, the current implementation of > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for > general purposes because calling sync both before and after cmpxchg > keeps memory consistent. However, it sometimes causes overhead because > sync instructions are very expensive in the current POWER chip design. > With this change, callers can explicitly specify whether to run the fence and the acquire with > two additional bool parameters. Because their default values are "true", > it is not necessary to modify existing cmpxchg calls. > > In addition, with the new parameters of cmpxchg, this change improves > performance of copy_to_survivor in the parallel GC. > copy_to_survivor installs forwarding pointers by using cmpxchg. This > operation doesn't require any sync instructions, in my understanding. > A pointer is changed at most once in a GC, and when cmpxchg fails, > the latest pointer is available to the caller. > > When I evaluated SPECjbb2013 (slightly customized because the obsolete grizzly > doesn't support the new version format of Java 9), the pause time of young GC was > reduced by 10% to 20%.
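The proposed interface can be illustrated with a minimal sketch. It uses C++11 std::atomic as a portable stand-in for the PPC64 inline assembly. The helper name cmpxchg_sketch is hypothetical. The mapping of the fence/acquire flags onto std::memory_order values is also an assumption, not the code in the attached diff. A leading full sync is stronger than C++ release ordering, so the fence-only case is only an approximation.

```cpp
#include <atomic>

// Hypothetical helper approximating the proposed
// cmpxchg(exchange_value, dest, compare_value, fence, acquire).
// "fence" models the leading barrier, "acquire" the trailing acquire barrier.
// Returns the value found in *dest, as HotSpot's Atomic::cmpxchg does.
template <typename T>
T cmpxchg_sketch(T exchange_value, std::atomic<T>* dest, T compare_value,
                 bool fence = true, bool acquire = true) {
  std::memory_order success;
  std::memory_order failure;
  if (fence && acquire) {
    success = std::memory_order_seq_cst;  // conservative default: full ordering
    failure = std::memory_order_seq_cst;
  } else if (fence) {
    success = std::memory_order_release;  // earlier accesses ordered before the CAS
    failure = std::memory_order_relaxed;
  } else if (acquire) {
    success = std::memory_order_acquire;  // later accesses ordered after the CAS
    failure = std::memory_order_acquire;
  } else {
    success = std::memory_order_relaxed;  // atomicity only, no ordering
    failure = std::memory_order_relaxed;
  }
  T expected = compare_value;
  // On failure, compare_exchange_strong stores the current value into 'expected';
  // on success 'expected' already equals the old value.
  dest->compare_exchange_strong(expected, exchange_value, success, failure);
  return expected;
}
```

Passing fence=false and acquire=false matches what the proposal does in cas_set_mark. The review has to settle whether the forwarding path really needs no ordering at all.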
> > Summary of source code changes: > > * src/share/vm/runtime/atomic.hpp > * src/share/vm/runtime/atomic.cpp > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > - Add two arguments of fence and acquire to cmpxchg only for PPC64. > Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, > they are reduced while inlining to callers. > > * src/share/vm/oops/oop.inline.hpp > - Changed cas_set_mark to call cmpxchg without fence and acquire. > cas_set_mark is called only by cas_forward_to that is called only by > copy_to_survivor_space and oop_promotion_failed in > psPromotionManager. > > Code change: > > Please see an attached diff file that was generated with "hg diff -g" > under the latest hotspot directory. > > Passed test: > SPECjbb2013 (customized) > > * I believe some other cmpxchg will be optimized by reducing fence > or acquire because twice calls of sync are too conservative to implement > Java memory model. > > > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64_cmpxchg_opt.diff Type: application/octet-stream Size: 8837 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From christian.thalinger at oracle.com Thu Apr 14 18:15:53 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 08:15:53 -1000 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: References: Message-ID: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com> > On Apr 13, 2016, at 8:46 PM, Roland Westrelin wrote: > > > When running scimark on aarch64: > > ;; B16: # B17 <- B21 top-of-loop Freq: 2305.21 > > 0x000003ffa126f710: add w17, w11, w12 ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 243 (line 129) > > 0x000003ffa126f714: nop > 0x000003ffa126f718: nop > 0x000003ffa126f71c: nop ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 238 (line 129) > > ;; B17: # B32 B18 <- B25 B16 Loop: B17-B16 inner Freq: 3056.06 > > 0x000003ffa126f720: lsl w16, w17, #1 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 244 (line 129) > > The 3 nops are added by the code that aligns loop entries: the top of > loop block is first encountered and its alignment is set, the loop head > is later encountered through the backbranch of an outer loop and its > alignment is set. > > I propose that the code that aligns loop entries verifies that a loop > top doesn't exist before it sets the alignment: > > http://cr.openjdk.java.net/~roland/8154135/webrev.00/ I wonder if this has any performance implications (good or bad). This alignment is not aarch64 specific so we were doing it all the time. > > Roland. From christian.thalinger at oracle.com Thu Apr 14 18:19:46 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 08:19:46 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: > On Apr 13, 2016, at 11:35 AM, Berg, Michael C wrote: > > See below for context. 
> > Regards, > Michael > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Wednesday, April 13, 2016 2:08 PM > To: Berg, Michael C > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > > On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: > > <>Hi Folks, > > I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. > This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. > Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. > > This code was tested as follows (see jbs entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ > > +//------------------------------MachMskNode----------------------------------- > +// Machine function Msk Node > +class MachMskNode : public MachIdealNode { > Does ?Msk? mean mask? Then we should call it MachMaskNode. > > Ok, that?s easy enough. > > Also, I don?t quite understand why we have: > +instruct set_mask(rRegI dst, rRegI src) %{ > + predicate(VM_Version::supports_avx512vl()); > + match(Set dst (MaskCreateI src)); > + effect(TEMP dst); > + format %{ "createmsk $dst, $src" %} > + ins_encode %{ > + __ createmsk($dst$$Register, $src$$Register); > + %} > but: > + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { > + MacroAssembler _masm(&cbuf); > + __ restoremsk(); > + } > > The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. 
> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. > The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. > The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. Hmm. So, there is no way we can have a RestoreMaskINode? > > Thanks, > Michael > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.c.berg at intel.com Thu Apr 14 18:44:06 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 14 Apr 2016 18:44:06 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: Christian, There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: See below for context. Regards, Michael From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Wednesday, April 13, 2016 2:08 PM To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: Hi Folks, I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. 
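The single masked iteration described above can be modelled in scalar C++ as a rough sketch. The model makes several assumptions:

- The vector has 8 lanes.
- The mask has its low `remaining` bits set. This is the role the thread gives to the value set_mask places in k1.
- The names kLanes, post_loop_mask and masked_add_post_loop are hypothetical.

This is not the EVEX code that C2 emits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vector width for the model (e.g. 8 x 64-bit lanes in 512 bits).
constexpr int kLanes = 8;

// Mask with the low 'remaining' bits set, all ones for a full vector.
// Models the per-post-loop value set_mask puts into k1; restoremsk would
// put the default (all-ones) mask back afterwards.
std::uint32_t post_loop_mask(int remaining) {
  if (remaining <= 0) return 0u;
  if (remaining >= kLanes) return (1u << kLanes) - 1u;
  return (1u << remaining) - 1u;
}

// One masked "vector" iteration over the tail [start, n): lane i is touched
// only if bit i of the mask is set, so no scalar fixup loop is needed.
// Assumes start <= n and that fewer than kLanes elements remain.
void masked_add_post_loop(std::int64_t* a, const std::int64_t* b,
                          std::size_t start, std::size_t n) {
  int remaining = static_cast<int>(n - start);
  std::uint32_t mask = post_loop_mask(remaining);
  for (int lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) {
      a[start + lane] += b[start + lane];
    }
  }
}
```

Consider a 10-element array whose main loop stops at index 8. Here post_loop_mask(2) is 0b11, so one masked step updates elements 8 and 9 instead of running two scalar iterations.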
Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. This code was tested as follows (see jbs entry below): Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ +//------------------------------MachMskNode----------------------------------- +// Machine function Msk Node +class MachMskNode : public MachIdealNode { Does ?Msk? mean mask? Then we should call it MachMaskNode. Ok, that?s easy enough. Also, I don?t quite understand why we have: +instruct set_mask(rRegI dst, rRegI src) %{ + predicate(VM_Version::supports_avx512vl()); + match(Set dst (MaskCreateI src)); + effect(TEMP dst); + format %{ "createmsk $dst, $src" %} + ins_encode %{ + __ createmsk($dst$$Register, $src$$Register); + %} but: + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { + MacroAssembler _masm(&cbuf); + __ restoremsk(); + } The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. Hmm. So, there is no way we can have a RestoreMaskINode? Thanks, Michael -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From christian.thalinger at oracle.com Thu Apr 14 18:45:59 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 08:45:59 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570F905A.4050202@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: > On Apr 14, 2016, at 2:43 AM, Nils Eliasson wrote: > > I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. > > It gets verbose in the method declarations in compileBroker Don't worry about this. > and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. Yes, that's the right place. > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ + bool can_become_stale() const { + return !_is_blocking && (_compile_reason < Reason_Whitebox); + } I'm not a fan of implicit contracts defined only by comments. This method doesn't seem to be performance critical, so I would suggest using a switch-case. An attribute on the enum would be much better, but we all know this isn't Java. > > Thanks! > Nils > > On 2016-04-13 23:34, Vladimir Kozlov wrote: >> Very nice, I like it. >> >> One note. CompileReason (and its names) should be in the CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms.
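Christian's suggestion to replace the ordering test `_compile_reason < Reason_Whitebox` with an explicit switch can be sketched as follows. The sketch also folds in the eviction rule described earlier in the thread: a task is dropped when its invocation counter hasn't increased during TieredCompileTaskTimeout. Only CompileReason, Reason_Whitebox, _compile_reason, _is_blocking and can_become_stale come from the thread. The other enumerators, the CompileTaskSketch struct and is_stale_sketch are hypothetical. is_stale_sketch is a much-simplified stand-in for AdvancedThresholdPolicy::is_stale.

```cpp
#include <cstdint>

// Hypothetical subset of reasons; the real list lives in compileTask.hpp.
enum CompileReason {
  Reason_None,
  Reason_Tiered,          // hypothetical: submitted by the tiered policy
  Reason_BackedgeCount,   // hypothetical
  Reason_Whitebox,        // submitted through the WhiteBox API
  Reason_MustBeCompiled   // hypothetical
};

struct CompileTaskSketch {
  CompileReason _compile_reason;
  bool _is_blocking;

  // Same result as '!_is_blocking && _compile_reason < Reason_Whitebox' for the
  // ordering above, but every reason is listed explicitly, so adding one forces
  // a decision here instead of depending on where it sits in the enum.
  bool can_become_stale() const {
    if (_is_blocking) return false;
    switch (_compile_reason) {
      case Reason_None:
      case Reason_Tiered:
      case Reason_BackedgeCount:
        return true;
      case Reason_Whitebox:
      case Reason_MustBeCompiled:
        return false;
    }
    return false;
  }
};

// Simplified eviction rule: drop a task only if it may become stale, its
// invocation count has not grown, and it waited longer than the timeout (ms).
bool is_stale_sketch(const CompileTaskSketch& task, std::int64_t now_ms,
                     std::int64_t last_seen_ms, std::int64_t timeout_ms,
                     int old_count, int new_count) {
  return task.can_become_stale() && new_count <= old_count &&
         (now_ms - last_seen_ms) > timeout_ms;
}
```

The switch has no default label, so GCC and Clang (-Wswitch, enabled by -Wall) warn when a new enumerator is left unhandled. That warning turns the implicit contract into one the compiler checks.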
>> >> Thanks, >> Vladimir >> >> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>> >>> Summary >>> Introduced an enum CompileReason with members matching all the old >>> variants, and a table containing all the unchanged strings. I see the >>> possibility of removing/changing/simplifying some CompileReasons but >>> have choosen not to do so in this change. >>> >>> Only new logic is the CompileTask::can_become_stale() method. >>> >>> Testing: >>> Running Testset hotspot on all platforms and hotspot_all on one platform >>> >>> Regards, >>> Nils Eliawsson >>> >>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>> hasn't increased during TieredCompileTaskTimeout. >>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>> >>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>> quick. I'll change the comment to an enum that represent who submitted >>>>> the compile, and add a table for the comments. This could be useful in >>>>> other settings to. >>>> >>>> Sounds good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>> What do you mean "stale"? >>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>> >>>>>>> Summary: >>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>> the compile queue as stale. >>>>>>> >>>>>>> Solution: >>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>> stale while the test is running. 
(Also added some extra >>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>> >>>>>>> This is a workaround but we should consider fixing something >>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>> task with info about the origin of the compile. The comment field has >>>>>>> this information - but then it needs to be >>>>>>> converted to an enum. >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Nils Eliasson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>> > From vladimir.kozlov at oracle.com Thu Apr 14 18:57:02 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 11:57:02 -0700 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570F8C8F.4080003@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> Message-ID: <570FE7FE.2000001@oracle.com> Good. Thanks, Vladimir On 4/14/16 5:26 AM, Zoltán Majó wrote: > Hi Tobias, > > > thank you for the feedback! > > On 04/14/2016 02:15 PM, Tobias Hartmann wrote: >> [...] >> I would simply replace the 'br' by 'brx' which tests either xcc or icc depending on the architecture. > > Yes, that simplifies the code a bit. Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ > > Tests are running. > > Thank you! > > Best regards, > > > Zoltan > >> >> Best regards, >> Tobias >> >>> Solution: As the VM is handling addresses at the above-mentioned locations, the appropriate condition codes are supposed to be checked. Use 'BPcc' instead of 'Bicc' at these locations. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>> >>> Testing: >>> - JPRT >>> - reproducer on solaris_sparc. >>> >>> Thank you!
>>> >>> Best regards, >>> >>> >>> Zoltan >>> > From vladimir.kozlov at oracle.com Thu Apr 14 19:02:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 12:02:12 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570F905A.4050202@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: <570FE934.5000800@oracle.com> Looks good. Thanks, Vladimir On 4/14/16 5:43 AM, Nils Eliasson wrote: > I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. > > It gets verbose in the method declarations in compileBroker and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is > the keeper of the CompileReason so it makes sense too. > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ > > Thanks! > Nils > > On 2016-04-13 23:34, Vladimir Kozlov wrote: >> Very nice, I like it. >> >> One note. CompileReason (and its names) should be CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in header file so it is inlinined on all platforms. >> >> Thanks, >> Vladimir >> >> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>> >>> Summary >>> Introduced an enum CompileReason with members matching all the old >>> variants, and a table containing all the unchanged strings. I see the >>> possibility of removing/changing/simplifying some CompileReasons but >>> have choosen not to do so in this change. >>> >>> Only new logic is the CompileTask::can_become_stale() method. 
>>> >>> Testing: >>> Running Testset hotspot on all platforms and hotspot_all on one platform >>> >>> Regards, >>> Nils Eliawsson >>> >>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>> hasn't increased during TieredCompileTaskTimeout. >>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>> >>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>> quick. I'll change the comment to an enum that represent who submitted >>>>> the compile, and add a table for the comments. This could be useful in >>>>> other settings to. >>>> >>>> Sounds good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>> What do you mean "stale"? >>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>> >>>>>>> Summary: >>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>> the compile queue as stale. >>>>>>> >>>>>>> Solution: >>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>> stale while the test is running. (Also added some extra >>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>> >>>>>>> This is an workaround but we should consider fixing something >>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>> task with info about the origin of the compile. The comment field has >>>>>>> this information - but then it needs to be >>>>>>> converted to an enum. 
>>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Nils Eliasson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>> > From igor.veresov at oracle.com Thu Apr 14 22:15:20 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 14 Apr 2016 15:15:20 -0700 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <570F905A.4050202@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: <790C0A6F-8E06-4891-A771-9112606C6812@oracle.com> Looks good. Thanks! igor > On Apr 14, 2016, at 5:43 AM, Nils Eliasson wrote: > > I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. > > It gets verbose in the method declarations in compileBroker and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. > > New webrev: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ > > Thanks! > Nils > > On 2016-04-13 23:34, Vladimir Kozlov wrote: >> Very nice, I like it. >> >> One note. CompileReason (and its names) should be CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in header file so it is inlinined on all platforms. >> >> Thanks, >> Vladimir >> >> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>> Hi, >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>> >>> Summary >>> Introduced an enum CompileReason with members matching all the old >>> variants, and a table containing all the unchanged strings. 
I see the >>> possibility of removing/changing/simplifying some CompileReasons but >>> have choosen not to do so in this change. >>> >>> Only new logic is the CompileTask::can_become_stale() method. >>> >>> Testing: >>> Running Testset hotspot on all platforms and hotspot_all on one platform >>> >>> Regards, >>> Nils Eliawsson >>> >>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>> hasn't increased during TieredCompileTaskTimeout. >>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>> >>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>> quick. I'll change the comment to an enum that represent who submitted >>>>> the compile, and add a table for the comments. This could be useful in >>>>> other settings to. >>>> >>>> Sounds good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Regards, >>>>> Nils >>>>> >>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>> What do you mean "stale"? >>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>> >>>>>>> Summary: >>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>> the compile queue as stale. >>>>>>> >>>>>>> Solution: >>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>> stale while the test is running. (Also added some extra >>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>> >>>>>>> This is an workaround but we should consider fixing something >>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>> task with info about the origin of the compile. 
The comment field has >>>>>>> this information - but then it needs to be >>>>>>> converted to an enum. >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Nils Eliasson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>> > From christian.thalinger at oracle.com Thu Apr 14 22:35:03 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Apr 2016 12:35:03 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: Message-ID: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> > On Apr 14, 2016, at 8:44 AM, Berg, Michael C wrote: > > Christian, > > There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. That?s unfortunate but I understand. I?m fine with it then. > > Regards, > Michael > > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Thursday, April 14, 2016 11:20 AM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > > On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: > > See below for context. > > Regards, > Michael > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Wednesday, April 13, 2016 2:08 PM > To: Berg, Michael C > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > > On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: > > <>Hi Folks, > > I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. > This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. 
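The masked post-loop idea (one masked vector iteration replacing a scalar fixup loop) can be modeled in plain C++. This is a scalar simulation under stated assumptions, not the x86 code from the webrev. The lane count kLanes = 8, the global k1 variable, and create_mask()/restore_mask() are hypothetical stand-ins for the createmsk/restoremsk macro-assembler calls discussed later in the thread. The mask layout (low `remaining` bits set) is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative model only: kLanes is an assumed vector width in 32-bit lanes.
constexpr std::size_t kLanes = 8;
constexpr std::uint32_t kAllLanes = (std::uint32_t{1} << kLanes) - 1u;

// Stand-in for the implicit mask register (k1 in the thread). Its default
// value enables every lane, so unmasked code is unaffected.
static std::uint32_t k1 = kAllLanes;

// Models "create mask": enable the low `remaining` lanes (lane i active iff i < remaining).
inline void create_mask(std::size_t remaining) {
  assert(remaining <= kLanes);
  k1 = (std::uint32_t{1} << remaining) - 1u;
}

// Models "restore mask": put the default all-lanes value back.
inline void restore_mask() { k1 = kAllLanes; }

// One "vector" add that, like a masked instruction, only touches active lanes.
inline void vadd(int* dst, const int* a, const int* b) {
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (k1 & (std::uint32_t{1} << lane)) dst[lane] = a[lane] + b[lane];
  }
}

// dst[i] = a[i] + b[i]. The main loop runs full vectors; the leftover
// n % kLanes elements run in one masked iteration instead of a scalar loop.
inline void add_arrays(int* dst, const int* a, const int* b, std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) vadd(dst + i, a + i, b + i);
  std::size_t remaining = n - i;
  if (remaining > 0) {
    create_mask(remaining);
    vadd(dst + i, a + i, b + i);  // single post-loop iteration
    restore_mask();               // later code assumes the default mask
  }
}
```

The restore step reflects the point made later in the thread: the mask register has a default value that other generated code relies on, so the post loop must put it back after its single masked iteration.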
> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. > > This code was tested as follows (see jbs entry below): > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ > > +//------------------------------MachMskNode----------------------------------- > +// Machine function Msk Node > +class MachMskNode : public MachIdealNode { > Does ?Msk? mean mask? Then we should call it MachMaskNode. > > Ok, that?s easy enough. > > Also, I don?t quite understand why we have: > +instruct set_mask(rRegI dst, rRegI src) %{ > + predicate(VM_Version::supports_avx512vl()); > + match(Set dst (MaskCreateI src)); > + effect(TEMP dst); > + format %{ "createmsk $dst, $src" %} > + ins_encode %{ > + __ createmsk($dst$$Register, $src$$Register); > + %} > but: > + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { > + MacroAssembler _masm(&cbuf); > + __ restoremsk(); > + } > > The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. > The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. > The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. > The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. > > Hmm. So, there is no way we can have a RestoreMaskINode? > > Thanks, > Michael -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kozlov at oracle.com Thu Apr 14 23:12:56 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 16:12:56 -0700 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: References: Message-ID: <571023F8.5090903@oracle.com> I agree with the optimization but I am not sure about the changes. Can we check only one previous block to be more conservative?: Block* b = prev(targ_block); bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() && !b->head()->is_Loop(); Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? Maybe for RISC CPUs (with fixed instruction size) we should change them. Thanks, Vladimir On 4/13/16 11:46 PM, Roland Westrelin wrote: > > When running scimark on aarch64: > > ;; B16: # B17 <- B21 top-of-loop Freq: 2305.21 > > 0x000003ffa126f710: add w17, w11, w12 ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 243 (line 129) > > 0x000003ffa126f714: nop > 0x000003ffa126f718: nop > 0x000003ffa126f71c: nop ;*iconst_2 {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 238 (line 129) > > ;; B17: # B32 B18 <- B25 B16 Loop: B17-B16 inner Freq: 3056.06 > > 0x000003ffa126f720: lsl w16, w17, #1 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - jnt.scimark2.FFT::transform_internal at 244 (line 129) > > The 3 nops are added by the code that aligns loop entries: the top of > loop block is first encountered and its alignment is set, the loop head > is later encountered through the backbranch of an outer loop and its > alignment is set. > > I propose that the code that aligns loop entries verifies that a loop > top doesn't exist before it sets the alignment: > > http://cr.openjdk.java.net/~roland/8154135/webrev.00/ > > Roland.
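The conservative check Vladimir proposes can be sketched against a toy block layout. Block, has_top() and maybe_align() are hypothetical simplifications, not C2's Block or PhaseCFG code. The sketch only shows the predicate: do not align a loop head when the block laid out just before it already carries loop alignment and is not itself a loop head. That is the top-of-loop case from Roland's aarch64 listing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified block model; not HotSpot's Block class.
struct Block {
  bool head_is_loop;        // block starts with a Loop node
  bool has_loop_alignment;  // alignment padding already requested
};

// Conservative test from the review: look only at the block laid out
// immediately before the loop head. If it already carries loop alignment
// but is not itself a loop head, treat it as the rotated top-of-loop block.
inline bool has_top(const std::vector<Block>& layout, std::size_t targ) {
  if (targ == 0) return false;
  const Block& prev = layout[targ - 1];
  return layout[targ].head_is_loop && prev.has_loop_alignment && !prev.head_is_loop;
}

// Request alignment for a loop head unless a loop top already received it;
// aligning both would put padding nops inside the loop body.
inline void maybe_align(std::vector<Block>& layout, std::size_t targ) {
  if (layout[targ].head_is_loop && !has_top(layout, targ)) {
    layout[targ].has_loop_alignment = true;
  }
}
```

Checking a single predecessor keeps the test cheap and local, which is the "more conservative" property asked for. A scan over several previous blocks could catch more layouts but risks suppressing alignment that is actually wanted.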
> From vladimir.kozlov at oracle.com Thu Apr 14 23:26:59 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 16:26:59 -0700 Subject: CR for RFR 8153998 In-Reply-To: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> Message-ID: <57102743.8080508@oracle.com> On 4/14/16 3:35 PM, Christian Thalinger wrote: > >> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >> >> Christian, >> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. > > That?s unfortunate but I understand. I?m fine with it then. You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. Vladimir > >> Regards, >> Michael >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Thursday, April 14, 2016 11:20 AM >> *To:*Berg, Michael C > >> *Cc:*hotspot-compiler-dev at openjdk.java.net >> *Subject:*Re: CR for RFR 8153998 >> >> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >> See below for context. >> Regards, >> Michael >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Wednesday, April 13, 2016 2:08 PM >> *To:*Berg, Michael C > >> *Cc:*hotspot-compiler-dev at openjdk.java.net >> *Subject:*Re: CR for RFR 8153998 >> >> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >> Hi Folks, >> >> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573for the first half of the implementation. 
>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >> performance and has been modeled over a large number of loop lengths and forms of loops. >> This code was tested as follows(see jbs entry below): >> >> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >> >> webrev: >> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >> >> +//------------------------------MachMskNode----------------------------------- >> >> +// Machine function Msk Node >> >> +class MachMskNode : public MachIdealNode { >> >> Does ?Msk? mean mask? Then we should call it MachMaskNode. >> Ok, that?s easy enough. >> Also, I don?t quite understand why we have: >> >> +instruct set_mask(rRegI dst, rRegI src) %{ >> >> + predicate(VM_Version::supports_avx512vl()); >> >> + match(Set dst (MaskCreateI src)); >> >> + effect(TEMP dst); >> >> + format %{ "createmsk $dst, $src" %} >> >> + ins_encode %{ >> >> + __ createmsk($dst$$Register, $src$$Register); >> >> + %} >> >> but: >> >> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >> >> + MacroAssembler _masm(&cbuf); >> >> + __ restoremsk(); >> >> + } >> >> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. 
>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >> >> Hmm. So, there is no way we can have a RestoreMaskINode? >> >> Thanks, >> Michael > From michael.c.berg at intel.com Thu Apr 14 23:38:48 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 14 Apr 2016 23:38:48 +0000 Subject: CR for RFR 8153998 In-Reply-To: <57102743.8080508@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> Message-ID: Vladimir, Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. I tried something like that early on with CountedLoopEnd. The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 4:27 PM To: Christian Thalinger ; Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 3:35 PM, Christian Thalinger wrote: > >> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >> >> Christian, >> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. > > That?s unfortunate but I understand. I?m fine with it then. You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). 
has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. Vladimir > >> Regards, >> Michael >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >> > >> *Cc:*hotspot-compiler-dev at openjdk.java.net >> >> *Subject:*Re: CR for RFR 8153998 >> >> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >> See below for context. >> Regards, >> Michael >> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >> *Sent:*Wednesday, April 13, 2016 2:08 PM >> *To:*Berg, Michael C > >> *Cc:*hotspot-compiler-dev at openjdk.java.net >> *Subject:*Re: CR for RFR 8153998 >> >> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >> Hi Folks, >> >> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573for the first half of the implementation. >> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >> performance and has been modeled over a large number of loop lengths and forms of loops. >> This code was tested as follows(see jbs entry below): >> >> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >> >> webrev: >> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >> >> >> +//------------------------------MachMskNode------------------------- >> ---------- >> >> +// Machine function Msk Node >> >> +class MachMskNode : public MachIdealNode { >> >> Does ?Msk? mean mask? 
Then we should call it MachMaskNode. >> Ok, that?s easy enough. >> Also, I don?t quite understand why we have: >> >> +instruct set_mask(rRegI dst, rRegI src) %{ >> >> + predicate(VM_Version::supports_avx512vl()); >> >> + match(Set dst (MaskCreateI src)); >> >> + effect(TEMP dst); >> >> + format %{ "createmsk $dst, $src" %} >> >> + ins_encode %{ >> >> + __ createmsk($dst$$Register, $src$$Register); >> >> + %} >> >> but: >> >> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const >> { >> >> + MacroAssembler _masm(&cbuf); >> >> + __ restoremsk(); >> >> + } >> >> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >> >> Hmm. So, there is no way we can have a RestoreMaskINode? >> >> Thanks, >> Michael > From vladimir.kozlov at oracle.com Thu Apr 14 23:41:39 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 16:41:39 -0700 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <570F987B.2070202@oracle.com> References: <570F987B.2070202@oracle.com> Message-ID: <57102AB3.30709@oracle.com> I agree with this simple change as the fix. Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. I don't see a PIT link in the bug report. Thanks, Vladimir On 4/14/16 6:17 AM, Nils Eliasson wrote: > Hi, > > Please review this fix. 
> > Summary: > In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before that, there was a return there that just ignored the compile. > > Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter is still available) and then some > essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. > > Solution: > We could discuss whether it should be allowed to submit compiles on level 0, but that would be a somewhat larger change. This time I chose to extend the _initialized check in compile_method. I didn't add any > logging or warning because this is really a corner case. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 > Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ > (Ignore the extra tags in the webrev) > > Best regards, > Nils Eliasson From vladimir.kozlov at oracle.com Fri Apr 15 00:02:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 17:02:05 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> Message-ID: <57102F7D.2090303@oracle.com> On 4/14/16 4:38 PM, Berg, Michael C wrote: > Vladimir, > > Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. > > I tried something like that early on with CountedLoopEnd. In the mach version with restore mark you should specify the correct size(X) or not specify it at all, so that it is calculated dynamically (done automatically). I don't see any side effects for restoremask in your code. What are you talking about?
I am suggesting something like next: instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ predicate(n->has_vect_mask_set()); match(CountedLoopEnd cop cr); effect(USE labl); ins_cost(400); format %{ "j$cop $labl\t# loop end\n\t" "restoremask \t# vector mask restore for loops" %} ins_encode %{ Label* L = $labl$$label; __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump __ restoremask(); %} ins_pipe(pipe_jcc); %} Vladimir > The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 4:27 PM > To: Christian Thalinger ; Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 3:35 PM, Christian Thalinger wrote: >> >>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>> >>> Christian, >>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >> >> That?s unfortunate but I understand. I?m fine with it then. > > You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. 
> > Vladimir > >> >>> Regards, >>> Michael >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>> > >>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>> >>> *Subject:*Re: CR for RFR 8153998 >>> >>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>> See below for context. >>> Regards, >>> Michael >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>> *To:*Berg, Michael C > >>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>> *Subject:*Re: CR for RFR 8153998 >>> >>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>> Hi Folks, >>> >>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573for the first half of the implementation. >>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>> performance and has been modeled over a large number of loop lengths and forms of loops. >>> This code was tested as follows(see jbs entry below): >>> >>> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >>> >>> webrev: >>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>> >>> >>> +//------------------------------MachMskNode------------------------- >>> ---------- >>> >>> +// Machine function Msk Node >>> >>> +class MachMskNode : public MachIdealNode { >>> >>> Does ?Msk? mean mask? Then we should call it MachMaskNode. >>> Ok, that?s easy enough. 
>>> Also, I don?t quite understand why we have: >>> >>> +instruct set_mask(rRegI dst, rRegI src) %{ >>> >>> + predicate(VM_Version::supports_avx512vl()); >>> >>> + match(Set dst (MaskCreateI src)); >>> >>> + effect(TEMP dst); >>> >>> + format %{ "createmsk $dst, $src" %} >>> >>> + ins_encode %{ >>> >>> + __ createmsk($dst$$Register, $src$$Register); >>> >>> + %} >>> >>> but: >>> >>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const >>> { >>> >>> + MacroAssembler _masm(&cbuf); >>> >>> + __ restoremsk(); >>> >>> + } >>> >>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>> >>> Hmm. So, there is no way we can have a RestoreMaskINode? >>> >>> Thanks, >>> Michael >> From michael.c.berg at intel.com Fri Apr 15 00:12:30 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 15 Apr 2016 00:12:30 +0000 Subject: CR for RFR 8153998 In-Reply-To: <57102F7D.2090303@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> Message-ID: The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. Ok, I will try the pattern match method. 
Thanks -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:02 PM To: Berg, Michael C ; Christian Thalinger Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 4:38 PM, Berg, Michael C wrote: > Vladimir, > > Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. > > I tried something like that early on with CountedLoopEnd. In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). I don't see any side effects for restoremask in your code. What are you talking about? I am suggesting something like next: instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ predicate(n->has_vect_mask_set()); match(CountedLoopEnd cop cr); effect(USE labl); ins_cost(400); format %{ "j$cop $labl\t# loop end\n\t" "restoremask \t# vector mask restore for loops" %} ins_encode %{ Label* L = $labl$$label; __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump __ restoremask(); %} ins_pipe(pipe_jcc); %} Vladimir > The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. 
> > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 4:27 PM > To: Christian Thalinger ; Berg, > Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 3:35 PM, Christian Thalinger wrote: >> >>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>> >>> Christian, >>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >> >> That?s unfortunate but I understand. I?m fine with it then. > > You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. > > Vladimir > >> >>> Regards, >>> Michael >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>> > >>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>> >>> *Subject:*Re: CR for RFR 8153998 >>> >>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>> See below for context. >>> Regards, >>> Michael >>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>> *To:*Berg, Michael C > >>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>> *Subject:*Re: CR for RFR 8153998 >>> >>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>> Hi Folks, >>> >>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See:https://bugs.openjdk.java.net/browse/JDK-8151573for the first half of the implementation. 
>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>> performance and has been modeled over a large number of loop lengths and forms of loops. >>> This code was tested as follows(see jbs entry below): >>> >>> Bug-id:https://bugs.openjdk.java.net/browse/JDK-8153998 >>> >>> webrev: >>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>> >>> >>> +//------------------------------MachMskNode------------------------ >>> +- >>> ---------- >>> >>> +// Machine function Msk Node >>> >>> +class MachMskNode : public MachIdealNode { >>> >>> Does ?Msk? mean mask? Then we should call it MachMaskNode. >>> Ok, that?s easy enough. >>> Also, I don?t quite understand why we have: >>> >>> +instruct set_mask(rRegI dst, rRegI src) %{ >>> >>> + predicate(VM_Version::supports_avx512vl()); >>> >>> + match(Set dst (MaskCreateI src)); >>> >>> + effect(TEMP dst); >>> >>> + format %{ "createmsk $dst, $src" %} >>> >>> + ins_encode %{ >>> >>> + __ createmsk($dst$$Register, $src$$Register); >>> >>> + %} >>> >>> but: >>> >>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) >>> const { >>> >>> + MacroAssembler _masm(&cbuf); >>> >>> + __ restoremsk(); >>> >>> + } >>> >>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. 
>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>> >>> Hmm. So, there is no way we can have a RestoreMaskINode? >>> >>> Thanks, >>> Michael >> From vladimir.kozlov at oracle.com Fri Apr 15 00:46:52 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 17:46:52 -0700 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <570F993C.3040509@oracle.com> References: <570F993C.3040509@oracle.com> Message-ID: <571039FC.1000603@oracle.com> I think the check should use !isa_oopptr() since one of the nodes could be a ConP NULL ptr, which is not a klassptr. Thanks, Vladimir On 4/14/16 6:21 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8153357. > > https://bugs.openjdk.java.net/browse/JDK-8153357 > > Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its unique input. > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 > > Then (if the phi has indeed a unique input), the C2 compiler attempts to replace the phi with a cast node. The new cast node feeds from the unique input. > > To be able to remove the phi node, the C2 compiler must determine the type of cast to add in place of the phi node (CastII, CastPP, or CheckCastPP). > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 > > The failure in the bug report appears because the C2 compiler adds a cast node of unexpected type to the graph (a CheckCastPP instead of a CastPP when casting between two klass pointers).
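A toy model can show why testing !isa_oopptr() behaves differently from testing for a klass pointer. The Kind categories and select_cast() below are hypothetical and deliberately simplified. They are not the logic in cfgnode.cpp or in the webrev, only one plausible reading of the CastII / CastPP / CheckCastPP choice described above. A NULL constant is modeled as a pointer that is neither an oop pointer nor a klass pointer.

```cpp
#include <cassert>

// Hypothetical, simplified type categories; HotSpot's Type hierarchy is far richer.
enum class Kind { Int, OopPtr, KlassPtr, NullPtr };
enum class Cast { CastII, CastPP, CheckCastPP };

inline bool is_oopptr(Kind k) { return k == Kind::OopPtr; }

// Choose the cast that replaces a phi with a unique input (toy rule).
// Asking "is either side not an oop pointer?" rather than "is it a klass
// pointer?" also routes a NULL constant, which is neither, to CastPP.
inline Cast select_cast(Kind phi_type, Kind input_type) {
  if (phi_type == Kind::Int) return Cast::CastII;
  if (!is_oopptr(phi_type) || !is_oopptr(input_type)) return Cast::CastPP;
  return Cast::CheckCastPP;
}
```

Under this toy rule, two klass pointers get a CastPP, which is the outcome the bug report expects, and a NULL-constant input does too. A check written as "both are klass pointers" would miss the NULL case.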
> > Please find more details about the cause of the failure in the bug description: > https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 > > > Solution: Refine C2's logic to determine the type of cast node added. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ > > Testing: > - JPRT; > - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); > - 500 non-failing runs with the reproducer (the problem reproduces with < 100 runs). > > Thank you and best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Fri Apr 15 00:51:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 17:51:44 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> Message-ID: <57103B20.1040207@oracle.com> On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless; it restores k1 to the known global configuration value, which is used automatically in all of code gen. How is it sizeless when it generates a kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method. > > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization.
>> >> I tried something like that early on with CountedLoopEnd. > > In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). > I don't see any side effects for restoremask in your code. What are you talking about? > > I am suggesting something like next: > > instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ > predicate(n->has_vect_mask_set()); > match(CountedLoopEnd cop cr); > effect(USE labl); > > ins_cost(400); > format %{ "j$cop $labl\t# loop end\n\t" > "restoremask \t# vector mask restore for loops" > %} > ins_encode %{ > Label* L = $labl$$label; > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump > __ restoremask(); > %} > ins_pipe(pipe_jcc); > %} > > Vladimir > >> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. >> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 4:27 PM >> To: Christian Thalinger ; Berg, >> Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>> >>>> Christian, >>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>> >>> That?s unfortunate but I understand. I?m fine with it then. >> >> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). 
has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there. >> >> Vladimir >> >>> >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>> > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>> See below for context. >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>> *To:*Berg, Michael C > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>> Hi Folks, >>>> >>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>> performance and has been modeled over a large number of loop lengths and forms of loops.
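A minimal scalar sketch of the idea described above, assuming 16 int lanes per 512-bit vector: the main loop handles whole vectors, and the post loop runs exactly once with a lane mask built from the remaining trip count, roughly the role the createmsk value in k1 plays. The names, the lane count, and the per-lane `if` that emulates predication are assumptions for illustration; this is not the HotSpot implementation and emits no AVX-512 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 16 int lanes, as in one 512-bit vector.
constexpr std::size_t kLanes = 16;

// Mask with the low `remaining` bits set, one bit per active lane.
// Stands in for the value a createmsk-style sequence would load into k1.
std::uint32_t make_tail_mask(std::size_t remaining) {
  assert(remaining < kLanes);  // the main loop leaves fewer than kLanes elements
  return (std::uint32_t{1} << remaining) - 1u;
}

// dst[i] = a[i] + b[i]: vectorized main loop plus one masked post-loop pass.
void add_arrays(const int* a, const int* b, int* dst, std::size_t n) {
  std::size_t i = 0;
  // Main loop: whole vectors only (each inner loop stands for one vector op).
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      dst[i + lane] = a[i + lane] + b[i + lane];
    }
  }
  // Post loop: a single "vector" iteration; inactive lanes are skipped,
  // emulating masked (predicated) loads and stores.
  const std::uint32_t mask = make_tail_mask(n - i);
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (mask & (std::uint32_t{1} << lane)) {
      dst[i + lane] = a[i + lane] + b[i + lane];
    }
  }
}
```

With n = 19, the main loop covers elements 0 to 15 and the single masked pass covers the remaining 3 (mask 0b111), where a scalar post loop would have needed three iterations.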
>>>> This code was tested as follows (see jbs entry below): >>>> >>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>> >>>> >>>> +//------------------------------MachMskNode----------------------------------- >>>> >>>> +// Machine function Msk Node >>>> >>>> +class MachMskNode : public MachIdealNode { >>>> >>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>> Ok, that's easy enough. >>>> Also, I don't quite understand why we have: >>>> >>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>> >>>> + predicate(VM_Version::supports_avx512vl()); >>>> >>>> + match(Set dst (MaskCreateI src)); >>>> >>>> + effect(TEMP dst); >>>> >>>> + format %{ "createmsk $dst, $src" %} >>>> >>>> + ins_encode %{ >>>> >>>> + __ createmsk($dst$$Register, $src$$Register); >>>> >>>> + %} >>>> >>>> but: >>>> >>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>> >>>> + MacroAssembler _masm(&cbuf); >>>> >>>> + __ restoremsk(); >>>> >>>> + } >>>> >>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode?
>>>> >>>> Thanks, >>>> Michael >>> From michael.c.berg at intel.com Fri Apr 15 00:54:01 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 15 Apr 2016 00:54:01 +0000 Subject: CR for RFR 8153998 In-Reply-To: <57103B20.1040207@oracle.com> References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit ad files for CountedLoopEnd. It will be clean when next you see the code. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:52 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless, it restores to the known global configuration value for k1 which is used automatically in all of code gen. How is it sizeless when it generates kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method. > > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >> >> I tried something like that early on with CountedLoopEnd.
> > In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). > I don't see any side effects for restoremask in your code. What are you talking about? > > I am suggesting something like next: > > instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ > predicate(n->has_vect_mask_set()); > match(CountedLoopEnd cop cr); > effect(USE labl); > > ins_cost(400); > format %{ "j$cop $labl\t# loop end\n\t" > "restoremask \t# vector mask restore for loops" > %} > ins_encode %{ > Label* L = $labl$$label; > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump > __ restoremask(); > %} > ins_pipe(pipe_jcc); > %} > > Vladimir > >> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would you like to do then, process via flag or how I do it now? We would basically be doing it in the same place. >> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 4:27 PM >> To: Christian Thalinger ; Berg, >> Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>> >>>> Christian, >>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>> >>> That's unfortunate but I understand. I'm fine with it then. >> >> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()).
has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there. >> >> Vladimir >> >>> >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>> > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>> See below for context. >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>> *To:*Berg, Michael C > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>> Hi Folks, >>>> >>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>> performance and has been modeled over a large number of loop lengths and forms of loops.
>>>> This code was tested as follows (see jbs entry below): >>>> >>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>> >>>> >>>> +//------------------------------MachMskNode----------------------------------- >>>> >>>> +// Machine function Msk Node >>>> >>>> +class MachMskNode : public MachIdealNode { >>>> >>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>> Ok, that's easy enough. >>>> Also, I don't quite understand why we have: >>>> >>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>> >>>> + predicate(VM_Version::supports_avx512vl()); >>>> >>>> + match(Set dst (MaskCreateI src)); >>>> >>>> + effect(TEMP dst); >>>> >>>> + format %{ "createmsk $dst, $src" %} >>>> >>>> + ins_encode %{ >>>> >>>> + __ createmsk($dst$$Register, $src$$Register); >>>> >>>> + %} >>>> >>>> but: >>>> >>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>> >>>> + MacroAssembler _masm(&cbuf); >>>> >>>> + __ restoremsk(); >>>> >>>> + } >>>> >>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode?
>>>> >>>> Thanks, >>>> Michael >>> From vladimir.kozlov at oracle.com Fri Apr 15 01:12:18 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 18:12:18 -0700 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses In-Reply-To: <570FCB02.6000507@oracle.com> References: <570FCB02.6000507@oracle.com> Message-ID: <57103FF2.5060907@oracle.com> Next assert should be at the beginning of method: + assert(type != T_OBJECT || !unaligned, "unaligned access not supported with object type"); Fix Copyright year in the test. There is no PIT link in the bug report. Thanks, Vladimir On 4/14/16 9:53 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8134918 > > Type speculation can produce mismatched unsafe accesses. > > It injects a guard based on profile data and then propagates type info down to the users. If there's an unsafe access, it can become mismatched w.r.t. profile data being used. > > It happens even for valid usages. If an unsafe access always matches memory location at runtime, the code produced by type speculation in that case is effectively dead. > > What causes problems are unsafe OOP accesses (U.putObject()/getObject() on non-OOP locations). > > The fix is to avoid intrinsification of problematic accesses. Type speculation injects precise type information, which is available during intrinsification. > > We could try to support mismatched unsafe object accesses instead, but I don't see any value in that. > > Testing: JPRT, pit-hs-comp (in progress). > > Thanks!
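A hypothetical C++ model of the decision described above, with the precondition assert placed at the start of the function as suggested in the review. `ToyBasicType` and `can_intrinsify_unsafe_access` are invented names; the actual C2 logic works on its own type system and is more involved.

```cpp
#include <cassert>

// Hypothetical model, not C2 code.
enum class ToyBasicType { Int, Long, Object };

// Returns true if the unsafe access may be intrinsified.
bool can_intrinsify_unsafe_access(ToyBasicType access_type,
                                  ToyBasicType location_type,
                                  bool unaligned) {
  // Precondition checked first, before any early return can skip it.
  assert(access_type != ToyBasicType::Object || !unaligned);
  // An object (OOP) access to a location known not to hold an object is
  // mismatched: refuse to intrinsify and keep the regular call instead.
  if (access_type == ToyBasicType::Object &&
      location_type != ToyBasicType::Object) {
    return false;
  }
  return true;
}
```

Putting the assert first means it is evaluated on every path, including the ones that bail out early.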
> > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Apr 15 01:44:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Apr 2016 18:44:44 -0700 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> Message-ID: <5710478C.8050200@oracle.com> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results. Thanks, Vladimir On 4/12/16 2:45 AM, Doerr, Martin wrote: > Hi, > > I think we have come to a common understanding and there was no complaint about my latest webrev: > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ > > Can I consider it reviewed? > Can somebody sponsor, please? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Thursday, 7 April 2016 12:52 > To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe > > Hi Andrew, Jamsheed and all, > > thank you very much for your input. > > As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). > Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also correct). > > My change still contains a releasing store for newly created ExceptionCache instances.
> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce additional false negatives on weak memory model platforms. > I think having the release doesn't hurt too much and makes the design a little cleaner. > > I also added comments based on your input. > > The new webrev is here: > http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ > > Please review. I will also need a sponsor from Oracle, please. > > Thanks again and best regards, > Martin > > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, 7 April 2016 12:14 > To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe > > On 07/04/16 10:08, Doerr, Martin wrote: > >> atomic update for the _count would only be required if there were >> multiple threads which attempt to increment it >> concurrently. However, updates are under lock, so we only have >> concurrent readers which is ok. >> >> I still think "volatile" does what we need here. Especially the xlC >> compiler on AIX tends to reload variables from memory. Exactly this >> can be prevented by making the field volatile. > > I think your latest patch is OK. Whether volatile is really good > enough, I don't know. The new(ish) C++ memory model treats this as a > race, and therefore undefined behaviour. Old C++ didn't have a memory > model, so the best we can do with racy code is guess about what our > compilers might do. > > I certainly much prefer a release_store to the storestore fence used > in the fix for 8143897. > > Andrew.
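To illustrate the publication pattern discussed in this thread (writers fill in an entry under a lock, then publish it with a releasing store of the count, while readers scan without the lock), here is a minimal C++11 sketch that uses `std::atomic` as a stand-in for HotSpot's own ordering primitives. The class, its fixed capacity, and the "0 means not cached" convention are hypothetical; it models the idea only and is not the nmethod ExceptionCache code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Toy analogue (not HotSpot code): entries are appended under a lock,
// while readers scan the published prefix without taking the lock.
class ToyExceptionCache {
 public:
  static constexpr int kCapacity = 4;

  // Writer side: serialized by the lock, so writers never race each other.
  bool add(long pc, long handler) {
    std::lock_guard<std::mutex> guard(lock_);
    int n = count_.load(std::memory_order_relaxed);
    if (n >= kCapacity) return false;
    pcs_[n] = pc;  // plain stores of the payload ...
    handlers_[n] = handler;
    // ... published by a releasing store of the count: a reader that sees
    // the new count is guaranteed to also see the payload written above.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: lock-free; the acquire load pairs with the release above.
  long find(long pc) const {
    int n = count_.load(std::memory_order_acquire);
    assert(n <= kCapacity);
    for (int i = 0; i < n; ++i) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;  // 0 means "not cached" in this toy
  }

 private:
  std::mutex lock_;
  std::atomic<int> count_{0};
  long pcs_[kCapacity] = {};
  long handlers_[kCapacity] = {};
};
```

Readers only touch indices below the count they acquired, and the writer only writes at or beyond it, so there is no data race on the payload arrays; making `count_` merely `volatile` would give no such guarantee under the C++11 memory model, which is the concern raised above.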
> From zoltan.majo at oracle.com Fri Apr 15 06:46:20 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 15 Apr 2016 08:46:20 +0200 Subject: [9] RFR (XS): 8151708: C1 FastTLABRefill can allocate TLABs past the end of the heap In-Reply-To: <570FE7FE.2000001@oracle.com> References: <570F8350.5080209@oracle.com> <570F89D7.7080809@oracle.com> <570F8C8F.4080003@oracle.com> <570FE7FE.2000001@oracle.com> Message-ID: <57108E3C.4030209@oracle.com> Thank you, Vladimir and Tobias, for the review! Best regards, Zoltan On 04/14/2016 08:57 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 4/14/16 5:26 AM, Zoltán Majó wrote: >> Hi Tobias, >> >> >> thank you for the feedback! >> >> On 04/14/2016 02:15 PM, Tobias Hartmann wrote: >>> [...] >>> I would simply replace the 'br' by 'brx' which tests either xcc or >>> icc depending on the architecture. >> >> Yes, that simplifies the code a bit. Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8151708/webrev.01/ >> >> Tests are running. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> Best regards, >>> Tobias >>> >>>> Solution: As the VM is handling addresses at the above-mentioned >>>> locations, the appropriate condition codes are supposed to be >>>> checked. Use 'BPcc' instead of 'Bicc' at these locations. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8151708/webrev.00/ >>>> >>>> Testing: >>>> - JPRT >>>> - reproducer on solaris_sparc. >>>> >>>> Thank you!
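The difference between branching on the 32-bit condition codes (icc, as 'Bicc' does) and on the 64-bit ones (xcc, available with 'BPcc') can be illustrated with a small C++ model: a bound check on 64-bit addresses that only looks at the low 32 bits can accept an allocation that ends past the heap end. The function names and example addresses are hypothetical; this models the comparison only, not the SPARC code C1 generates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not generated SPARC code.

// Unsigned "new_top <= heap_end" as seen through 32-bit condition codes
// (icc): only the low 32 bits of each address take part.
bool fits_using_icc(std::uint64_t new_top, std::uint64_t heap_end) {
  return static_cast<std::uint32_t>(new_top) <=
         static_cast<std::uint32_t>(heap_end);
}

// The same check through 64-bit condition codes (xcc): full addresses.
bool fits_using_xcc(std::uint64_t new_top, std::uint64_t heap_end) {
  return new_top <= heap_end;
}
```

With a heap ending at 0xFFFFFF00 and a proposed top of 0x1_0000_0010, the low-32-bit comparison reports that the allocation fits although it ends past the heap, the kind of wrong answer that checking the 64-bit condition codes avoids.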
>>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >> From jamsheed.c.m at oracle.com Fri Apr 15 07:44:34 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 15 Apr 2016 13:14:34 +0530 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <5710478C.8050200@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> <5710478C.8050200@oracle.com> Message-ID: <57109BE2.1090602@oracle.com> Hi Vladimir, PIT testing is in progress, link is available in the bug report. Best Regards, Jamsheed On 4/15/2016 7:14 AM, Vladimir Kozlov wrote: > Looks fine to me. Jamsheed, please, run our PIT testing with these > changes and analyze results. > > Thanks, > Vladimir > > On 4/12/16 2:45 AM, Doerr, Martin wrote: >> Hi, >> >> I think we have come to a common understanding and there was no >> complaint about my latest webrev: >> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >> >> Can I consider it reviewed? >> Can somebody sponsor, please? >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-compiler-dev >> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >> Doerr, Martin >> Sent: Thursday, 7 April 2016 12:52 >> To: Andrew Haley ; Jamsheed C m >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR(S): 8153267: nmethod's exception cache not >> multi-thread safe >> >> Hi Andrew, Jamsheed and all, >> >> thank you very much for your input. >> >> As Andrew, Jamsheed and I think, it's better to have a releasing >> store in increment_count().
>> Therefore, I have replaced the storestore barrier introduced with >> JDK-8143897 (even though this barrier was also correct). >> >> My change still contains a releasing store for newly created >> ExceptionCache instances. >> As Jamsheed has pointed out, this should not be strictly required as >> we have the other barrier. It may only produce additional false >> negatives on weak memory model platforms. >> I think having the release doesn't hurt too much and makes the design >> a little cleaner. >> >> I also added comments based on your input. >> >> The new webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >> >> Please review. I will also need a sponsor from Oracle, please. >> >> Thanks again and best regards, >> Martin >> >> >> -----Original Message----- >> From: Andrew Haley [mailto:aph at redhat.com] >> Sent: Thursday, 7 April 2016 12:14 >> To: Doerr, Martin ; Jamsheed C m >> ; hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): 8153267: nmethod's exception cache not >> multi-thread safe >> >> On 07/04/16 10:08, Doerr, Martin wrote: >> >>> atomic update for the _count would only be required if there were >>> multiple threads which attempt to increment it >>> concurrently. However, updates are under lock, so we only have >>> concurrent readers which is ok. >>> >>> I still think "volatile" does what we need here. Especially the xlC >>> compiler on AIX tends to reload variables from memory. Exactly this >>> can be prevented by making the field volatile. >> >> I think your latest patch is OK. Whether volatile is really good >> enough, I don't know. The new(ish) C++ memory model treats this as a >> race, and therefore undefined behaviour. Old C++ didn't have a memory >> model, so the best we can do with racy code is guess about what our >> compilers might do. >> >> I certainly much prefer a release_store to the storestore fence used >> in the fix for 8143897. >> >> Andrew.
>> From michael.c.berg at intel.com Fri Apr 15 09:04:25 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 15 Apr 2016 09:04:25 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Vladimir, the code has been updated and is available at: webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ Thanks, Michael -----Original Message----- From: Berg, Michael C Sent: Thursday, April 14, 2016 5:54 PM To: Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8153998 Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit ad files for CountedLoopEnd. It will be clean when next you see the code. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:52 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless, it restores to the known global configuration value for k1 which is used automatically in all of code gen. How is it sizeless when it generates kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method.
> > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >> >> I tried something like that early on with CountedLoopEnd. > > In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). > I don't see any side effects for restoremask in your code. What are you talking about? > > I am suggesting something like next: > > instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ > predicate(n->has_vect_mask_set()); > match(CountedLoopEnd cop cr); > effect(USE labl); > > ins_cost(400); > format %{ "j$cop $labl\t# loop end\n\t" > "restoremask \t# vector mask restore for loops" > %} > ins_encode %{ > Label* L = $labl$$label; > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump > __ restoremask(); > %} > ins_pipe(pipe_jcc); > %} > > Vladimir > >> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would you like to do then, process via flag or how I do it now? We would basically be doing it in the same place.
>> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 4:27 PM >> To: Christian Thalinger ; Berg, >> Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>> >>>> Christian, >>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>> >>> That's unfortunate but I understand. I'm fine with it then. >> >> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there. >> >> Vladimir >> >>> >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>> > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>> See below for context. >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>> *To:*Berg, Michael C > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>> Hi Folks, >>>> >>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>> performance and has been modeled over a large number of loop lengths and forms of loops. >>>> This code was tested as follows (see jbs entry below): >>>> >>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>> >>>> >>>> +//------------------------------MachMskNode----------------------------------- >>>> >>>> +// Machine function Msk Node >>>> >>>> +class MachMskNode : public MachIdealNode { >>>> >>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>> Ok, that's easy enough. >>>> Also, I don't quite understand why we have: >>>> >>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>> >>>> + predicate(VM_Version::supports_avx512vl()); >>>> >>>> + match(Set dst (MaskCreateI src)); >>>> >>>> + effect(TEMP dst); >>>> >>>> + format %{ "createmsk $dst, $src" %} >>>> >>>> + ins_encode %{ >>>> >>>> + __ createmsk($dst$$Register, $src$$Register); >>>> >>>> + %} >>>> >>>> but: >>>> >>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>> >>>> + MacroAssembler _masm(&cbuf); >>>> >>>> + __ restoremsk(); >>>> >>>> + } >>>> >>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>> The subsequent restore places the default value back into k1.
The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>> >>>> Thanks, >>>> Michael >>> From nils.eliasson at oracle.com Fri Apr 15 09:39:41 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 11:39:41 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <57102AB3.30709@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> Message-ID: <5710B6DD.9090009@oracle.com> Thanks Vladimir! On 2016-04-15 01:41, Vladimir Kozlov wrote: > I agree with this simple change as the fix. > Note, -Xcomp does not switch off Interpreter (we can run without > Interpreter). We use !UseInterpreter as indication if Xcomp was used. > I don't see a PIT link in the bug report. There was none; Tobias found this regression while testing something else. Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ Regards, Nils > > Thanks, > Vladimir > > On 4/14/16 6:17 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this fix. >> >> Summary: >> In JDK-8150646 I added an assert in compile_method that the compiler >> must not be NULL. Before there was a return there that just ignored >> the compile. >> >> Running the VM with the flag combination -Xcomp and >> -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter >> is set to false (but the interpreter is still available) and then >> some essential methods are forced to be compiled, but the initial >> complevel becomes 0 and hits the assert in compileBroker.
>> >> Solution: >> We could discuss if it should be allowed to submit compiles on level >> 0, a change that would become a bit larger. This time I chose to >> extend the _initialized check in compile_method. I didn't add any >> logging or warning because this is really a corner case. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >> (Ignore the extra tags in the webrev) >> >> Best regards, >> Nils Eliasson From rwestrel at redhat.com Fri Apr 15 11:14:41 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 13:14:41 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com> References: <05C0DC10-1543-4F70-AC88-CA9AD4004140@oracle.com> Message-ID: <5710CD21.4060902@redhat.com> Hi Christian, Thanks for looking at this. > I wonder if this has any performance implications (good or bad). > This alignment is not aarch64 specific so we were doing it all the > time. Unless I'm missing something, nops in the body of a loop can't really help performance. This looks like a bug to me. Roland. From tobias.hartmann at oracle.com Fri Apr 15 11:15:42 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Apr 2016 13:15:42 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710B6DD.9090009@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> Message-ID: <5710CD5E.5070103@oracle.com> Hi Nils, On 15.04.2016 11:39, Nils Eliasson wrote: > Thanks Vladimir! > On 2016-04-15 01:41, Vladimir Kozlov wrote: >> I agree with this simple change as the fix. >> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. >> I don't see a PIT link in the bug report.
> > There was none, Tobias found this regression testing something else. > > Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java > > Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. Otherwise looks good to me. Best regards, Tobias > > Regards, > Nils > >> >> Thanks, >> Vladimir >> >> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this fix. >>> >>> Summary: >>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile. >>> >>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter it is still available) and then some >>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>> >>> Solution: >>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I choose to extend the _initalized check in compile_method. I didn't add any >>> logging or warning because this is really a corner case. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>> (Ignore the extra tags in the webrev) >>> >>> Best regards, >>> Nils Eliasson > From nils.eliasson at oracle.com Fri Apr 15 11:22:59 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 13:22:59 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710CD5E.5070103@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> Message-ID: <5710CF13.5090404@oracle.com> Hi Tobias, Thanks for your feedback! New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 Regards, Nils On 2016-04-15 13:15, Tobias Hartmann wrote: > Hi Nils, > > On 15.04.2016 11:39, Nils Eliasson wrote: >> Thanks Vladimir! >> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>> I agree with this simple change as the fix. >>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. >>> I don't see a PIT link in the bug report. >> There was none, Tobias found this regression testing something else. >> >> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ > Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. > > Otherwise looks good to me. > > Best regards, > Tobias > >> Regards, >> Nils >> >>> Thanks, >>> Vladimir >>> >>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this fix. >>>> >>>> Summary: >>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. 
Before there was a return there that just ignored the compile. >>>> >>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter it is still available) and then some >>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>>> >>>> Solution: >>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I choose to extend the _initalized check in compile_method. I didn't add any >>>> logging or warning because this is really a corner case. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>> (Ignore the extra tags in the webrev) >>>> >>>> Best regards, >>>> Nils Eliasson From tobias.hartmann at oracle.com Fri Apr 15 11:24:10 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Apr 2016 13:24:10 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710CF13.5090404@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> <5710CF13.5090404@oracle.com> Message-ID: <5710CF5A.3090409@oracle.com> Hi Nils, On 15.04.2016 13:22, Nils Eliasson wrote: > Hi Tobias, > > Thanks for your feedback! > > New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 Looks good! Best regards, Tobias > > Regards, > Nils > > On 2016-04-15 13:15, Tobias Hartmann wrote: >> Hi Nils, >> >> On 15.04.2016 11:39, Nils Eliasson wrote: >>> Thanks Vladimir! >>> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>>> I agree with this simple change as the fix. >>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). We use !UseInterpreter as indication if Xcomp was used. 
>>>> I don't see a PIT link in the bug report. >>> There was none, Tobias found this regression testing something else. >>> >>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ >> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. >> >> Otherwise looks good to me. >> >> Best regards, >> Tobias >> >>> Regards, >>> Nils >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this fix. >>>>> >>>>> Summary: >>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return there that just ignored the compile. >>>>> >>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter is set to false (but the interpreter it is still available) and then some >>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>>>> >>>>> Solution: >>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. This time I choose to extend the _initalized check in compile_method. I didn't add any >>>>> logging or warning because this is really a corner case. 
>>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>>> (Ignore the extra tags in the webrev) >>>>> >>>>> Best regards, >>>>> Nils Eliasson > From vladimir.x.ivanov at oracle.com Fri Apr 15 11:42:40 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Apr 2016 14:42:40 +0300 Subject: RFR(m): 8145468 deprecations for java.lang In-Reply-To: <57107B6B.5070100@javaspecialists.eu> References: <570EF756.2090602@oracle.com> <570FAB0C.7070505@javaspecialists.eu> <57101D11.6060109@oracle.com> <5710234A.2060001@oracle.com> <57107B6B.5070100@javaspecialists.eu> Message-ID: <5710D3B0.6010903@oracle.com> >>> I had a sidebar with Shipilev on this, and this is indeed still >>> potentially an issue. Alexey's example was: >>> >>> set.contains(new Integer(i)) // 1 >>> >>> vs >>> >>> set.contains(Integer.valueOf(i)) // 2 >>> >>> EA is able to optimize away the allocation in line 1, but the additional >>> complexity of dealing with the Integer cache in valueOf() defeats EA for >>> line 2. (Autoboxing pretty much desugars to line 2.) >> >> I'd say it's a motivating example to improve EA implementation in C2, >> but not to avoid deprecation of public constructors in primitive type >> boxes. It shouldn't matter for C2 whether Integer.valueOf() or >> Integer:: is used. If it does, it's a bug. >> > To do that would probably require a change to the Java Language > Specification to allow us to get rid of the IntegerCache. Unfortunately > it is defined to have a range of -128 to 127 at least in the cache, so > this probably makes it really hard or impossible to optimize this with > EA. I always found it amusing that the killer application for EA, > getting rid of autoboxed Integer objects, didn't really work :-) Still, I'd separate optimization and specification aspects. This case is neither "really hard" nor impossible to optimize. 
What is hard is to ensure the optimization covers all interesting cases :-) AFAIK C2 should already do a pretty decent job of eliminating box/unbox pairs (e.g., Integer.valueOf().intValue()) and the cache is not a problem here. What can cause problems is when box identity intervenes. For example, even for non-escaping objects the runtime has to be able to materialize them at safepoints. In order to preserve identity invariants, the runtime has to take into account how the box is created (constructor vs factory method). Probably, that's the missing case right now. But there's nothing insurmountable to fix it - the runtime should just consult the cache in the rare cases when rematerialization is necessary. Best regards, Vladimir Ivanov From rwestrel at redhat.com Fri Apr 15 12:24:30 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 14:24:30 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <571023F8.5090903@oracle.com> References: <571023F8.5090903@oracle.com> Message-ID: <5710DD7E.1060105@redhat.com> Hi Vladimir, Thanks for looking at this. > I agree with optimization but I am not sure about changes. Is this an optimization? It looks more like a bug to me. > Can we check only one previous block to be more conservative?: > > Block* b = prev(targ_block) > bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() > && !b->head()->is_Loop() That would be good enough as far as I can tell. Here is a new webrev: http://cr.openjdk.java.net/~roland/8154135/webrev.01/ > Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? May be > for RISC cpus (with fixed instruction size) we should change them. Thanks for the pointer. This said, I don't see what could prevent the problem I see from happening on x86 so to me it looks like a bug, rather than a tuning problem. Roland. 
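The box-identity constraint in Vladimir Ivanov's message above can be shown with a small example. This is a sketch, not code from the thread, and the class name BoxIdentity is made up. The Integer.valueOf() Javadoc requires values in -128..127 to be cached, so two calls with such a value return the same object. The constructor always allocates a new object. A runtime that rematerializes an eliminated box at a safepoint therefore has to know which of the two created it.

```java
public class BoxIdentity {
    // Integer.valueOf() is specified to cache values in [-128, 127], so two
    // calls with such a value return the same (cached) object.
    static boolean valueOfSharesInstance(int i) {
        return Integer.valueOf(i) == Integer.valueOf(i);
    }

    // The constructor (deprecated since JDK 9) always allocates a new object,
    // so two boxes are never the same reference, even for equal values.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructorSharesInstance(int i) {
        return new Integer(i) == new Integer(i);
    }

    public static void main(String[] args) {
        System.out.println(valueOfSharesInstance(100));     // prints true
        System.out.println(constructorSharesInstance(100)); // prints false
    }
}
```

Values outside -128..127 are left out on purpose. Whether valueOf() caches them is implementation-dependent, and on HotSpot -XX:AutoBoxCacheMax raises the upper bound.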
From nils.eliasson at oracle.com Fri Apr 15 13:06:58 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 15:06:58 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> Message-ID: <5710E772.5050801@oracle.com> Hi, On 2016-04-14 20:45, Christian Thalinger wrote: > >> On Apr 14, 2016, at 2:43 AM, Nils Eliasson > > wrote: >> >> I moved the reasons to CompileTask.hpp and put it together with the >> names list. Also changed the type from int to CompileReason as Igor >> suggested. >> >> It gets verbose in the method declarations in compileBroker > > Don't worry about this. > >> and sometimes I think CompileReason should be declared in >> CompileBroker because it is mostly used there. On the other hand, >> CompileTask is the keeper of the CompileReason so it makes sense too. > > Yes, that's the right place. > >> >> New webrev: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >> > > + bool can_become_stale() const { > + return !_is_blocking && (_compile_reason < Reason_Whitebox); > + } > I'm not a fan of implicit contracts just defined by comments. This > method doesn't seem to be performance critical so I would suggest to > use a switch-case. An attribute on the enum would be much better but > we all know this isn't Java. As you suggested: http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 Also made reasons CTW and Replay not stale-able. Thanks! Nils > >> >> Thanks! >> Nils >> >> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>> Very nice, I like it. >>> >>> One note. CompileReason (and its names) should be CompileTask class >>> where it is recorded. Then CompileTask::can_become_stale() can be in >>> header file so it is inlinined on all platforms.
>>> >>> Thanks, >>> Vladimir >>> >>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>> >>>> >>>> Summary >>>> Introduced an enum CompileReason with members matching all the old >>>> variants, and a table containing all the unchanged strings. I see the >>>> possibility of removing/changing/simplifying some CompileReasons but >>>> have choosen not to do so in this change. >>>> >>>> Only new logic is the CompileTask::can_become_stale() method. >>>> >>>> Testing: >>>> Running Testset hotspot on all platforms and hotspot_all on one >>>> platform >>>> >>>> Regards, >>>> Nils Eliawsson >>>> >>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>> >>>>>> I'll do a proper fix, it is the right thing to do and should be >>>>>> pretty >>>>>> quick. I'll change the comment to an enum that represent who >>>>>> submitted >>>>>> the compile, and add a table for the comments. This could be >>>>>> useful in >>>>>> other settings to. >>>>> >>>>> Sounds good. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Regards, >>>>>> Nils >>>>>> >>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>> What do you mean "stale"? >>>>>>> I would prefer to see the real fix as you suggested to avoid >>>>>>> removing >>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>> >>>>>>>> Summary: >>>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>>> the compile queue as stale. 
>>>>>>>> >>>>>>>> Solution: >>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>> stale while the test is running. (Also added some extra >>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>> >>>>>>>> This is an workaround but we should consider fixing something >>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>> task with info about the origin of the compile. The comment >>>>>>>> field has >>>>>>>> this information - but then it needs to be >>>>>>>> converted to an enum. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Nils Eliasson >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>> >> > From nils.eliasson at oracle.com Fri Apr 15 13:30:29 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 15:30:29 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <57102AB3.30709@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> Message-ID: <5710ECF5.80201@oracle.com> I forgot the link to the test job: This is both for this and JDK-8153013 BlockingCompilation test times out https://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1309-10848 Regards, //Nils On 2016-04-15 01:41, Vladimir Kozlov wrote: > I agree with this simple change as the fix. > Note, -Xcomp does not switch off Interpreter (we can run without > Interpreter). We use !UseInterpreter as indication if Xcomp was used. > I don't see a PIT link in the bug report. > > Thanks, > Vladimir > > On 4/14/16 6:17 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this fix. >> >> Summary: >> In JDK-8150646 I added an assert in compile_method that the compiler >> must not be NULL.
Before there was a return there that just ignored >> the compile. >> >> Running the VM with the flag combination -Xcomp and >> -XX:TieredStopAtLevel=0 creates a special situation: UseInterpreter >> is set to false (but the interpreter it is still available) and then >> some >> essential methods are forced to be compiled, but the initial >> complevel becomes 0 and hits the assert in compileBroker. >> >> Solution: >> We could discuss if it should be allowed to submit compiles on level >> 0, a change that would become a bit larger. This time I choose to >> extend the _initalized check in compile_method. I didn't add any >> logging or warning because this is really a corner case. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >> (Ignore the extra tags in the webrev) >> >> Best regards, >> Nils Eliasson From nils.eliasson at oracle.com Fri Apr 15 15:10:35 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 15 Apr 2016 17:10:35 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log Message-ID: <5711046B.9080808@oracle.com> Hi, Please review this fix of print opto_assembly. Summary: The compilelog can get corrupted and the VM may assert on "failed: bad tag in log". When printing assembly in output.cpp we first take the ttylock, print the head and then the method metadata. However the metadata printing makes a vm entry and may block for a safepoint and will then release the lock (break_tty_lock_for_safepoint). After that some of the other compiler thread that haven't safepointed will take the lock and the broken log will be a fact when the safepoint is over and the first thread starts logging again. Solution: Print the method metadata to a temporary buffer, then take the tty lock. Testing: Repro from bug stops failing. 
Running :hotspot_all (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ Regards, Nils Eliasson From zoltan.majo at oracle.com Fri Apr 15 15:11:37 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 15 Apr 2016 17:11:37 +0200 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <571039FC.1000603@oracle.com> References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com> Message-ID: <571104A9.7060208@oracle.com> Hi Vladimir, thank you for the feedback! On 04/15/2016 02:46 AM, Vladimir Kozlov wrote: > I think check should use !isa_oopptr() since one of nodes could be > ConP NULL ptr which is not klassptr. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/ RBT testing passes. I did ~70 runs with the reproducer, no problems have shown up so far. I'll do ~900 more runs, though. Thank you! Best regards, Zoltan > > Thanks, > Vladimir > > On 4/14/16 6:21 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8153357. >> >> https://bugs.openjdk.java.net/browse/JDK-8153357 >> >> Problem: When determining the unique input of a phi, the C2 compiler >> removes cast nodes connecting the phi to its unique input. >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 >> >> >> Then (if the phi has indeed a unique input), the C2 compiler attempts >> replace the phi with a cast node. The new cast node feeds from the >> unique input. >> >> To be able to remove the phi node, the C2 compiler must to determine >> the type of cast to add in place of the phi node (CastII, CastPP, or >> CheckCastPP).
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 >> >> >> The failure in the bug report appears because the C2 compiler adds a >> cast node of unexpected type to the graph (a CheckCastPP instead of a >> CastPP when casting between two klass pointers). >> >> Please find more details about the cause of the failure in the bug >> description: >> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 >> >> >> >> Solution: Refine C2's logic to determine the type of cast node added. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ >> >> Testing: >> - JPRT; >> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); >> - 500 non-failing runs with the reproducer (the problem reproduces >> with < 100 runs). >> >> Thank you and best regards, >> >> >> Zoltan >> From zoltan.majo at oracle.com Fri Apr 15 15:25:01 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 15 Apr 2016 17:25:01 +0200 Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if on-stack-replacement is enabled Message-ID: <571107CD.8070205@oracle.com> Hi, please review the patch for 8072428. https://bugs.openjdk.java.net/browse/JDK-8072428 Problem: On-stack-replacement requires loop counters; disabling loop counters with on-stack-replacement enabled triggers an assert. Solution: Set UseLoopCounter ergonomically if on-stack-replacement is enabled. Print warning. Webrev: http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/ Tested with locally-built VM (linux_x64). Thank you! Best regards, Zoltan From long.chen at linaro.org Fri Apr 15 12:45:44 2016 From: long.chen at linaro.org (Long Chen) Date: Fri, 15 Apr 2016 20:45:44 +0800 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' Message-ID: Hi Please review this patch making use of DC ZVA to do block zeroing.
http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch I'm sorry that I can't produce a test case matching the 'clear_array' pattern showing obvious improvement. However, generating 'DC ZVA' should be the right thing to do as it usually has better cache behaviors. Besides, gcc and linux's memset have been using 'DC ZVA'. The ArrayFillByte case benefits from 'DC ZVA' when the array length is large. Test, http://people.linaro.org/~long.chen/block_zeroing/ArrayFillByte.java Performance result, http://people.linaro.org/~long.chen/block_zeroing/BlockZeroing.html Tested with jtreg hotspot and langtools. Thanks! From igor.ignatyev at oracle.com Fri Apr 15 16:50:57 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 15 Apr 2016 09:50:57 -0700 Subject: RFR(XS): 8154174: improve JitTester performance In-Reply-To: <570F9B88.2040102@oracle.com> References: <570F9B88.2040102@oracle.com> Message-ID: Hi Anton, looks good to me, thanks for doing that.
Igor > On Apr 14, 2016, at 6:30 AM, Anton Ivanov wrote: > > Hi, > Please review small patch that improves JitTester performance > > In current implementation JitTester has exception based logic, which is not good by itself, but changing this is quite expensive and there is simple way to decrease exception overhead - turn off stack trace in ProductionFailedException constructor ( this exception is created very often and stack trace is never need, as it only used to control program flow ) > Also small improvement was done in code that does deep copy of SymbolTable element ( Map iteration was rewritten to get rid of multiple redundant Map.get() which cost 0(1) only in average case and could be worse potentially ) > > Testing: local > > webrev: http://cr.openjdk.java.net/~aaivanov/8154174/webrev > bug: https://bugs.openjdk.java.net/browse/JDK-8154174 > > -- > Best regards, > Anton Ivanov > From vladimir.kozlov at oracle.com Fri Apr 15 17:14:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:14:05 -0700 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5710CF13.5090404@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> <5710CF13.5090404@oracle.com> Message-ID: <5711215D.4060202@oracle.com> Looks good. Make sure the test is executed in JPRT. Thanks, Vladimir On 4/15/16 4:22 AM, Nils Eliasson wrote: > Hi Tobias, > > Thanks for your feedback! > > New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 > > Regards, > Nils > > On 2016-04-15 13:15, Tobias Hartmann wrote: >> Hi Nils, >> >> On 15.04.2016 11:39, Nils Eliasson wrote: >>> Thanks Vladimir! >>> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>>> I agree with this simple change as the fix. >>>> Note, -Xcomp does not switch off Interpreter (we can run without Interpreter). 
We use !UseInterpreter as indication >>>> if Xcomp was used. >>>> I don't see a PIT link in the bug report. >>> There was none, Tobias found this regression testing something else. >>> >>> Now I have added a regression test: hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >>> >>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ >> Please set the test copyright date to 2016. I would maybe also change the test summary to what you wrote in line 30 >> ("Sanity test flag combo..") because this has nothing to do without support for blocking compiles. >> >> Otherwise looks good to me. >> >> Best regards, >> Tobias >> >>> Regards, >>> Nils >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this fix. >>>>> >>>>> Summary: >>>>> In JDK-8150646 I added an assert in compile_method that the compiler must not be NULL. Before there was a return >>>>> there that just ignored the compile. >>>>> >>>>> Running the VM with the flag combination -Xcomp and -XX:TieredStopAtLevel=0 creates a special situation: >>>>> UseInterpreter is set to false (but the interpreter it is still available) and then some >>>>> essential methods are forced to be compiled, but the initial complevel becomes 0 and hits the assert in compileBroker. >>>>> >>>>> Solution: >>>>> We could discuss if it should be allowed to submit compiles on level 0, a change that would become a bit larger. >>>>> This time I choose to extend the _initalized check in compile_method. I didn't add any >>>>> logging or warning because this is really a corner case. 
>>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>>> (Ignore the extra tags in the webrev) >>>>> >>>>> Best regards, >>>>> Nils Eliasson > From vladimir.kozlov at oracle.com Fri Apr 15 17:27:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:27:40 -0700 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <571104A9.7060208@oracle.com> References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com> <571104A9.7060208@oracle.com> Message-ID: <5711248C.7000503@oracle.com> Looks good to me. thanks, Vladimir On 4/15/16 8:11 AM, Zoltán Majó wrote: > Hi Vladimir, > > > thank you for the feedback! > > On 04/15/2016 02:46 AM, Vladimir Kozlov wrote: >> I think check should use !isa_oopptr() since one of nodes could be ConP NULL ptr which is not klassptr. > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/ > > RBT testing passes. I did ~70 runs with the reproducer, no problems have shown up so far. I'll do ~900 more runs, though. > > Thank you! > > Best regards, > > > Zoltan > >> >> Thanks, >> Vladimir >> >> On 4/14/16 6:21 AM, Zoltán Majó wrote: >>> Hi, >>> >>> >>> please review the patch for 8153357. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153357 >>> >>> Problem: When determining the unique input of a phi, the C2 compiler removes cast nodes connecting the phi to its >>> unique input. >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 >>> >>> Then (if the phi has indeed a unique input), the C2 compiler attempts replace the phi with a cast node. The new cast >>> node feeds from the unique input. >>> >>> To be able to remove the phi node, the C2 compiler must to determine the type of cast to add in place of the phi >>> node (CastII, CastPP, or CheckCastPP).
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 >>> >>> The failure in the bug report appears because the C2 compiler adds a cast node of unexpected type to the graph (a >>> CheckCastPP instead of a CastPP when casting between two klass pointers). >>> >>> Please find more details about the cause of the failure in the bug description: >>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 >>> >>> >>> >>> Solution: Refine C2's logic to determine the type of cast node added. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ >>> >>> Testing: >>> - JPRT; >>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); >>> - 500 non-failing runs with the reproducer (the problem reproduces with < 100 runs). >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From vladimir.kozlov at oracle.com Fri Apr 15 17:34:51 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:34:51 -0700 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5710DD7E.1060105@redhat.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> Message-ID: <5711263B.9070505@oracle.com> On 4/15/16 5:24 AM, Roland Westrelin wrote: > Hi Vladimir, > > Thanks for looking at this. > >> I agree with optimization but I am not sure about changes. > > Is this an optimization? It looks more like a bug to me. Code is correct but not optimal. I don't think it is bug. > >> Can we check only one previous block to be more conservative?: >> >> Block* b = prev(targ_block) >> bool has_top = targ_block->head()->is_Loop() && b->has_loop_alignment() >> && !b->head()->is_Loop() > > That would be good enough as far as I can tell. Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8154135/webrev.01/ Looks good. 
> >> Did you try to play with NumberOfLoopInstrToAlign and MaxLoopPad? May be >> for RISC cpus (with fixed instruction size) we should change them. > > Thanks for the pointer. This said, I don't see what could prevent the > problem I see from happening on x86 so to me it looks like a bug, rather > than a tuning problem. NumberOfLoopInstrToAlign code is used only on x86 and may hide the problem you see. And I suggested to look on that code to see if we can get additional performance benefits on RISC (on arm64 in your case). Thanks, Vladimir > > Roland. > From vladimir.kozlov at oracle.com Fri Apr 15 17:44:32 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 10:44:32 -0700 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <5711046B.9080808@oracle.com> References: <5711046B.9080808@oracle.com> Message-ID: <57112880.1010204@oracle.com> Use resizable stream: stringStream(size_t initial_bufsize = 256); 1024 may not be enough. Thanks, Vladimir On 4/15/16 8:10 AM, Nils Eliasson wrote: > Hi, > > Please review this fix of print opto_assembly. > > Summary: > The compilelog can get corrupted and the VM may assert on "failed: bad tag in log". > > When printing assembly in output.cpp we first take the ttylock, print the head and then the method metadata. However the > metadata printing makes a vm entry and may block for a safepoint and will then release the lock > (break_tty_lock_for_safepoint). After that some of the other compiler thread that haven't safepointed will take the lock > and the broken log will be a fact when the safepoint is over and the first thread starts logging again. > > Solution: > Print the method metadata to a temporary buffer, then take the tty lock. > > Testing: > Repro from bug stops failing. 
> Running :hotspot_all (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 > Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ > > Regards, > Nils Eliasson From rwestrel at redhat.com Fri Apr 15 18:17:47 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 20:17:47 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5711263B.9070505@oracle.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> <5711263B.9070505@oracle.com> Message-ID: <5711304B.3040907@redhat.com> >> http://cr.openjdk.java.net/~roland/8154135/webrev.01/ > > Looks good. Thanks for the review. I need a sponsor now... > NumberOfLoopInstrToAlign code is used only on x86 and may hide the > problem you see. > And I suggested to look on that code to see if we can get additional > performance benefits on RISC (on arm64 in your case). Ok. Again, thanks for the pointer to that piece of code. Roland. From vladimir.x.ivanov at oracle.com Fri Apr 15 18:25:26 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Apr 2016 21:25:26 +0300 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses In-Reply-To: <57103FF2.5060907@oracle.com> References: <570FCB02.6000507@oracle.com> <57103FF2.5060907@oracle.com> Message-ID: <57113216.8030906@oracle.com> Thanks for the feedback, Vladimir. Updated version: http://cr.openjdk.java.net/~vlivanov/8134918/webrev.01/ Additional changes: * alias type doesn't differentiate between byte[] & boolean[]; use address type to narrow the basic type; > Next assert should be at the beginning of method: > + assert(type != T_OBJECT || !unaligned, "unaligned access not > supported with object type"); Fixed. > Fix Copyright year in the test. Fixed. > There is no PIT link in the bug report. Added. 
Best regards, Vladimir Ivanov > > Thanks, > Vladimir > > On 4/14/16 9:53 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8134918 >> >> Type speculation can produce mismatched unsafe accesses. >> >> It injects a guard based on profile data and then propagate type info >> down to the users. If there's an unsafe access, it can become >> mismatched w.r.t. profile data being used. >> >> It happens even for valid usages. If an unsafe access always matches >> memory location at runtime, the code produced by type speculation in >> that case is effectively dead. >> >> What cause problems are unsafe OOP accesses (U.putObject()/getObject() >> on non-OOP locations). >> >> The fix is to avoid intrinsification of problematic accesses. Type >> speculation injects precise type information, which is available >> during intrinsification. >> >> We could try to support mismatched unsafe object accesses instead, but >> I don't see any value in that. >> >> Testing: JPRT, pit-hs-comp (in progress). >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Apr 15 18:26:21 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Apr 2016 21:26:21 +0300 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5710DD7E.1060105@redhat.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> Message-ID: <5711324D.6090000@oracle.com> Looks good. I'll sponsor the fix. Best regards, Vladimir Ivanov On 4/15/16 3:24 PM, Roland Westrelin wrote: > That would be good enough as far as I can tell. 
Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8154135/webrev.01/ From vladimir.kozlov at oracle.com Fri Apr 15 18:52:47 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 11:52:47 -0700 Subject: [9] RFR (S): 8134918: C2: Type speculation produces mismatched unsafe accesses In-Reply-To: <57113216.8030906@oracle.com> References: <570FCB02.6000507@oracle.com> <57103FF2.5060907@oracle.com> <57113216.8030906@oracle.com> Message-ID: <5711387F.1040600@oracle.com> Looks good. Thanks, Vladimir On 4/15/16 11:25 AM, Vladimir Ivanov wrote: > Thanks for the feedback, Vladimir. > > Updated version: > http://cr.openjdk.java.net/~vlivanov/8134918/webrev.01/ > > Additional changes: > > * alias type doesn't differentiate between byte[] & boolean[]; use address type to narrow the basic type; > >> Next assert should be at the beginning of method: >> + assert(type != T_OBJECT || !unaligned, "unaligned access not >> supported with object type"); > Fixed. > >> Fix Copyright year in the test. > Fixed. > >> There is no PIT link in the bug report. > Added. > > Best regards, > Vladimir Ivanov > >> >> Thanks, >> Vladimir >> >> On 4/14/16 9:53 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8134918/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8134918 >>> >>> Type speculation can produce mismatched unsafe accesses. >>> >>> It injects a guard based on profile data and then propagate type info >>> down to the users. If there's an unsafe access, it can become >>> mismatched w.r.t. profile data being used. >>> >>> It happens even for valid usages. If an unsafe access always matches >>> memory location at runtime, the code produced by type speculation in >>> that case is effectively dead. >>> >>> What cause problems are unsafe OOP accesses (U.putObject()/getObject() >>> on non-OOP locations). >>> >>> The fix is to avoid intrinsification of problematic accesses. 
Type >>> speculation injects precise type information, which is available >>> during intrinsification. >>> >>> We could try to support mismatched unsafe object accesses instead, but >>> I don't see any value in that. >>> >>> Testing: JPRT, pit-hs-comp (in progress). >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Apr 15 18:56:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 11:56:16 -0700 Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if on-stack-replacement is enabled In-Reply-To: <571107CD.8070205@oracle.com> References: <571107CD.8070205@oracle.com> Message-ID: <57113950.9070700@oracle.com> Looks good. Thanks, Vladimir On 4/15/16 8:25 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8072428. > > https://bugs.openjdk.java.net/browse/JDK-8072428 > > Problem: On-stack-replacement requires loop counters; disabling loop counters with on-stack-replacement enabled triggers > an assert. > > Solution: Set UseLoopCounter ergonomically if on-stack-replacement is enabled. Print warning. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/ > > Tested with locally-built VM (linux_x64). > > Thank you! > > Best regards, > > > Zoltan > From rwestrel at redhat.com Fri Apr 15 18:58:39 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 15 Apr 2016 20:58:39 +0200 Subject: RFR(S): 8154135: Loop alignment may be added inside the loop body In-Reply-To: <5711324D.6090000@oracle.com> References: <571023F8.5090903@oracle.com> <5710DD7E.1060105@redhat.com> <5711324D.6090000@oracle.com> Message-ID: <571139DF.4070607@redhat.com> > Looks good. > > I'll sponsor the fix. Thanks for the review and for pushing it! Roland.
From vladimir.kozlov at oracle.com Fri Apr 15 19:30:15 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Apr 2016 12:30:15 -0700 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57109BE2.1090602@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> <5710478C.8050200@oracle.com> <57109BE2.1090602@oracle.com> Message-ID: <57114147.5060206@oracle.com> Thank you, Jamsheed. Testing results looks fine so far. I am pushing it. Thanks, Vladimir On 4/15/16 12:44 AM, Jamsheed C m wrote: > Hi Vladimir, > > PIT testing is in progress, link is available in bug report. > > Best Regards, > Jamsheed > > On 4/15/2016 7:14 AM, Vladimir Kozlov wrote: >> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results. >> >> Thanks, >> Vladimir >> >> On 4/12/16 2:45 AM, Doerr, Martin wrote: >>> Hi, >>> >>> I think we have come to a common understanding and there was no complaint about my latest webrev: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Can I consider it reviewed? >>> Can somebody sponsor, please? >>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin >>> Sent: Donnerstag, 7. April 2016 12:52 >>> To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> Hi Andrew, Jamsheed and all, >>> >>> thank you very much for your input. 
>>> >>> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). >>> Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also >>> correct). >>> >>> My change still contains a releasing store for newly created ExceptionCache instances. >>> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce >>> additional false negatives on weak memory model platforms. >>> I think having the release doesn't hurt too much and makes the design a little cleaner. >>> >>> I also added comments based on your input. >>> >>> The new webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Please review. I will also need a sponsor from Oracle, please. >>> >>> Thanks again and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Andrew Haley [mailto:aph at redhat.com] >>> Sent: Donnerstag, 7. April 2016 12:14 >>> To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> On 07/04/16 10:08, Doerr, Martin wrote: >>> >>>> atomic update for the _count would only be required if there were >>>> multiply threads which attempt to increment it >>>> concurrently. However, updates are under lock, so we only have >>>> concurrent readers which is ok. >>>> >>>> I still think "volatile" does what we need here. Especially the xlC >>>> compiler on AIX tends to reload variables from memory. Exactly this >>>> can be prevented by making the field volatile. >>> >>> I think your latest patch is OK. Whether volatile is really good >>> enough, I don't know. The new(ish) C++ memory model treats this as a >>> race, and therefore undefined behaviour. Old C++ didn't have a memory >>> model, so the best we can do with racy code is guess about what our >>> compilers might do. 
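A minimal sketch of the publication pattern under discussion, written in portable C++11 (std::atomic) rather than with HotSpot's OrderAccess/release_store primitives. A single writer, serialized by a lock, fills in an entry and only then stores the new count with release semantics; lock-free readers load the count with acquire semantics and scan only the entries below it. The class and member names (PcHandlerCache, lookup, add) are hypothetical and are not HotSpot's ExceptionCache.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical fixed-capacity pc -> handler cache. Writers are serialized
// by a mutex; readers take no lock and may run concurrently with a writer.
class PcHandlerCache {
 public:
  static constexpr int kCapacity = 16;

  // Returns the handler recorded for pc, or 0 if none is visible yet.
  std::uintptr_t lookup(std::uintptr_t pc) const {
    // Acquire pairs with the release store in add(): every slot below the
    // observed count was fully written before that count was published.
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;  // a miss is safe: the caller just takes the slow path
  }

  // Returns false when the cache is full.
  bool add(std::uintptr_t pc, std::uintptr_t handler) {
    assert(handler != 0 && "0 is reserved to mean 'not found'");
    std::lock_guard<std::mutex> guard(lock_);
    // Only writers change count_, and they hold lock_, so relaxed is enough.
    int n = count_.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    pcs_[n] = pc;  // readers never look at slot n until count_ > n
    handlers_[n] = handler;
    // Release: the entry becomes visible no later than the new count.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

 private:
  std::uintptr_t pcs_[kCapacity] = {};
  std::uintptr_t handlers_[kCapacity] = {};
  std::atomic<int> count_{0};
  std::mutex lock_;
};
```

A reader that misses a concurrently added entry only takes the slow path, which corresponds to the extra false negatives Martin mentions; what the release/acquire pair rules out is a reader observing the new count before the entry it covers has been written.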
>>> >>> I certainly much prefer a release_store to the storestore fence used >>> in the fix for 8143897. >>> >>> Andrew. >>> > From christian.thalinger at oracle.com Fri Apr 15 20:36:33 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Apr 2016 10:36:33 -1000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: <532B4EF9-CE01-4BCF-8B9A-396D6B004BED@oracle.com> > On Apr 14, 2016, at 11:04 PM, Berg, Michael C wrote: > > Vladimir, the code has been updated and is available at: > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ Much better! There are a few smaller things: +class MaskCreateINode : public Node { Didn't we agree on CreateMaskINode? names should be consistent: MaskCreateINode -> CreateMaskINode, set_mask -> createMask. + Flag_has_vect_mask_set = Flag_is_scheduled << 1, + bool has_vect_mask_set() const { return (_flags & Flag_has_vect_mask_set) != 0; } Please rename to *has_vector_mask_set. +const bool Matcher::has_predicated_vectors(void) { + bool ret_value = false; + switch(UseAVX) { + case 0: + case 1: + case 2: + break; + + case 3: + ret_value = VM_Version::supports_avx512vl(); + break; + } + + return ret_value; +} Change this to: +const bool Matcher::has_predicated_vectors(void) { + switch (UseAVX) { + case 3: + return VM_Version::supports_avx512vl(); + default: return false; + } +} src/share/vm/opto/matcher.hpp + // Some uarchs have predicated registers on vectors Is "uarchs" a typo?
> > Thanks, > Michael > > -----Original Message----- > From: Berg, Michael C > Sent: Thursday, April 14, 2016 5:54 PM > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: CR for RFR 8153998 > > Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit add files for CountedLoopEnd. > It will be clean when next you see the code. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:52 PM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 5:12 PM, Berg, Michael C wrote: >> The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. > > How it is sizeless when it generates kmovwl() instruction? > Do you mean it does not have side effects (no flags modified)? > > Vladimir > >> >> Ok, I will try the pattern match method. >> >> Thanks >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 5:02 PM >> To: Berg, Michael C ; Christian Thalinger >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 4:38 PM, Berg, Michael C wrote: >>> Vladimir, >>> >>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >>> >>> I tried something like that early on with CountedLoopEnd. >> >> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). 
>> I don't see any side effects for restoremask in your code. What are you talking about? >> >> I am suggesting something like next: >> >> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ >> predicate(n->has_vect_mask_set()); >> match(CountedLoopEnd cop cr); >> effect(USE labl); >> >> ins_cost(400); >> format %{ "j$cop $labl\t# loop end\n\t" >> "restoremask \t# vector mask restore for loops" >> %} >> ins_encode %{ >> Label* L = $labl$$label; >> __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump >> __ restoremask(); >> %} >> ins_pipe(pipe_jcc); >> %} >> >> Vladimir >> >>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. >>> >>> -Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, April 14, 2016 4:27 PM >>> To: Christian Thalinger ; Berg, >>> Michael C >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>>> >>>>> Christian, >>>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>>> >>>> That's unfortunate but I understand. I'm fine with it then. >>> >>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node.
Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. >>> >>> Vladimir >>> >>>> >>>>> Regards, >>>>> Michael >>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>>> > >>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>>> >>>>> *Subject:*Re: CR for RFR 8153998 >>>>> >>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>>> See below for context. >>>>> Regards, >>>>> Michael >>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>>> *To:*Berg, Michael C > >>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>>> *Subject:*Re: CR for RFR 8153998 >>>>> >>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>>> performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>> This code was tested as follows (see jbs entry below): >>>>> >>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>>> >>>>> webrev: >>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>>> >>>>> >>>>> +//------------------------------MachMskNode----------------------- >>>>> +- >>>>> +- >>>>> ---------- >>>>> >>>>> +// Machine function Msk Node >>>>> >>>>> +class MachMskNode : public MachIdealNode { >>>>> >>>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>>> Ok, that's easy enough. >>>>> Also, I don't quite understand why we have: >>>>> >>>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>>> >>>>> + predicate(VM_Version::supports_avx512vl()); >>>>> >>>>> + match(Set dst (MaskCreateI src)); >>>>> >>>>> + effect(TEMP dst); >>>>> >>>>> + format %{ "createmsk $dst, $src" %} >>>>> >>>>> + ins_encode %{ >>>>> >>>>> + __ createmsk($dst$$Register, $src$$Register); >>>>> >>>>> + %} >>>>> >>>>> but: >>>>> >>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>>> >>>>> + MacroAssembler _masm(&cbuf); >>>>> >>>>> + __ restoremsk(); >>>>> >>>>> + } >>>>> >>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>>> >>>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>>> >>>>> Thanks, >>>>> Michael >>>>
From christian.thalinger at oracle.com Fri Apr 15 20:43:10 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Apr 2016 10:43:10 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5710E772.5050801@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> Message-ID: > On Apr 15, 2016, at 3:06 AM, Nils Eliasson wrote: > > Hi, > > On 2016-04-14 20:45, Christian Thalinger wrote: >> >>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson < nils.eliasson at oracle.com > wrote: >>> >>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. >>> >>> It gets verbose in the method declarations in compileBroker >> >> Don't worry about this. >> >>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. >> >> Yes, that's the right place. >> >>> >>> New webrev: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >> >> + bool can_become_stale() const { >> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >> + } >> I'm not a fan of implicit contracts just defined by comments. This method doesn't seem to be performance critical so I would suggest to use a switch-case. An attribute on the enum would be much better but we all know this isn't Java. > > As you suggested: > http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 Thanks.
A space is missing and the closing } indent is wrong: + bool can_become_stale() const { + switch(_compile_reason) { + case Reason_BackedgeCount: + case Reason_InvocationCount: + case Reason_Tiered: + return !_is_blocking; + } + return false; + } Also, what about: + Reason_None, + Reason_CTW, // Compile the world + Reason_Replay, // ciReplay These were covered before. > > Also made reasons CTW and Replay not stale-able. > > Thanks! > Nils > >> >>> >>> Thanks! >>> Nils >>> >>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>> Very nice, I like it. >>>> >>>> One note. CompileReason (and its names) should be CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in header file so it is inlinined on all platforms. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> New webrev: >>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>> >>>>> Summary >>>>> Introduced an enum CompileReason with members matching all the old >>>>> variants, and a table containing all the unchanged strings. I see the >>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>> have choosen not to do so in this change. >>>>> >>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>> >>>>> Testing: >>>>> Running Testset hotspot on all platforms and hotspot_all on one platform >>>>> >>>>> Regards, >>>>> Nils Eliawsson >>>>> >>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>> >>>>>>> I'll do a proper fix, it is the right thing to do and should be pretty >>>>>>> quick. I'll change the comment to an enum that represent who submitted >>>>>>> the compile, and add a table for the comments. This could be useful in >>>>>>> other settings to. 
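A minimal sketch of the shape being discussed in this thread: an enum recording who requested the compile, a name table kept in enum order, and a can_become_stale() written as an explicit switch with a default case instead of relying on the enum's numeric order. The enum is a reduced, hypothetical subset of the webrev's CompileReason, and the name strings and the CompileTaskSketch type are invented for illustration.

```cpp
#include <cassert>

// Hypothetical, reduced subset of the compile reasons; the webrev's enum
// has more members.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,       // Compile the world
  Reason_Replay,    // ciReplay
  Reason_Whitebox,  // WB API request
  Reason_Count      // number of reasons, not a reason itself
};

// Illustrative names, indexed by CompileReason; must stay in enum order.
static const char* const reason_names[] = {
  "no_reason", "count", "backedge_count", "tiered",
  "CTW", "replay", "whitebox"
};
static_assert(sizeof(reason_names) / sizeof(reason_names[0]) == Reason_Count,
              "reason_names is out of sync with CompileReason");

inline const char* reason_name(CompileReason r) {
  assert(r < Reason_Count);
  return reason_names[r];
}

struct CompileTaskSketch {
  CompileReason compile_reason;
  bool is_blocking;

  // Only policy-triggered, non-blocking tasks may be evicted as stale.
  bool can_become_stale() const {
    switch (compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !is_blocking;
      default:
        return false;
    }
  }
};
```

The default branch keeps CTW, Replay, Whitebox and any reason added later from being treated as stale unless someone lists it explicitly, and it also keeps compilers that warn about unhandled enum values in a switch quiet.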
>>>>>> >>>>>> Sounds good. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Nils >>>>>>> >>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>> What do you mean "stale"? >>>>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>> >>>>>>>>> Summary: >>>>>>>>> Add method enqueued for compilation with WB API may be removed from >>>>>>>>> the compile queue as stale. >>>>>>>>> >>>>>>>>> Solution: >>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>>> >>>>>>>>> This is an workaround but we should consider fixing something >>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>> task with info about the origin of the compile. The comment field has >>>>>>>>> this information - but then it needs to be >>>>>>>>> converted to an enum. >>>>>>>>> >>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Nils Eliasson >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >> >
From michael.c.berg at intel.com Sat Apr 16 04:20:33 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Sat, 16 Apr 2016 04:20:33 +0000 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: Vladimir/Christian: I believe I have addressed all concerns in this update: Webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ Regards, Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Friday, April 15, 2016 2:04 AM To: 'Vladimir Kozlov' Cc: 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: CR for RFR 8153998 Vladimir, the code has been updated and is available at: webrev: http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ Thanks, Michael -----Original Message----- From: Berg, Michael C Sent: Thursday, April 14, 2016 5:54 PM To: Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8153998 Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 64bit add files for CountedLoopEnd. It will be clean when next you see the code. -Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, April 14, 2016 5:52 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: CR for RFR 8153998 On 4/14/16 5:12 PM, Berg, Michael C wrote: > The restore mark is sizeless, it restores to the know global configuration value for k1 which is used automatically in all of code gen. How it is sizeless when it generates kmovwl() instruction? Do you mean it does not have side effects (no flags modified)? Vladimir > > Ok, I will try the pattern match method.
> > Thanks > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:02 PM > To: Berg, Michael C ; Christian Thalinger > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 4:38 PM, Berg, Michael C wrote: >> Vladimir, >> >> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >> >> I tried something like that early on with CountedLoopEnd. > > In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). > I don't see any side effects for restoremask in your code. What are you talking about? > > I am suggesting something like next: > > instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ > predicate(n->has_vect_mask_set()); > match(CountedLoopEnd cop cr); > effect(USE labl); > > ins_cost(400); > format %{ "j$cop $labl\t# loop end\n\t" > "restoremask \t# vector mask restore for loops" > %} > ins_encode %{ > Label* L = $labl$$label; > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump > __ restoremask(); > %} > ins_pipe(pipe_jcc); > %} > > Vladimir > >> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would like to do then, process via flag or how I do it now? We would basically be doing it in the same place. 
>> >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 4:27 PM >> To: Christian Thalinger ; Berg, >> Michael C >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>> >>>> Christian, >>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>> >>> That's unfortunate but I understand. I'm fine with it then. >> >> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with on vector iteration - there should not be any other branches there. >> >> Vladimir >> >>> >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Thursday, April 14, 2016 11:20 AM *To:*Berg, Michael C >>>> > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>> See below for context. >>>> Regards, >>>> Michael >>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>> *Sent:*Wednesday, April 13, 2016 2:08 PM >>>> *To:*Berg, Michael C > >>>> *Cc:*hotspot-compiler-dev at openjdk.java.net >>>> *Subject:*Re: CR for RFR 8153998 >>>> >>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>> Hi Folks, >>>> >>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x >>>> performance and has been modeled over a large number of loop lengths and forms of loops. >>>> This code was tested as follows (see jbs entry below): >>>> >>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>> >>>> >>>> +//------------------------------MachMskNode----------------------- >>>> +- >>>> +- >>>> ---------- >>>> >>>> +// Machine function Msk Node >>>> >>>> +class MachMskNode : public MachIdealNode { >>>> >>>> Does "Msk" mean mask? Then we should call it MachMaskNode. >>>> Ok, that's easy enough. >>>> Also, I don't quite understand why we have: >>>> >>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>> >>>> + predicate(VM_Version::supports_avx512vl()); >>>> >>>> + match(Set dst (MaskCreateI src)); >>>> >>>> + effect(TEMP dst); >>>> >>>> + format %{ "createmsk $dst, $src" %} >>>> >>>> + ins_encode %{ >>>> >>>> + __ createmsk($dst$$Register, $src$$Register); >>>> >>>> + %} >>>> >>>> but: >>>> >>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) >>>> const { >>>> >>>> + MacroAssembler _masm(&cbuf); >>>> >>>> + __ restoremsk(); >>>> >>>> + } >>>> >>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>> The subsequent restore, preplaces the default value back into k1.
The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>> >>>> Hmm. So, there is no way we can have a RestoreMaskINode? >>>> >>>> Thanks, >>>> Michael >>> From aph at redhat.com Sat Apr 16 07:59:38 2016 From: aph at redhat.com (Andrew Haley) Date: Sat, 16 Apr 2016 08:59:38 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: Message-ID: <5711F0EA.8020106@redhat.com> + void dc(cache_maintenance cm, Register Rt) { + sys(0b011, 0b0111, cm, 0b001, Rt); + } + + void ic(cache_maintenance cm, Register Rt) { + sys(0b011, 0b0111, cm, 0b001, Rt); } Are DC and IC really synonyms? +typedef void (*_zero_Fn)(HeapWord* to, size_t count); + static void pd_fill_to_aligned_words(HeapWord* tohw, size_t count, juint value) { - pd_fill_to_words(tohw, count, value); + if (UseBlockZeroing + && value == 0 + && count >= (size_t)(BlockZeroingLowLimit >> LogHeapWordSize)) { + ((_zero_Fn)StubRoutines::zero_aligned_words())(tohw, count); + } + else { + pd_fill_to_words(tohw, count, value); + } } I'm not convinced of the value of this. We already know that a simple while (count-- > 0) { *to++ = v; } turns into a call to memset() which does DC ZVA. diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp @@ -4670,11 +4670,54 @@ BLOCK_COMMENT(is_string ? "} string_equals" : "} array_equals"); } + +// base: Address of a buffer to be zeroed, 8 bytes aligned. +// cnt: Count in 8-byte unit. +// is_large: True when 'cnt' is known to be >= BlockZeroingLowLimit. 
+void MacroAssembler::zero_words(Register base, Register cnt, bool is_large) +{ + if (UseBlockZeroing) { + Label non_block_zeroing; + block_zeroing(base, cnt, non_block_zeroing, is_large); Always use the imperative form of a verb for methods: "block_zero", not, "block_zeroing". // base: Address of a buffer to be zeroed, 8 bytes aligned. -// cnt: Count in 8-byte unit. -void MacroAssembler::zero_words(Register base, Register cnt) +// cnt: Immediate count in 8-byte unit. Please make this // cnt: count in HeapWords Thanks, Andrew. From vladimir.kozlov at oracle.com Mon Apr 18 07:00:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Apr 2016 00:00:28 -0700 Subject: CR for RFR 8153998 In-Reply-To: References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com> Message-ID: <5714860C.5030702@oracle.com> This looks good. I will start our testing for it. Thanks, Vladimir On 4/15/16 9:20 PM, Berg, Michael C wrote: > Vladimir/Christian: > > I believe I have addressed all concerns in this update: > > Webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/ > > Regards, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C > Sent: Friday, April 15, 2016 2:04 AM > To: 'Vladimir Kozlov' > Cc: 'hotspot-compiler-dev at openjdk.java.net' > Subject: RE: CR for RFR 8153998 > > Vladimir, the code has been updated and is available at: > > webrev: > http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/ > > Thanks, > Michael > > -----Original Message----- > From: Berg, Michael C > Sent: Thursday, April 14, 2016 5:54 PM > To: Vladimir Kozlov > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: CR for RFR 8153998 > > Ok, now we understand one another, I have added a size calc for the new pattern match which will accurately map the emit size for the 32bit and 
64bit .ad files for CountedLoopEnd. > It will be clean when next you see the code. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, April 14, 2016 5:52 PM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: CR for RFR 8153998 > > On 4/14/16 5:12 PM, Berg, Michael C wrote: >> The restore mark is sizeless, it restores to the known global configuration value for k1 which is used automatically in all of code gen. > > How is it sizeless when it generates kmovwl() instruction? > Do you mean it does not have side effects (no flags modified)? > > Vladimir > >> >> Ok, I will try the pattern match method. >> >> Thanks >> -Michael >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, April 14, 2016 5:02 PM >> To: Berg, Michael C ; Christian Thalinger >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: CR for RFR 8153998 >> >> On 4/14/16 4:38 PM, Berg, Michael C wrote: >>> Vladimir, >>> >>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization. >>> >>> I tried something like that early on with CountedLoopEnd. >> >> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically). >> I don't see any side effects for restoremask in your code. What are you talking about?
>> >> I am suggesting something like next: >> >> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{ >> predicate(n->has_vect_mask_set()); >> match(CountedLoopEnd cop cr); >> effect(USE labl); >> >> ins_cost(400); >> format %{ "j$cop $labl\t# loop end\n\t" >> "restoremask \t# vector mask restore for loops" >> %} >> ins_encode %{ >> Label* L = $labl$$label; >> __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump >> __ restoremask(); >> %} >> ins_pipe(pipe_jcc); >> %} >> >> Vladimir >> >>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would you like to do then, process via flag or how I do it now? We would basically be doing it in the same place. >>> >>> -Michael >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, April 14, 2016 4:27 PM >>> To: Christian Thalinger ; Berg, >>> Michael C >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: CR for RFR 8153998 >>> >>> On 4/14/16 3:35 PM, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C > wrote: >>>>> >>>>> Christian, >>>>> There is but I would have to anchor it via a gpr register, instead I am treating it like nop emits (yet another side effect), with some guidance for placement. >>>> >>>> That's unfortunate but I understand. I'm fine with it then. >>> >>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there.
>>> >>> Vladimir >>> >>>> >>>>> Regards, >>>>> Michael >>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> Sent: Thursday, April 14, 2016 11:20 AM To: Berg, Michael C >>>>> > >>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>> >>>>> Subject: Re: CR for RFR 8153998 >>>>> >>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C > wrote: >>>>> See below for context. >>>>> Regards, >>>>> Michael >>>>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] >>>>> Sent: Wednesday, April 13, 2016 2:08 PM >>>>> To: Berg, Michael C > >>>>> Cc: hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: CR for RFR 8153998 >>>>> >>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C > wrote: >>>>> Hi Folks, >>>>> >>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation. >>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops. >>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x performance and has been modeled over a large number of loop lengths and forms of loops. >>>>> This code was tested as follows (see JBS entry below): >>>>> >>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998 >>>>> >>>>> webrev: >>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/ >>>>> >>>>> >>>>> +//------------------------------MachMskNode----------------------------------- >>>>> >>>>> +// Machine function Msk Node >>>>> >>>>> +class MachMskNode : public MachIdealNode { >>>>> >>>>> Does 'Msk' mean mask? Then we should call it MachMaskNode. >>>>> Ok, that's easy enough.
>>>>> Also, I don't quite understand why we have: >>>>> >>>>> +instruct set_mask(rRegI dst, rRegI src) %{ >>>>> >>>>> + predicate(VM_Version::supports_avx512vl()); >>>>> >>>>> + match(Set dst (MaskCreateI src)); >>>>> >>>>> + effect(TEMP dst); >>>>> >>>>> + format %{ "createmsk $dst, $src" %} >>>>> >>>>> + ins_encode %{ >>>>> >>>>> + __ createmsk($dst$$Register, $src$$Register); >>>>> >>>>> + %} >>>>> >>>>> but: >>>>> >>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const { >>>>> >>>>> + MacroAssembler _masm(&cbuf); >>>>> >>>>> + __ restoremsk(); >>>>> >>>>> + } >>>>> >>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects. >>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop. >>>>> The subsequent restore places the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization. >>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions. >>>>> >>>>> Hmm. So, there is no way we can have a RestoreMaskINode?
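The set_mask/restore exchange above boils down to this: turn the post loop's remaining trip count into a lane mask in k1, execute a single masked vector iteration, then restore k1's default value. The scalar C++ sketch below models only the mask arithmetic. post_loop_mask and masked_add are hypothetical names, and the 16-lane example (a 512-bit vector of ints) is an assumption, not code from the webrev.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot's createmsk): build a lane mask for a
// post loop with `remaining` iterations left. Bit i enables lane i.
uint32_t post_loop_mask(uint32_t remaining, uint32_t lanes) {
  assert(lanes >= 1 && lanes <= 32);
  if (remaining >= lanes) {
    // At least a full vector of work: enable every lane
    // (and avoid 1u << 32, which is undefined behaviour).
    return lanes == 32 ? 0xFFFFFFFFu : (1u << lanes) - 1u;
  }
  return (1u << remaining) - 1u;  // only the first `remaining` lanes
}

// Scalar model of one masked vector add: lanes whose mask bit is clear are
// left untouched, so the whole tail is handled in a single "iteration".
void masked_add(int* dst, const int* a, const int* b, uint32_t mask, uint32_t lanes) {
  for (uint32_t i = 0; i < lanes; i++) {
    if (mask & (1u << i)) {
      dst[i] = a[i] + b[i];
    }
  }
}
```

With 3 iterations left in a 16-lane loop the mask is 0b111, so only lanes 0 to 2 are written and the rest of the destination is left as it was.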
>>>>> >>>>> Thanks, >>>>> Michael >>>> From martin.doerr at sap.com Mon Apr 18 07:31:36 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 18 Apr 2016 07:31:36 +0000 Subject: RFR(S): 8153267: nmethod's exception cache not multi-thread safe In-Reply-To: <57114147.5060206@oracle.com> References: <1890034be33741f5b088e8dd5693e6ae@DEWDFE13DE14.global.corp.sap> <57020636.7010806@oracle.com> <8fcbbb5c49844d419a27cab9dad08ddf@DEWDFE13DE14.global.corp.sap> <5704C48C.2070502@oracle.com> <5704F8DA.9030000@oracle.com> <57061D3E.8050408@oracle.com> <570632FF.7090103@redhat.com> <805b51b644b043e989120d7e86505f57@DEWDFE13DE14.global.corp.sap> <63d847270b544be89c8071ec04f6b29e@DEWDFE13DE14.global.corp.sap> <5710478C.8050200@oracle.com> <57109BE2.1090602@oracle.com> <57114147.5060206@oracle.com> Message-ID: <4e54bacd93284faf9843b60f98314de5@DEWDFE13DE14.global.corp.sap> Thanks everybody for the discussion, for reviewing and for sponsoring. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, April 15, 2016 21:30 To: Jamsheed C m ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe Thank you, Jamsheed. Testing results look fine so far. I am pushing it. Thanks, Vladimir On 4/15/16 12:44 AM, Jamsheed C m wrote: > Hi Vladimir, > > PIT testing is in progress, link is available in bug report. > > Best Regards, > Jamsheed > > On 4/15/2016 7:14 AM, Vladimir Kozlov wrote: >> Looks fine to me. Jamsheed, please, run our PIT testing with these changes and analyze results. >> >> Thanks, >> Vladimir >> >> On 4/12/16 2:45 AM, Doerr, Martin wrote: >>> Hi, >>> >>> I think we have come to a common understanding and there was no complaint about my latest webrev: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Can I consider it reviewed? >>> Can somebody sponsor, please?
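For 8153267, the design settled in this thread is that updates to the exception cache happen under a lock while readers do not take it, and the entry count and each newly created ExceptionCache are published with releasing stores so a reader never sees a half-initialized entry. The sketch below expresses that ordering with C++11 std::atomic instead of HotSpot's OrderAccess; PublishedCache, Entry and the capacity are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for one exception-cache slot.
struct Entry {
  int key;
  int value;
};

// Writers append under a lock; readers scan without taking the lock.
class PublishedCache {
 public:
  static constexpr int kCapacity = 4;

  bool add(int key, int value) {
    std::lock_guard<std::mutex> guard(lock_);  // writers are serialized
    int n = count_.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;
    entries_[n] = Entry{key, value};                 // 1. fill the slot
    count_.store(n + 1, std::memory_order_release);  // 2. then publish it
    return true;
  }

  // Lock-free reader: the acquire load pairs with the release store above,
  // so every slot below the loaded count is seen fully initialized.
  bool find(int key, int* value) const {
    int n = count_.load(std::memory_order_acquire);
    for (int i = 0; i < n; i++) {
      if (entries_[i].key == key) {
        *value = entries_[i].value;
        return true;
      }
    }
    return false;
  }

 private:
  Entry entries_[kCapacity] = {};
  std::atomic<int> count_{0};
  std::mutex lock_;
};
```

Because count_ is a std::atomic and readers only touch slots that were published before the count they loaded, there is no data race under the C++11 memory model, which a plain or merely volatile field read concurrently with a write would not guarantee.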
>>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin >>> Sent: Thursday, April 7, 2016 12:52 >>> To: Andrew Haley ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> Hi Andrew, Jamsheed and all, >>> >>> thank you very much for your input. >>> >>> As Andrew, Jamsheed and I think, it's better to have a releasing store in increment_count(). >>> Therefore, I have replaced the storestore barrier introduced with JDK-8143897 (even though this barrier was also >>> correct). >>> >>> My change still contains a releasing store for newly created ExceptionCache instances. >>> As Jamsheed has pointed out, this should not be strictly required as we have the other barrier. It may only produce >>> additional false negatives on weak memory model platforms. >>> I think having the release doesn't hurt too much and makes the design a little cleaner. >>> >>> I also added comments based on your input. >>> >>> The new webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8153267_exception_cache/webrev.02/ >>> >>> Please review. I will also need a sponsor from Oracle, please. >>> >>> Thanks again and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Andrew Haley [mailto:aph at redhat.com] >>> Sent: Thursday, April 7, 2016 12:14 >>> To: Doerr, Martin ; Jamsheed C m ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S): 8153267: nmethod's exception cache not multi-thread safe >>> >>> On 07/04/16 10:08, Doerr, Martin wrote: >>> >>>> atomic update for the _count would only be required if there were >>>> multiple threads which attempt to increment it >>>> concurrently. However, updates are under lock, so we only have >>>> concurrent readers which is ok. >>> >>>> I still think "volatile" does what we need here.
Especially the xlC >>>> compiler on AIX tends to reload variables from memory. Exactly this >>>> can be prevented by making the field volatile. >>> >>> I think your latest patch is OK. Whether volatile is really good >>> enough, I don't know. The new(ish) C++ memory model treats this as a >>> race, and therefore undefined behaviour. Old C++ didn't have a memory >>> model, so the best we can do with racy code is guess about what our >>> compilers might do. >>> >>> I certainly much prefer a release_store to the storestore fence used >>> in the fix for 8143897. >>> >>> Andrew. >>> > From zoltan.majo at oracle.com Mon Apr 18 07:36:40 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 18 Apr 2016 09:36:40 +0200 Subject: [9] RFR (XS): 8072428: Enable UseLoopCounter ergonomically if on-stack-replacement is enabled In-Reply-To: <57113950.9070700@oracle.com> References: <571107CD.8070205@oracle.com> <57113950.9070700@oracle.com> Message-ID: <57148E88.9010905@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 04/15/2016 08:56 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 4/15/16 8:25 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8072428. >> >> https://bugs.openjdk.java.net/browse/JDK-8072428 >> >> Problem: On-stack-replacement requires loop counters; disabling loop >> counters with on-stack-replacement enabled triggers >> an assert. >> >> Solution: Set UseLoopCounter ergonomically if on-stack-replacement is >> enabled. Print warning. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8072428/webrev.00/ >> >> Tested with locally-built VM (linux_x64). >> >> Thank you!
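The 8072428 rule is a single consistency check applied during flag processing. A minimal sketch, assuming a plain struct in place of HotSpot's global -XX flags; the function name and the warning text are illustrative and not taken from the webrev.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the two flags; in HotSpot these are global -XX
// flags adjusted during argument processing.
struct LoopFlags {
  bool UseOnStackReplacement;
  bool UseLoopCounter;
};

// OSR compilations are triggered from loop (back-edge) counters, so OSR
// without loop counters is an inconsistent combination: re-enable the
// counters and warn. Returns true if a flag was changed.
bool ensure_loop_counter_for_osr(LoopFlags& flags) {
  if (flags.UseOnStackReplacement && !flags.UseLoopCounter) {
    std::fprintf(stderr,
                 "warning: on-stack-replacement requires loop counters; enabling UseLoopCounter\n");
    flags.UseLoopCounter = true;
    return true;
  }
  return false;
}
```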
>> >> Best regards, >> >> >> Zoltan >> From edward.nevill at gmail.com Mon Apr 18 08:10:51 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Mon, 18 Apr 2016 09:10:51 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: Message-ID: <1460967051.10749.31.camel@mint> On Fri, 2016-04-15 at 20:45 +0800, Long Chen wrote: > Hi > > Please review this patch making use of DC ZVA to do block zeroing. > > http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.patch > > I'm sorry that I can't produce a test case matching the 'clear_array' pattern showing obvious improvement. However, generating 'DC ZVA' should be the right thing to do as it usually has better cache behavior. Besides, gcc and Linux's memset have been using 'DC ZVA'. > Hi Long, Thanks for this. I have benchmarked this on 3 different partners' HW using the following JMH test case http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java On two partners' HW I see a significant improvement. On one partner's HW I see almost identical performance. Here are the results I get with the original normalised to 100 sec to avoid disclosing any absolute performance figures.
Partner A: original = 100 sec, revised = 100.7 sec
Partner B: original = 100 sec, revised = 97.6 sec
Partner C: original = 100 sec, revised = 91.2 sec

One small improvement might be to avoid using a tmp register, which has to be allocated here:

-instruct clearArray_imm_reg(immL cnt, iRegP base, Universe dummy, rFlagsReg cr)
+instruct clearArray_imm_reg(immL cnt, iRegP base, iRegLNoSp tmp, Universe dummy, rFlagsReg cr)
- __ zero_words($base$$Register, (u_int64_t)$cnt$$constant);
+ __ zero_words($base$$Register, (u_int64_t)$cnt$$constant, $tmp$$Register);

by using 'lr' as the tmp register here:

+ } else if (UseBlockZeroing && cnt >= (u_int64_t)(BlockZeroingLowLimit >> LogBytesPerWord)) {
+ mov(tmp, cnt);
+ zero_words(base, tmp, true);

AFAIK, 'lr' is always available as a tmp register in C2 generated code.

All the best,
Ed.

From zoltan.majo at oracle.com Mon Apr 18 09:22:38 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 18 Apr 2016 11:22:38 +0200 Subject: [9] RFR (XS): 8153357: C2 creates incorrect cast after eliminating phi with unique input In-Reply-To: <5711248C.7000503@oracle.com> References: <570F993C.3040509@oracle.com> <571039FC.1000603@oracle.com> <571104A9.7060208@oracle.com> <5711248C.7000503@oracle.com> Message-ID: <5714A75E.8050300@oracle.com> Hi Vladimir, On 04/15/2016 07:27 PM, Vladimir Kozlov wrote: > Looks good to me. thank you for the review! Best regards, Zoltan > > thanks, > Vladimir > > On 4/15/16 8:11 AM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 04/15/2016 02:46 AM, Vladimir Kozlov wrote: >>> I think check should use !isa_oopptr() since one of nodes could be >>> ConP NULL ptr which is not klassptr. >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8153357/webrev.01/ >> >> RBT testing passes. I did ~70 runs with the reproducer, no problems >> have shown up so far. I'll do ~900 more runs, though. >> >> Thank you!
>> >> Best regards, >> >> >> Zoltan >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/14/16 6:21 AM, Zoltán Majó wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8153357. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153357 >>>> >>>> Problem: When determining the unique input of a phi, the C2 >>>> compiler removes cast nodes connecting the phi to its >>>> unique input. >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1181 >>>> >>>> >>>> Then (if the phi has indeed a unique input), the C2 compiler >>>> attempts to replace the phi with a cast node. The new cast >>>> node feeds from the unique input. >>>> >>>> To be able to remove the phi node, the C2 compiler must >>>> determine the type of cast to add in place of the phi >>>> node (CastII, CastPP, or CheckCastPP). >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/a76d63163758/src/share/vm/opto/cfgnode.cpp#l1705 >>>> >>>> >>>> The failure in the bug report appears because the C2 compiler adds >>>> a cast node of unexpected type to the graph (a >>>> CheckCastPP instead of a CastPP when casting between two klass >>>> pointers). >>>> >>>> Please find more details about the cause of the failure in the bug >>>> description: >>>> https://bugs.openjdk.java.net/browse/JDK-8153357?focusedCommentId=13927108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13927108 >>>> >>>> >>>> >>>> >>>> Solution: Refine C2's logic to determine the type of cast node added. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8153357/webrev.00/ >>>> >>>> Testing: >>>> - JPRT; >>>> - all hotspot compiler tests with RBT (-Xmixed, -Xcomp); >>>> - 500 non-failing runs with the reproducer (the problem reproduces >>>> with < 100 runs).
>>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> From nils.eliasson at oracle.com Mon Apr 18 09:39:23 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 18 Apr 2016 11:39:23 +0200 Subject: RFR(S): 8154151: VM crashes with assert "Ensure we don't compile before compilebroker init" In-Reply-To: <5711215D.4060202@oracle.com> References: <570F987B.2070202@oracle.com> <57102AB3.30709@oracle.com> <5710B6DD.9090009@oracle.com> <5710CD5E.5070103@oracle.com> <5710CF13.5090404@oracle.com> <5711215D.4060202@oracle.com> Message-ID: <5714AB4B.9070200@oracle.com> Thank you Vladimir, I have verified the test executes in JPRT. Regards, Nils On 2016-04-15 19:14, Vladimir Kozlov wrote: > Looks good. Make sure the test is executed in JPRT. > > Thanks, > Vladimir > > On 4/15/16 4:22 AM, Nils Eliasson wrote: >> Hi Tobias, >> >> Thanks for your feedback! >> >> New webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.03 >> >> Regards, >> Nils >> >> On 2016-04-15 13:15, Tobias Hartmann wrote: >>> Hi Nils, >>> >>> On 15.04.2016 11:39, Nils Eliasson wrote: >>>> Thanks Vladimir! >>>> On 2016-04-15 01:41, Vladimir Kozlov wrote: >>>>> I agree with this simple change as the fix. >>>>> Note, -Xcomp does not switch off Interpreter (we can run without >>>>> Interpreter). We use !UseInterpreter as indication >>>>> if Xcomp was used. >>>>> I don't see a PIT link in the bug report. >>>> There was none, Tobias found this regression testing something else. >>>> >>>> Now I have added a regression test: >>>> hotspot/test/compiler/startup/TieredStopAtLevel0SanityTest.java >>>> >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.02/ >>> Please set the test copyright date to 2016. I would maybe also >>> change the test summary to what you wrote in line 30 >>> ("Sanity test flag combo..") because this has nothing to do with >>> support for blocking compiles. >>> >>> Otherwise looks good to me.
>>> >>> Best regards, >>> Tobias >>> >>>> Regards, >>>> Nils >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/14/16 6:17 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> Please review this fix. >>>>>> >>>>>> Summary: >>>>>> In JDK-8150646 I added an assert in compile_method that the >>>>>> compiler must not be NULL. Before, there was a return >>>>>> there that just ignored the compile. >>>>>> >>>>>> Running the VM with the flag combination -Xcomp and >>>>>> -XX:TieredStopAtLevel=0 creates a special situation: >>>>>> UseInterpreter is set to false (but the interpreter is still >>>>>> available) and then some >>>>>> essential methods are forced to be compiled, but the initial >>>>>> complevel becomes 0 and hits the assert in compileBroker. >>>>>> >>>>>> Solution: >>>>>> We could discuss if it should be allowed to submit compiles on >>>>>> level 0, a change that would become a bit larger. >>>>>> This time I chose to extend the _initialized check in >>>>>> compile_method. I didn't add any >>>>>> logging or warning because this is really a corner case.
>>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8154151 >>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8154151/webrev.01/ >>>>>> (Ignore the extra tags in the webrev) >>>>>> >>>>>> Best regards, >>>>>> Nils Eliasson >> From nils.eliasson at oracle.com Mon Apr 18 10:24:11 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 18 Apr 2016 12:24:11 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> Message-ID: <5714B5CB.70705@oracle.com> Hi, On 2016-04-15 22:43, Christian Thalinger wrote: > >> On Apr 15, 2016, at 3:06 AM, Nils Eliasson >> > wrote: >> >> Hi, >> >> On 2016-04-14 20:45, Christian Thalinger wrote: >>> >>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson >>>> wrote: >>>> >>>> I moved the reasons to CompileTask.hpp and put it together with the >>>> names list. Also changed the type from int to CompileReason as Igor >>>> suggested. >>>> >>>> It gets verbose in the method declarations in compileBroker >>> >>> Don't worry about this. >>> >>>> and sometimes I think CompileReason should be declared in >>>> CompileBroker because it is mostly used there. On the other hand, >>>> CompileTask is the keeper of the CompileReason so it makes sense too. >>> >>> Yes, that's the right place. >>> >>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>> >>> >>> + bool can_become_stale() const { >>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>> + } >>> I'm not a fan of implicit contracts just defined by comments. This >>> method doesn't seem to be performance critical so I would suggest to >>> use a switch-case. An attribute on the enum would be much better >>> but we all know this isn't Java.
>> >> As you suggested: >> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 > > Thanks. A space is missing and the closing } indent is wrong: > + bool can_become_stale() const { > + switch(_compile_reason) { > + case Reason_BackedgeCount: > + case Reason_InvocationCount: > + case Reason_Tiered: > + return !_is_blocking; > + } > + return false; > + } > Also, what about: > + Reason_None, > + Reason_CTW, // Compile the world > + Reason_Replay, // ciReplay > These were covered before. Reason_None - is only used for bounds checking together with Reason_Count. Reason_Replay - if these compilations can get stale we can get indeterminism in replay. Reason_CTW - CTW could silently drop compiles -> more indeterminism. Regards, Nils > >> >> Also made reasons CTW and Replay not stale-able. >> >> Thanks! >> Nils >> >>> >>>> >>>> Thanks! >>>> Nils >>>> >>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>> Very nice, I like it. >>>>> >>>>> One note. CompileReason (and its names) should be in the CompileTask >>>>> class where it is recorded. Then CompileTask::can_become_stale() >>>>> can be in the header file so it is inlined on all platforms. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>> Hi, >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>> >>>>>> >>>>>> Summary >>>>>> Introduced an enum CompileReason with members matching all the old >>>>>> variants, and a table containing all the unchanged strings. I see the >>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>> have chosen not to do so in this change. >>>>>> >>>>>> Only new logic is the CompileTask::can_become_stale() method.
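Christian's request was to replace the _compile_reason < Reason_Whitebox ordering trick with an explicit switch over the reasons that may go stale. A standalone sketch of that shape follows; TaskSketch is hypothetical, and the enum is only the subset of CompileReason values named in this thread, in an assumed order.

```cpp
#include <cassert>

// Partial, hypothetical subset of the CompileReason values named in this
// thread; the real enum in compileTask.hpp has more members and its own order.
enum CompileReason {
  Reason_None,
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,       // Compile the world
  Reason_Replay,    // ciReplay
  Reason_Whitebox,  // WhiteBox API request
  Reason_Count      // bounds marker, not a real reason
};

struct TaskSketch {
  CompileReason reason;
  bool is_blocking;

  // Only counter-driven, non-blocking compiles may be evicted as stale.
  // CTW, replay and WhiteBox requests must not be silently dropped.
  bool can_become_stale() const {
    switch (reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !is_blocking;
      default:
        return false;
    }
  }
};
```

Listing the cases explicitly means reordering the enum cannot silently change which tasks are evictable, and the default keeps any newly added reason non-stale until someone opts it in.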
>>>>>> >>>>>> Testing: >>>>>> Running Testset hotspot on all platforms and hotspot_all on one >>>>>> platform >>>>>> >>>>>> Regards, >>>>>> Nils Eliasson >>>>>> >>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>> Tasks get evicted from the compile_queue if their invocation >>>>>>>> counter >>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>> >>>>>>>> I'll do a proper fix, it is the right thing to do and should be >>>>>>>> pretty >>>>>>>> quick. I'll change the comment to an enum that represents who >>>>>>>> submitted >>>>>>>> the compile, and add a table for the comments. This could be >>>>>>>> useful in >>>>>>>> other settings too. >>>>>>> >>>>>>> Sounds good. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils >>>>>>>> >>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>> What do you mean "stale"? >>>>>>>>> I would prefer to see the real fix as you suggested to avoid >>>>>>>>> removing >>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>> >>>>>>>>>> Summary: >>>>>>>>>> A method enqueued for compilation with WB API may be >>>>>>>>>> removed from >>>>>>>>>> the compile queue as stale. >>>>>>>>>> >>>>>>>>>> Solution: >>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>> checks that may spare us from waiting until timeout for failing.) >>>>>>>>>> >>>>>>>>>> This is a workaround but we should consider fixing something >>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>> task with info about the origin of the compile.
The comment >>>>>>>>>> field has >>>>>>>>>> this information - but then it needs to be >>>>>>>>>> converted to an enum. >>>>>>>>>> >>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Nils Eliasson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >>> >> > From ENOMIKI at jp.ibm.com Mon Apr 18 10:36:39 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Mon, 18 Apr 2016 10:36:39 +0000 Subject: PPC64 VMX/VSX array copy stubs Message-ID: <201604181036.u3IAarDQ018409@d19av08.sagamino.japan.ibm.com> (HTML body scrubbed) Attachments: ArrayCopyTest1.java (3177 bytes), ppc64le_vmx.diff (7923 bytes), ppc64le_vsx.diff (7001 bytes), result.jpg (32237 bytes). From nils.eliasson at oracle.com Mon Apr 18 11:24:00 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 18 Apr 2016 13:24:00 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <57112880.1010204@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> Message-ID: <5714C3D0.2070804@oracle.com> Resizeable is better, but then we assert on expanding the stringbuffer while being under a different ResourceMark.
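The 8153527 fix follows a general pattern: do all formatting that may block (in the VM, a VM entry that can stop at a safepoint and break the tty lock) into a private buffer first, and hold the shared lock only for the final copy. A minimal sketch with std::mutex and a growable std::ostringstream; SharedLog, the function names and the record format are hypothetical, and HotSpot's stringStream additionally interacts with ResourceMarks, which this sketch does not model.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical shared log guarded by one lock, standing in for the tty lock.
struct SharedLog {
  std::mutex lock;
  std::string text;
};

// Stand-in for printing method metadata. In the VM this step may block,
// which is why it must not run while the shared lock is held.
std::string format_method_header(const std::string& name, int compile_id) {
  std::ostringstream buf;  // grows as needed, no fixed size
  buf << "<method id='" << compile_id << "' name='" << name << "'/>\n";
  return buf.str();
}

void print_assembly_header(SharedLog& log, const std::string& name, int compile_id) {
  // 1. Format everything into a private buffer without holding the lock.
  std::string header = format_method_header(name, compile_id);
  // 2. Take the lock only for the final copy, so no other thread's output
  //    can be interleaved into this record.
  std::lock_guard<std::mutex> guard(log.lock);
  log.text += header;
}
```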
Regards, Nils On 2016-04-15 19:44, Vladimir Kozlov wrote: > Use resizable stream: > > stringStream(size_t initial_bufsize = 256); > > 1024 may not be enough. > > Thanks, > Vladimir > > On 4/15/16 8:10 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this fix of print opto_assembly. >> >> Summary: >> The compilelog can get corrupted and the VM may assert on "failed: >> bad tag in log". >> >> When printing assembly in output.cpp we first take the ttylock, print >> the head and then the method metadata. However the >> metadata printing makes a vm entry and may block for a safepoint and >> will then release the lock >> (break_tty_lock_for_safepoint). After that some of the other compiler >> threads that haven't safepointed will take the lock >> and the broken log will be a fact when the safepoint is over and the >> first thread starts logging again. >> >> Solution: >> Print the method metadata to a temporary buffer, then take the tty lock. >> >> Testing: >> Repro from bug stops failing. >> Running :hotspot_all >> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >> >> Regards, >> Nils Eliasson From aph at redhat.com Mon Apr 18 11:45:10 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Apr 2016 12:45:10 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <1460967051.10749.31.camel@mint> References: <1460967051.10749.31.camel@mint> Message-ID: <5714C8C6.3030302@redhat.com> On 04/18/2016 09:10 AM, Edward Nevill wrote: > I have benchmarked this on 3 different partners' HW using the following JMH test case > > http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java This isn't a great test for block zeroing. Nevertheless, I have approved this patch, with a few alterations.
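The zero_words change reviewed in this thread zeroes a word range in three phases: word stores up to a cache-block boundary, whole-block zeroing with DC ZVA, then a word-store tail, with short ranges (below the BlockZeroingLowLimit threshold) going straight to word stores. A portable sketch of that structure, using memset as a stand-in for DC ZVA; the 64-byte block size and 16-word threshold are assumptions (on real hardware the block size comes from DCZID_EL0, and the threshold is derived from BlockZeroingLowLimit).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed parameters for illustration only.
constexpr size_t kBlockBytes = 64;     // DC ZVA block size (hardware-specific)
constexpr size_t kLowLimitWords = 16;  // below this, just store words

// Stand-in for one DC ZVA: zero one naturally aligned block.
static void zero_block(uint64_t* p) {
  std::memset(p, 0, kBlockBytes);
}

// Zero `count` 8-byte words starting at `base` (8-byte aligned).
void zero_words(uint64_t* base, size_t count) {
  const size_t block_words = kBlockBytes / sizeof(uint64_t);
  if (count >= kLowLimitWords) {
    // 1. Word stores until `base` reaches a block boundary.
    while (count > 0 && reinterpret_cast<uintptr_t>(base) % kBlockBytes != 0) {
      *base++ = 0;
      count--;
    }
    // 2. Whole blocks.
    while (count >= block_words) {
      zero_block(base);
      base += block_words;
      count -= block_words;
    }
  }
  // 3. Remaining words, or the whole range if it was below the threshold.
  while (count > 0) {
    *base++ = 0;
    count--;
  }
}
```

Starting at buf + 1 with 40 words, phase 1 clears 7 words, phase 2 clears four 8-word blocks and phase 3 clears the last word; the neighbouring words are untouched.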
A note about using jmh, not just for you but for everyone working on
this project. Don't do something like

  for (int i = 0; i < 10000; i++)
    theTest();

as an attempt to pad out the execution time. jmh is much better at
this sort of thing than you are. Just put the code you're trying to
test in the test case.

Andrew.

From aph at redhat.com Mon Apr 18 12:55:12 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Apr 2016 13:55:12 +0100
Subject: aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To:
References:
Message-ID: <5714D930.4090804@redhat.com>

One other thing. This is rather a lot of code to emit every time an
array is created:

  ;; zero_words {
  0x0000007fa880f5f0: cmp   x11, #0x20
  0x0000007fa880f5f4: b.lt  0x0000007fa880f62c
  0x0000007fa880f5f8: neg   x8, x10
  0x0000007fa880f5fc: and   x8, x8, #0x7f
  0x0000007fa880f600: cbz   x8, 0x0000007fa880f614
  0x0000007fa880f604: sub   x11, x11, x8, asr #3
  0x0000007fa880f608: sub   x8, x8, #0x8
  0x0000007fa880f60c: str   xzr, [x10],#8
  0x0000007fa880f610: cbnz  x8, 0x0000007fa880f608
  0x0000007fa880f614: sub   x11, x11, #0x10
  0x0000007fa880f618: dc    zva, x10
  0x0000007fa880f61c: subs  x11, x11, #0x10
  0x0000007fa880f620: add   x10, x10, #0x80
  0x0000007fa880f624: b.ge  0x0000007fa880f618
  0x0000007fa880f628: add   x11, x11, #0x10
  0x0000007fa880f62c: and   x8, x11, #0x7

I don't think this CBZ does anything useful:

  0x0000007fa880f630: cbz   x8, 0x0000007fa880f670

(I'm assuming that the 0-7 cases are uniformly distributed.)
  0x0000007fa880f634: sub   x11, x11, x8
  0x0000007fa880f638: add   x10, x10, x8, lsl #3
  0x0000007fa880f63c: adr   x9, 0x0000007fa880f670
  0x0000007fa880f640: sub   x9, x9, x8, lsl #2
  0x0000007fa880f644: br    x9
  0x0000007fa880f648: add   x10, x10, #0x40
  0x0000007fa880f64c: sub   x11, x11, #0x8
  0x0000007fa880f650: stur  xzr, [x10,#-64]
  0x0000007fa880f654: stur  xzr, [x10,#-56]
  0x0000007fa880f658: stur  xzr, [x10,#-48]
  0x0000007fa880f65c: stur  xzr, [x10,#-40]
  0x0000007fa880f660: stur  xzr, [x10,#-32]
  0x0000007fa880f664: stur  xzr, [x10,#-24]
  0x0000007fa880f668: stur  xzr, [x10,#-16]
  0x0000007fa880f66c: stur  xzr, [x10,#-8]
  0x0000007fa880f670: cbnz  x11, 0x0000007fa880f648
  ;; } zero_words

We could think about moving the large block case into a stub which is
emitted after the main body of the method, or even into a shared
stub. A shared stub would require the args to be in fixed registers,
though.

Andrew.

From ENOMIKI at jp.ibm.com Sun Apr 17 18:28:01 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Sun, 17 Apr 2016 18:28:01 +0000
Subject: PPC64 VMX/VSX array copy stubs
Message-ID: <201604171828.u3HIS9u0012295@d19av06.sagamino.japan.ibm.com>

Dear all,

Could you please review the following change? I created two patches
for generate_disjoint_long_copy, one using VMX (Vector Multimedia
Extension) and one using VSX (Vector-Scalar Extension).

Let me share our performance results. I varied the array copy size for
both the aligned case (src and dst alignments match) and the unaligned
case, measuring the following four patterns at a time. Long arrays are
8-byte aligned, so these patterns cover both the aligned and the
unaligned case.

  System.arraycopy(src, 0, dst, 0, size);
  System.arraycopy(src, 0, dst, 1, size);
  System.arraycopy(src, 1, dst, 0, size);
  System.arraycopy(src, 1, dst, 1, size);

VMX(max) and VSX(max) are the aligned scores, while VMX(min) and
VSX(min) are the unaligned scores. Scalar is the original OpenJDK.
VSX performed better when the array size is less than about 2048
bytes, but VSX(min) performed worse than VMX for large arrays. This is
probably due to the alignment overhead in VSX.

Server: 8247-22L (POWER8 (3.3GHz 12 cores) x2, 512GB memory),
Ubuntu Linux 15.04 ppc64LE (kernel: 3.19.0-18-generic),
OpenJDK (build based on 1.9),
JVMARGS: "-Xmx40g -Xms40g -Xmn20g"

Here are the benchmark code and patch files. The VMX version is
currently implemented for ppc LE only. (The patches were generated
with "hg diff -g" under the latest hotspot directory.)

Related links:
"8154156: PPC64: improve array copy stubs by using vector instructions"
https://bugs.openjdk.java.net/browse/JDK-8154156
"PPC64 VSX load/store instructions in stubs"
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-April/002419.html

Regards,
Miki

Miki ENOKI, Ph.D.
IBM Research - Tokyo
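The aligned and unaligned patterns differ in one respect: whether the source and destination sit at the same offset within a 16-byte vector. Below is a minimal C++ sketch of a disjoint jlong copy that takes a vector path only when those offsets match. It assumes 16-byte vectors and emulates aligned vector loads and stores with std::memcpy. `alignments_match` and `copy_longs` are hypothetical names, not the stub code in the VMX/VSX patches. Whether `src + 1` versus `dst + 0` actually mismatch in a running JVM also depends on where each array's payload starts, which is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kVecBytes = 16;  // assumed vector width (VMX/VSX registers are 128-bit)

// True when src and dst can reach 16-byte alignment together, i.e. the
// "aligned" (max) case in the benchmark.
bool alignments_match(const int64_t* src, const int64_t* dst) {
  return reinterpret_cast<uintptr_t>(src) % kVecBytes ==
         reinterpret_cast<uintptr_t>(dst) % kVecBytes;
}

// Disjoint forward copy of n longs.
void copy_longs(const int64_t* src, int64_t* dst, size_t n) {
  assert(src != nullptr && dst != nullptr);
  if (alignments_match(src, dst)) {
    // Peel one scalar at most, until both pointers are vector aligned...
    while (n > 0 && reinterpret_cast<uintptr_t>(dst) % kVecBytes != 0) {
      *dst++ = *src++;
      --n;
    }
    // ...then move whole 16-byte vectors (memcpy stands in for a vector load/store).
    while (n >= 2) {
      std::memcpy(dst, src, kVecBytes);
      dst += 2;
      src += 2;
      n -= 2;
    }
  }
  // Mismatched alignment, or the tail: element by element. A real stub
  // would use unaligned vector accesses or permutes here instead.
  while (n > 0) {
    *dst++ = *src++;
    --n;
  }
}
```

The extra work on the mismatched path, where no cheap aligned vector loop is available, is one plausible source of the gap between the (max) and (min) curves.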
From ENOMIKI at jp.ibm.com Mon Apr 18 07:47:45 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Mon, 18 Apr 2016 07:47:45 +0000
Subject: PPC64 VMX/VSX array copy stubs
Message-ID: <201604180747.u3I7lwYA004525@d19av08.sagamino.japan.ibm.com>

An HTML attachment was scrubbed...

From ENOMIKI at jp.ibm.com Mon Apr 18 10:19:27 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Mon, 18 Apr 2016 10:19:27 +0000
Subject: PPC64 VMX/VSX array copy stubs
Message-ID: <201604181019.u3IAJh3E008402@d19av08.sagamino.japan.ibm.com>

An HTML attachment was scrubbed...

From vladimir.kozlov at oracle.com Mon Apr 18 17:30:38 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 18 Apr 2016 10:30:38 -0700
Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log
In-Reply-To: <5714C3D0.2070804@oracle.com>
References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com>
Message-ID: <571519BE.605@oracle.com>

tty would have the same problem, but it uses C_HEAP to allocate:

defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) defaultStream();

Please look at whether you can do something similar.

Thanks,
Vladimir

On 4/18/16 4:24 AM, Nils Eliasson wrote:
> Resizable is better, but then we assert on expanding the stringbuffer
> while being under a different ResourceMark.
>
> Regards,
> Nils
>
> On 2016-04-15 19:44, Vladimir Kozlov wrote:
>> Use resizable stream:
>>
>> stringStream(size_t initial_bufsize = 256);
>>
>> 1024 may not be enough.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/15/16 8:10 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Please review this fix of print opto_assembly.
>>>
>>> Summary:
>>> The compilelog can get corrupted and the VM may assert on "failed:
>>> bad tag in log".
>>>
>>> When printing assembly in output.cpp we first take the ttylock, print
>>> the head and then the method metadata. However the
>>> metadata printing makes a vm entry and may block for a safepoint and
>>> will then release the lock
>>> (break_tty_lock_for_safepoint). After that some of the other compiler
>>> thread that haven't safepointed will take the lock
>>> and the broken log will be a fact when the safepoint is over and the
>>> first thread starts logging again.
>>>
>>> Solution:
>>> Print the method metadata to a temporary buffer, then take the tty lock.
>>>
>>> Testing:
>>> Repro from bug stops failing.
>>> Running :hotspot_all
>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854)
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527
>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/
>>>
>>> Regards,
>>> Nils Eliasson
>

From vivek.r.deshpande at intel.com Mon Apr 18 17:38:14 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Mon, 18 Apr 2016 17:38:14 +0000
Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>

Hi all

I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
This uses -XX:DisableIntrinsic option to achieve the same.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

Thanks and regards,
Vivek

From christian.thalinger at oracle.com Mon Apr 18 18:25:52 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 08:25:52 -1000
Subject: CR for RFR 8153998
In-Reply-To:
References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com>
Message-ID:

Looks good.
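Vivek's 8154473 proposal generates a stub only when its intrinsic has not been disabled, so callers fall back to the shared runtime otherwise. The idea fits in a few lines. This is a hypothetical C++ model, not HotSpot's `vmIntrinsics::is_intrinsic_available` or `StubRoutines`. It treats the `-XX:DisableIntrinsic` value as a comma-separated list of intrinsic ids such as `_dpow`, and `is_intrinsic_disabled`, `StubTable` and `generate_stubs` are illustrative names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if id appears in a comma-separated disable list such as
// "_dpow,_dtan" (hypothetical model of the flag's value).
bool is_intrinsic_disabled(const std::string& disable_list, const std::string& id) {
  assert(!id.empty());
  std::istringstream in(disable_list);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item == id) return true;
  }
  return false;
}

// nullptr means "no stub generated": callers use the runtime fallback.
struct StubTable {
  const char* dpow = nullptr;
  const char* dtan = nullptr;
};

// Each stub sits behind its own check, so disabling one intrinsic cannot
// accidentally suppress another.
StubTable generate_stubs(const std::string& disable_list) {
  StubTable stubs;
  if (!is_intrinsic_disabled(disable_list, "_dpow")) stubs.dpow = "libmPow";
  if (!is_intrinsic_disabled(disable_list, "_dtan")) stubs.dtan = "libmTan";
  return stubs;
}
```

Giving each stub its own guard is what Christian's review question checks for: the tan stub needs its own `_dtan` condition rather than disappearing inside the `_dpow` block.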
> On Apr 15, 2016, at 6:20 PM, Berg, Michael C wrote:
>
> Vladimir/Christian:
>
> I believe I have addressed all concerns in this update:
>
> Webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/
>
> Regards,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
> Sent: Friday, April 15, 2016 2:04 AM
> To: 'Vladimir Kozlov'
> Cc: 'hotspot-compiler-dev at openjdk.java.net'
> Subject: RE: CR for RFR 8153998
>
> Vladimir, the code has been updated and is available at:
>
> webrev:
> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: Berg, Michael C
> Sent: Thursday, April 14, 2016 5:54 PM
> To: Vladimir Kozlov
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: CR for RFR 8153998
>
> Ok, now we understand one another. I have added a size calc for the new pattern match which will accurately map the emit size for the 32-bit and 64-bit .ad files for CountedLoopEnd.
> It will be clean when next you see the code.
>
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 14, 2016 5:52 PM
> To: Berg, Michael C
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: CR for RFR 8153998
>
> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>> The restore mark is sizeless, it restores to the known global configuration value for k1 which is used automatically in all of code gen.
>
> How is it sizeless when it generates kmovwl() instruction?
> Do you mean it does not have side effects (no flags modified)?
>
> Vladimir
>
>>
>> Ok, I will try the pattern match method.
>>
>> Thanks
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:02 PM
>> To: Berg, Michael C; Christian Thalinger
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>> Vladimir,
>>>
>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization.
>>>
>>> I tried something like that early on with CountedLoopEnd.
>>
>> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically).
>> I don't see any side effects for restoremask in your code. What are you talking about?
>>
>> I am suggesting something like next:
>>
>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>   predicate(n->has_vect_mask_set());
>>   match(CountedLoopEnd cop cr);
>>   effect(USE labl);
>>
>>   ins_cost(400);
>>   format %{ "j$cop $labl\t# loop end\n\t"
>>             "restoremask \t# vector mask restore for loops"
>>   %}
>>   ins_encode %{
>>     Label* L = $labl$$label;
>>     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>     __ restoremask();
>>   %}
>>   ins_pipe(pipe_jcc);
>> %}
>>
>> Vladimir
>>
>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would you like to do then, process via flag or how I do it now? We would basically be doing it in the same place.
>>>
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 4:27 PM
>>> To: Christian Thalinger; Berg, Michael C
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>>
>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>>
>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C wrote:
>>>>>
>>>>> Christian,
>>>>> There is, but I would have to anchor it via a gpr register; instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>>
>>>> That's unfortunate but I understand. I'm fine with it then.
>>>
>>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there.
>>>
>>> Vladimir
>>>
>>>>
>>>>> Regards,
>>>>> Michael
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:*Thursday, April 14, 2016 11:20 AM
>>>>> *To:*Berg, Michael C
>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>
>>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C wrote:
>>>>> See below for context.
>>>>> Regards,
>>>>> Michael
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>>> *To:*Berg, Michael C
>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>
>>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>>
>>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops.
>>>>> See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x
>>>>> performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>> This code was tested as follows (see jbs entry below):
>>>>>
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>>
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>>
>>>>> +//------------------------------MachMskNode-----------------------------------
>>>>> +// Machine function Msk Node
>>>>> +class MachMskNode : public MachIdealNode {
>>>>>
>>>>> Does "Msk" mean mask? Then we should call it MachMaskNode.
>>>>> Ok, that's easy enough.
>>>>> Also, I don't quite understand why we have:
>>>>>
>>>>> +instruct set_mask(rRegI dst, rRegI src) %{
>>>>> + predicate(VM_Version::supports_avx512vl());
>>>>> + match(Set dst (MaskCreateI src));
>>>>> + effect(TEMP dst);
>>>>> + format %{ "createmsk $dst, $src" %}
>>>>> + ins_encode %{
>>>>> + __ createmsk($dst$$Register, $src$$Register);
>>>>> + %}
>>>>>
>>>>> but:
>>>>>
>>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>> + MacroAssembler _masm(&cbuf);
>>>>> + __ restoremsk();
>>>>> + }
>>>>>
>>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
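In Michael's design, the remaining trip count is turned into a lane mask in k1, and a single predicated vector iteration replaces the scalar fix-up loop. That mechanism can be emulated in scalar C++. This sketch assumes 16 int lanes (one 512-bit vector) and one mask bit per lane. `make_lane_mask`, `masked_add` and `add_arrays` are hypothetical helpers that only mirror the idea behind `createmsk`, not its encoding or the C2 node structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kLanes = 16;  // assumed: 16 x 32-bit lanes per vector
static_assert(kLanes < 32, "mask must fit in uint32_t without overflow");

// One bit per live lane: remaining = 5 gives 0b11111.
uint32_t make_lane_mask(size_t remaining) {
  assert(remaining <= kLanes);
  return (1u << remaining) - 1u;
}

// A single predicated "vector" iteration: lanes whose mask bit is clear are
// neither read nor written, so no scalar post loop is needed.
void masked_add(int32_t* a, const int32_t* b, const int32_t* c, uint32_t mask) {
  for (size_t lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) a[lane] = b[lane] + c[lane];
  }
}

// Whole loop: full vectors in the main loop, then one masked iteration.
void add_arrays(int32_t* a, const int32_t* b, const int32_t* c, size_t n) {
  size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    masked_add(a + i, b + i, c + i, make_lane_mask(kLanes));
  }
  if (i < n) masked_add(a + i, b + i, c + i, make_lane_mask(n - i));
}
```

The masked tail handles any leftover count from 1 to 15 in one step. This is where the claimed speedup over a scalar post loop that runs up to 15 iterations would come from.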
>>>>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>>
>>>>> Hmm. So, there is no way we can have a RestoreMaskINode?
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>

From vladimir.kozlov at oracle.com Mon Apr 18 18:32:53 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 18 Apr 2016 11:32:53 -0700
Subject: CR for RFR 8153998
In-Reply-To:
References: <90DCD317-C0C4-487A-AB63-5461986FC7AD@oracle.com> <57102743.8080508@oracle.com> <57102F7D.2090303@oracle.com> <57103B20.1040207@oracle.com>
Message-ID: <57152855.2090106@oracle.com>

Testing looks good. I will push it after a few other pushes currently in the queue.

Thanks,
Vladimir

On 4/18/16 11:25 AM, Christian Thalinger wrote:
> Looks good.
>
>> On Apr 15, 2016, at 6:20 PM, Berg, Michael C wrote:
>>
>> Vladimir/Christian:
>>
>> I believe I have addressed all concerns in this update:
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.04/
>>
>> Regards,
>> Michael
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
>> Sent: Friday, April 15, 2016 2:04 AM
>> To: 'Vladimir Kozlov'
>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: RE: CR for RFR 8153998
>>
>> Vladimir, the code has been updated and is available at:
>>
>> webrev:
>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.03a/
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: Berg, Michael C
>> Sent: Thursday, April 14, 2016 5:54 PM
>> To: Vladimir Kozlov
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: RE: CR for RFR 8153998
>>
>> Ok, now we understand one another. I have added a size calc for the new pattern match which will accurately map the emit size for the 32-bit and 64-bit .ad files for CountedLoopEnd.
>> It will be clean when next you see the code.
>>
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, April 14, 2016 5:52 PM
>> To: Berg, Michael C
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: CR for RFR 8153998
>>
>> On 4/14/16 5:12 PM, Berg, Michael C wrote:
>>> The restore mark is sizeless, it restores to the known global configuration value for k1 which is used automatically in all of code gen.
>>
>> How is it sizeless when it generates kmovwl() instruction?
>> Do you mean it does not have side effects (no flags modified)?
>>
>> Vladimir
>>
>>>
>>> Ok, I will try the pattern match method.
>>>
>>> Thanks
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, April 14, 2016 5:02 PM
>>> To: Berg, Michael C; Christian Thalinger
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: CR for RFR 8153998
>>>
>>> On 4/14/16 4:38 PM, Berg, Michael C wrote:
>>>> Vladimir,
>>>>
>>>> Multiversion copies the post rce'd loop after we check, and we check again before we mark the post loop for mask vector optimization after superword succeeds on the main loop, so we are all good. The mask version of the post loop is always clean when we apply the optimization.
>>>>
>>>> I tried something like that early on with CountedLoopEnd.
>>>
>>> In the mach version with restore mark you should specify correct size(X) or don't specify at all to calculate it dynamically (done automatically).
>>> I don't see any side effects for restoremask in your code. What are you talking about?
>>>
>>> I am suggesting something like next:
>>>
>>> instruct jmpLoopEnd_and_restoreMask(cmpOp cop, rFlagsReg cr, label labl) %{
>>>   predicate(n->has_vect_mask_set());
>>>   match(CountedLoopEnd cop cr);
>>>   effect(USE labl);
>>>
>>>   ins_cost(400);
>>>   format %{ "j$cop $labl\t# loop end\n\t"
>>>             "restoremask \t# vector mask restore for loops"
>>>   %}
>>>   ins_encode %{
>>>     Label* L = $labl$$label;
>>>     __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); // Always long jump
>>>     __ restoremask();
>>>   %}
>>>   ins_pipe(pipe_jcc);
>>> %}
>>>
>>> Vladimir
>>>
>>>> The problem is that the resultant jump has an expected size counter for jcc, so you could not do it during emit. You would still have to add the side effect much like what I did. I would be adding a flag to node when we don't need one. What would you like to do then, process via flag or how I do it now? We would basically be doing it in the same place.
>>>>
>>>> -Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Thursday, April 14, 2016 4:27 PM
>>>> To: Christian Thalinger; Berg, Michael C
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: CR for RFR 8153998
>>>>
>>>> On 4/14/16 3:35 PM, Christian Thalinger wrote:
>>>>>
>>>>>> On Apr 14, 2016, at 8:44 AM, Berg, Michael C wrote:
>>>>>>
>>>>>> Christian,
>>>>>> There is, but I would have to anchor it via a gpr register; instead I am treating it like nop emits (yet another side effect), with some guidance for placement.
>>>>>
>>>>> That's unfortunate but I understand. I'm fine with it then.
>>>>
>>>> You can try to generate restore mask on loop exit in mach instruction for CountedLoopEnd node with predicate(n->has_vect_mask_set()). has_vect_mask_set flag should be set in CountedLoopEnd node after creation of CreateMaskI node. Note, CreateMaskI will be only generated for clean counted post loop with one vector iteration - there should not be any other branches there.
>>>>
>>>> Vladimir
>>>>
>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>> *Sent:*Thursday, April 14, 2016 11:20 AM
>>>>>> *To:*Berg, Michael C
>>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>>
>>>>>> On Apr 13, 2016, at 11:35 AM, Berg, Michael C wrote:
>>>>>> See below for context.
>>>>>> Regards,
>>>>>> Michael
>>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>> *Sent:*Wednesday, April 13, 2016 2:08 PM
>>>>>> *To:*Berg, Michael C
>>>>>> *Cc:*hotspot-compiler-dev at openjdk.java.net
>>>>>> *Subject:*Re: CR for RFR 8153998
>>>>>>
>>>>>> On Apr 12, 2016, at 8:26 PM, Berg, Michael C wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> I would like to contribute Programmable SIMD as implemented on multi-versioned post loops. See: https://bugs.openjdk.java.net/browse/JDK-8151573 for the first half of the implementation.
>>>>>> This component delivers mask programmed post loops which execute in a single iteration in place of fixup scalar loops which used to take many iterations to complete work for user loops.
>>>>>> Currently I have enabled this optimization for x86 only, specifically for machines with masked data predication implemented as per fully enabled EVEX targets. It delivers up to 2x
>>>>>> performance and has been modeled over a large number of loop lengths and forms of loops.
>>>>>> This code was tested as follows (see jbs entry below):
>>>>>>
>>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8153998
>>>>>>
>>>>>> webrev:
>>>>>> http://cr.openjdk.java.net/~mcberg/8153998/webrev.01a/
>>>>>>
>>>>>> +//------------------------------MachMskNode-----------------------------------
>>>>>> +// Machine function Msk Node
>>>>>> +class MachMskNode : public MachIdealNode {
>>>>>>
>>>>>> Does "Msk" mean mask? Then we should call it MachMaskNode.
>>>>>> Ok, that's easy enough.
>>>>>> Also, I don't quite understand why we have:
>>>>>>
>>>>>> +instruct set_mask(rRegI dst, rRegI src) %{
>>>>>> + predicate(VM_Version::supports_avx512vl());
>>>>>> + match(Set dst (MaskCreateI src));
>>>>>> + effect(TEMP dst);
>>>>>> + format %{ "createmsk $dst, $src" %}
>>>>>> + ins_encode %{
>>>>>> + __ createmsk($dst$$Register, $src$$Register);
>>>>>> + %}
>>>>>>
>>>>>> but:
>>>>>>
>>>>>> + void MachMskNode::emit(CodeBuffer &cbuf, PhaseRegAlloc*) const {
>>>>>> + MacroAssembler _masm(&cbuf);
>>>>>> + __ restoremsk();
>>>>>> + }
>>>>>>
>>>>>> The reason: Currently k registers or mask registers are not allocated, meaning we have to treat their usage as side effects.
>>>>>> The case with set_mask is to take our remaining iterations as an index and provide a mask used in k1 that is applicable to its post loop.
>>>>>> The subsequent restore, preplaces the default value back into k1. The set_mask rule is posed in such a way that it ensures that the side effect value will survive optimization.
>>>>>> The restore is fully a side effect with no produced definition in rule space as mask registers are not formal definitions.
>>>>>>
>>>>>> Hmm. So, there is no way we can have a RestoreMaskINode?
>>>>>>
>>>>>> Thanks,
>>>>>> Michael
>>>>>
>

From christian.thalinger at oracle.com Mon Apr 18 18:34:40 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 08:34:40 -1000
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com>
Message-ID: <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>

> On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R wrote:
>
> Hi all
>
> I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
> This uses -XX:DisableIntrinsic option to achieve the same.
> Could you please review and sponsor this patch.
>
> Bug-id:
> https://bugs.openjdk.java.net/browse/JDK-8154473
> webrev:
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

src/cpu/x86/vm/stubGenerator_x86_64.cpp

+ if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
    StubRoutines::_dpow = generate_libmPow();
- StubRoutines::_dtan = generate_libmTan();
+ }

Was removing libmTan on purpose?

>
> Thanks and regards,
> Vivek

From vivek.r.deshpande at intel.com Mon Apr 18 18:38:52 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Mon, 18 Apr 2016 18:38:52 +0000
Subject: RE: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
In-Reply-To: <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>

Hi Christian

I have added this. Just moved generate_libmTan() after sin and cos generation.
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
  StubRoutines::_dtan = generate_libmTan();
}

Regards,
Vivek

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 18, 2016 11:35 AM
To: Deshpande, Vivek R
Cc: hotspot compiler; Vladimir Kozlov; Viswanathan, Sandhya
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics

On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R wrote:

Hi all

I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
This uses -XX:DisableIntrinsic option to achieve the same.
Could you please review and sponsor this patch.

Bug-id:
https://bugs.openjdk.java.net/browse/JDK-8154473
webrev:
http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/

src/cpu/x86/vm/stubGenerator_x86_64.cpp

+ if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) {
    StubRoutines::_dpow = generate_libmPow();
- StubRoutines::_dtan = generate_libmTan();
+ }
Was removing libmTan on purpose?

Thanks and regards,
Vivek

From christian.thalinger at oracle.com Mon Apr 18 19:15:06 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 18 Apr 2016 09:15:06 -1000
Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com>
Message-ID:

> On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R wrote:
>
> Hi Christian
>
> I have added this. Just moved generate_libmTan() after sin and cos generation.
> if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) {
>   StubRoutines::_dtan = generate_libmTan();
> }
>

Sorry, I missed this. I should have used the browser's search instead of eyeballing it.

src/cpu/x86/vm/macroAssembler_x86.cpp

fp_runtime_fallback is unused now:

cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/
hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp
5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) {
5828: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use);
5833: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use);
5838: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use);

hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp
995: void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use);

src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp

- __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)));
+ mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp));

I understand what it's doing but we are calling the same methods as before. What has changed?

> Regards,
> Vivek
>
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
> Sent: Monday, April 18, 2016 11:35 AM
> To: Deshpande, Vivek R
> Cc: hotspot compiler; Vladimir Kozlov; Viswanathan, Sandhya
> Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics
>
>
> On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R wrote:
>
> Hi all
>
> I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation.
> This uses -XX:DisableIntrinsic option to achieve the same.
> Could you please review and sponsor this patch.
> > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > src/cpu/x86/vm/stubGenerator_x86_64.cpp > > + if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) { > StubRoutines::_dpow = generate_libmPow(); > - StubRoutines::_dtan = generate_libmTan(); > + } > Was removing libmTan on purpose? > > > > Thanks and regards, > Vivek From vivek.r.deshpande at intel.com Mon Apr 18 20:28:08 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Mon, 18 Apr 2016 20:28:08 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com> Hi Christian, Just calling the SharedRuntime function clobbers the address in memory that the jump (shown below) uses after the routine finishes, and we also need to make sure the stack pointer is 16-byte aligned. So I call mathfunc() to take care of that, instead of fp_runtime_fallback(), which has the extra overhead of saving/restoring all the general-purpose and XMM registers. 444 __ pop(rax); 445 __ mov(rsp, r13); 446 __ jmp(rax); Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, April 18, 2016 12:15 PM To: Deshpande, Vivek R Cc: Vladimir Kozlov; hotspot compiler Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R > wrote: Hi Christian I have added this. Just moved generate_libmTan() after sin and cos generation.
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) { StubRoutines::_dtan = generate_libmTan(); } Sorry, I missed this. I should have used the browser's search instead of eyeballing it. src/cpu/x86/vm/macroAssembler_x86.cpp fp_runtime_fallback is unused now: cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/ hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp 5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) { 5828: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use); 5833: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use); 5838: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use); hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp 995: void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use); src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp - __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); + mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)); I understand what it's doing but we are calling the same methods as before. What has changed? Regards, Vivek From: Christian Thalinger [mailto:christian.thalinger at oracle.com] Sent: Monday, April 18, 2016 11:35 AM To: Deshpande, Vivek R > Cc: hotspot compiler >; Vladimir Kozlov >; Viswanathan, Sandhya > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R > wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch.
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ src/cpu/x86/vm/stubGenerator_x86_64.cpp + if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) { StubRoutines::_dpow = generate_libmPow(); - StubRoutines::_dtan = generate_libmTan(); + } Was removing libmTan on purpose? Thanks and regards, Vivek From jan.civlin at intel.com Mon Apr 18 21:41:17 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Mon, 18 Apr 2016 21:41:17 +0000 Subject: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Message-ID: <39F83597C33E5F408096702907E6C4500F16BF0A@ORSMSX104.amr.corp.intel.com> We would like to contribute the SHA256 AVX2 intrinsic. This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is L64 code only. The patch delivers a 2x performance gain in latency and throughput, each measured on an average message. Contributor: Jan Civlin. bug: https://bugs.openjdk.java.net/browse/JDK-8154495 webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ From jan.civlin at intel.com Mon Apr 18 21:44:26 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Mon, 18 Apr 2016 21:44:26 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Message-ID: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> == Correction in the subject line === We would like to contribute the SHA256 AVX2 intrinsic. This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is L64 code only. The patch delivers a 2x performance gain in latency and throughput, each measured on an average message. Contributor: Jan Civlin.
bug: https://bugs.openjdk.java.net/browse/JDK-8154495 webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ From vladimir.kozlov at oracle.com Tue Apr 19 00:09:10 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Apr 2016 17:09:10 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> Message-ID: <57157726.4030701@oracle.com> Hi Jan, The patch was generated on Windows and has ^M at the end of lines, so I can't apply it to our sources. I don't see any usage of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions. Please move the new code in macroAssembler_x86_sha.cpp to the end of the file. _k256_W[] is the same as _k256[] with groups of 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256: StubRoutines::x86::_k256_W_adr = generate_k256_W(); What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough. Thanks, Vladimir On 4/18/16 2:44 PM, Civlin, Jan wrote: > == Correction in the subject line === > > We would like to contribute the SHA256 AVX2 intrinsic. > > This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is L64 code only. > > The patch delivers a 2x performance gain in latency and throughput, each measured on an average message. > > Contributor: Jan Civlin.
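Vladimir's suggestion to build `_k256_W[]` from `_k256[]` at stub-generation time, instead of keeping a second hand-written table, could look roughly like the sketch below. It assumes the layout implied by his comment: the 64 SHA-256 round constants regrouped so that every group of four appears twice in a row, presumably so that one 256-bit load gives both 128-bit AVX2 lanes the same four constants. `make_k256_w` is a hypothetical name; a real `generate_k256_W()` would emit the table into the code buffer rather than return a `std::array`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: expand 64 round constants into a 128-entry table in
// which each group of 4 constants is stored twice, back to back.
std::array<uint32_t, 128> make_k256_w(const std::array<uint32_t, 64>& k256) {
  std::array<uint32_t, 128> w{};
  for (size_t group = 0; group < 16; group++) {
    for (size_t i = 0; i < 4; i++) {
      uint32_t k = k256[group * 4 + i];
      w[group * 8 + i] = k;      // first copy (low 128-bit lane)
      w[group * 8 + 4 + i] = k;  // repeated copy (high 128-bit lane)
    }
  }
  return w;
}
```

Deriving the wide table from the canonical one keeps a single source of truth for the constants, which appears to be the point of the review comment.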
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8154495 > webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ > From christian.thalinger at oracle.com Tue Apr 19 04:33:13 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 18 Apr 2016 18:33:13 -1000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <6517B0F3-8CDE-4FB8-B318-BE50898BA231@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A802A2@ORSMSX106.amr.corp.intel.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A80394@ORSMSX106.amr.corp.intel.com> Message-ID: > On Apr 18, 2016, at 10:28 AM, Deshpande, Vivek R wrote: > > Hi Christian > > Just calling the SharedRuntime function clobbers the address in memory that the jump (shown below) uses after the routine finishes, and we also need to make sure the stack pointer is 16-byte aligned. So I call mathfunc() to take care of that, instead of fp_runtime_fallback(), which has the extra overhead of saving/restoring all the general-purpose and XMM registers. > > 444 __ pop(rax); > 445 __ mov(rsp, r13); > 446 __ jmp(rax); > Ok, that makes sense. > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com] > Sent: Monday, April 18, 2016 12:15 PM > To: Deshpande, Vivek R > Cc: Vladimir Kozlov; hotspot compiler > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > > On Apr 18, 2016, at 8:38 AM, Deshpande, Vivek R > wrote: > > Hi Christian > > I have added this. Just moved generate_libmTan() after sin and cos generation. > if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dtan)) { > StubRoutines::_dtan = generate_libmTan(); > } > > > Sorry, I missed this. I should have used the browser's search instead of eyeballing it.
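Vivek's point about 16-byte alignment refers to the x86-64 calling convention, which expects `rsp` to be a multiple of 16 immediately before a `call` into native code. The round-down step itself is just a mask. The sketch below shows only that arithmetic and is not the `mathfunc()` implementation; `AlignedFrame` and `prepare_native_call` are hypothetical names. Because rounding down can move the stack pointer, the original value has to be kept and restored after the call, which is part of what a helper like `mathfunc()` has to handle.

```cpp
#include <cassert>
#include <cstdint>

// Round a stack pointer down to the nearest 16-byte boundary
// (the effect of an "and rsp, -16" style instruction).
static uintptr_t align_down_16(uintptr_t sp) {
  return sp & ~static_cast<uintptr_t>(15);
}

// Hypothetical illustration: remember the incoming sp so it can be restored
// after the native call, and compute the aligned sp to call with.
struct AlignedFrame {
  uintptr_t saved_sp;
  uintptr_t call_sp;
};

static AlignedFrame prepare_native_call(uintptr_t sp) {
  return AlignedFrame{sp, align_down_16(sp)};
}
```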
> src/cpu/x86/vm/macroAssembler_x86.cpp > > fp_runtime_fallback is unused now: > > cthaling at macbook:~/ws/jdk9/hs-comp$ ack fp_runtime_fallback hotspot/src/ > hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp > 5625:void MacroAssembler::fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use) { > 5828: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 1, num_fpu_regs_in_use); > 5833: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 1, num_fpu_regs_in_use); > 5838: fp_runtime_fallback(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 1, num_fpu_regs_in_use); > > hotspot/src/cpu/x86/vm/macroAssembler_x86.hpp > 995: void fp_runtime_fallback(address runtime_entry, int nb_args, int num_fpu_regs_in_use); > > > src/cpu/x86/vm/templateInterpreterGenerator_x86_64.cpp > > - __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, SharedRuntime::dexp))); > + mathfunc(CAST_FROM_FN_PTR(address, SharedRuntime::dexp)); > I understand what it's doing but we are calling the same methods as before. What has changed? > > > Regards, > Vivek > > From: Christian Thalinger [mailto:christian.thalinger at oracle.com ] > Sent: Monday, April 18, 2016 11:35 AM > To: Deshpande, Vivek R > > Cc: hotspot compiler >; Vladimir Kozlov >; Viswanathan, Sandhya > > Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics > > > On Apr 18, 2016, at 7:38 AM, Deshpande, Vivek R > wrote: > > Hi all > > I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. > This uses -XX:DisableIntrinsic option to achieve the same. > Could you please review and sponsor this patch.
> > Bug-id: > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > src/cpu/x86/vm/stubGenerator_x86_64.cpp > > + if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dpow)) { > StubRoutines::_dpow = generate_libmPow(); > - StubRoutines::_dtan = generate_libmTan(); > + } > Was removing libmTan on purpose? > > > > > Thanks and regards, > Vivek From volker.simonis at gmail.com Tue Apr 19 08:46:45 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 19 Apr 2016 10:46:45 +0200 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: Hi Vivek, you introduce the new method TemplateInterpreterGenerator::mathfunc() but only implement it on x86_64. Shouldn't we have at least empty implementations of this method for all architectures? Also the description in the bug sounds quite general but you only seem to implement it for certain math-intrinsics on x64. Another minor nit: in vmSymbols.hpp I don't think we need the const qualifier on the ID argument because it is only an enum anyway: + static bool is_disabled_by_flags(const vmIntrinsics::ID id); It makes sense on: static bool is_disabled_by_flags(const methodHandle& method); because here we are passing method by reference and the const qualifier guarantees that is_disabled_by_flags will not change the method. Regards, Volker On Mon, Apr 18, 2016 at 7:38 PM, Deshpande, Vivek R wrote: > Hi all > > > > I would like to contribute a patch which helps to control the intrinsics in > interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same.
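Volker's nit can be checked directly in C++. Top-level `const` on a by-value parameter is not part of the function's type, so `f(const E)` and `f(E)` declare the same function, and the qualifier only stops the body from reassigning its local copy. `const` on a reference parameter, by contrast, is part of the interface. The enum and struct below are hypothetical stand-ins for `vmIntrinsics::ID` and `methodHandle`.

```cpp
#include <cassert>

enum IntrinsicId { _none, _dpow, _dtan };  // hypothetical stand-in for vmIntrinsics::ID
struct Method { IntrinsicId id; };          // hypothetical stand-in for a method handle

// Declared without const...
bool disabled_by_id(IntrinsicId id);

// ...and defined with const: this is the SAME function, because top-level
// const on a by-value parameter is dropped from the function type.
bool disabled_by_id(const IntrinsicId id) { return id == _dtan; }

// Here const is part of the contract: the callee cannot modify the
// caller's Method through the reference.
bool disabled_by_method(const Method& m) { return m.id == _dtan; }
```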
> > Could you please review and sponsor this patch. > > > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > > > Thanks and regards, > > Vivek > > From rwestrel at redhat.com Tue Apr 19 11:44:35 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 19 Apr 2016 13:44:35 +0200 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted Message-ID: <57161A23.3050807@redhat.com> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << -shift) with src an int have some support in the aarch64.ad file (rorI_rReg_Var_C_32 and rorI_rReg_Var_C0), but their definitions are broken and never match any ideal graph subtree. http://cr.openjdk.java.net/~roland/8154537/webrev.00/ Roland. From tobias.hartmann at oracle.com Tue Apr 19 12:35:43 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 19 Apr 2016 14:35:43 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC Message-ID: <5716261F.1070205@oracle.com> Hi, please review the following enhancement: https://bugs.openjdk.java.net/browse/JDK-6941938 MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed).
I also refactored MacroAssembler::string_compare() to use load_sized_value(). Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. I evaluated the following three versions of the patch. -- Basic -- http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png Version "small" tries to improve this. -- Prefetching -- http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. 
This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. -- Small -- http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). The numbers can be found here: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. What do you think? Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
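The tail-handling trick Tobias describes (read past the end, then shift out the garbage) can be modeled in portable C++. This is an illustration under stated assumptions, not the SPARC code from the webrev. It assumes big-endian word loads, as on SPARC, so the valid trailing bytes sit at the high-order end of the last word. It simulates the over-read with a copy padded to a multiple of 8 bytes instead of actually reading past the array, and it omits the alignment check that chooses between 8-byte and 4-byte steps. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Load 8 bytes as a big-endian word (SPARC byte order): the first array
// element ends up in the most significant byte.
static uint64_t load_be64(const uint8_t* p) {
  uint64_t w = 0;
  for (int i = 0; i < 8; i++) w = (w << 8) | p[i];
  return w;
}

// Copy n bytes into a buffer rounded up to a multiple of 8 and fill the
// slack with 'garbage', safely simulating a load that reads past the end.
static std::vector<uint8_t> padded_copy(const uint8_t* p, size_t n, uint8_t garbage) {
  std::vector<uint8_t> v((n + 7) / 8 * 8, garbage);
  for (size_t i = 0; i < n; i++) v[i] = p[i];
  return v;
}

// Compare two n-byte arrays 8 bytes at a time; the final partial word is
// checked by shifting the garbage bytes out of the XOR difference.
bool array_equals_words(const uint8_t* a, const uint8_t* b, size_t n) {
  if (n == 0) return true;
  std::vector<uint8_t> pa = padded_copy(a, n, 0xAA);
  std::vector<uint8_t> pb = padded_copy(b, n, 0x55);  // deliberately different garbage
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    if (load_be64(&pa[i]) != load_be64(&pb[i])) return false;
  }
  size_t rem = n - i;  // 0..7 valid bytes left
  if (rem == 0) return true;
  uint64_t diff = load_be64(&pa[i]) ^ load_be64(&pb[i]);
  return (diff >> ((8 - rem) * 8)) == 0;  // drop the (8 - rem) garbage bytes
}
```

Because the padding bytes deliberately differ between the two copies, a result of "equal" on a partial word can only come from the shift discarding them, which is the property the intrinsic relies on.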
Thanks, Tobias [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip [3] Microbenchmark results for the "basic" implementation http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png [4] Microbenchmark results for the "prefetching" implementation http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png From aph at redhat.com Tue Apr 19 12:52:50 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 19 Apr 2016 13:52:50 +0100 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57161A23.3050807@redhat.com> References: <57161A23.3050807@redhat.com> Message-ID: <57162A22.2050706@redhat.com> On 04/19/2016 12:44 PM, Roland Westrelin wrote: > (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << > -shift) with src an int have some support in the aarch64.ad ad file: > rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions is broken > and never match any ideal graph subtree. > > http://cr.openjdk.java.net/~roland/8154537/webrev.00/ OK, thanks. We'll need backports for http://hg.openjdk.java.net/aarch64-port/jdk8u/ and http://hg.openjdk.java.net/aarch64-port/jdk7u/ These should just apply cleanly. Andrew. 
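Java uses only the low five bits of an `int` shift count, so `src << (32 - shift)` and `src << -shift` shift by the same amount, and both of Roland's expressions are a rotate right by `shift`, which is why one AArch64 `ror` instruction can implement either. The small C++ model below makes that equivalence checkable. It emulates Java's masking explicitly (`jshl` and `jushr` are hypothetical helper names) because in C++ shifting a 32-bit value by 32 or more is undefined.

```cpp
#include <cassert>
#include <cstdint>

// Java semantics: int shift counts use only the low 5 bits.
static uint32_t jshl(uint32_t x, int s) { return x << (s & 31); }
static uint32_t jushr(uint32_t x, int s) { return x >> (s & 31); }

// The two source-level idioms the matcher rules are meant to recognize.
static uint32_t ror_32_minus(uint32_t src, int shift) {
  return jushr(src, shift) | jshl(src, 32 - shift);
}
static uint32_t ror_neg(uint32_t src, int shift) {
  return jushr(src, shift) | jshl(src, -shift);
}

// Reference rotate right, written to avoid undefined shifts.
static uint32_t ror(uint32_t x, unsigned s) {
  s &= 31;
  return s == 0 ? x : (x >> s) | (x << (32 - s));
}
```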
From nils.eliasson at oracle.com Tue Apr 19 12:54:32 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 19 Apr 2016 14:54:32 +0200 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: <57162A88.7030608@oracle.com> Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context, that is ok, but it doesn't belong in the compilerDirectives files if it doesn't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: > > Hi all > > I would like to contribute a patch which helps to control the > intrinsics in interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same. > > Could you please review and sponsor this patch. > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > Thanks and regards, > > Vivek >
From adinn at redhat.com Tue Apr 19 12:55:19 2016 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 19 Apr 2016 13:55:19 +0100 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57162A22.2050706@redhat.com> References: <57161A23.3050807@redhat.com> <57162A22.2050706@redhat.com> Message-ID: <57162AB7.6010602@redhat.com> On 19/04/16 13:52, Andrew Haley wrote: > On 04/19/2016 12:44 PM, Roland Westrelin wrote: >> (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << >> -shift) with src an int have some support in the aarch64.ad ad file: >> rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions are broken >> and never match any ideal graph subtree. >> >> http://cr.openjdk.java.net/~roland/8154537/webrev.00/ > > OK, thanks. We'll need backports for > http://hg.openjdk.java.net/aarch64-port/jdk8u/ and > http://hg.openjdk.java.net/aarch64-port/jdk7u/ Patch also looks good to me. regards, Andrew Dinn ----------- From aph at redhat.com Tue Apr 19 13:19:31 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 19 Apr 2016 14:19:31 +0100 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: References: <5714D930.4090804@redhat.com> Message-ID: <57163063.3020506@redhat.com> On 04/19/2016 01:54 PM, Long Chen wrote: > Thanks for all these nice comments. Here is a revised version: > > http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch > > > Changes: > > 1. Are DC and IC really synonyms? > > DC and IC assembly was supposed to be distinguished by different > cache_maintenance parameters. I created two enums, icache_maintenance and > dcache_maintenance, in the revised patch to make it look better.
> > + enum icache_maintenance {IVAU = 0b0101}; > + enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, > ZVA = 0b100}; > + void dc(dcache_maintenance cm, Register Rt) { > + sys(0b011, 0b0111, cm, 0b001, Rt); > + } > + > + void ic(icache_maintenance cm, Register Rt) { > + sys(0b011, 0b0111, cm, 0b001, Rt); > } That looks better, yes. > 5. To avoid needing another scratch register, I wrote a small piece of code > after the dc zva loop in block_zero, so that block_zero doesn't need to > fall through to fill_words to zero the small remainder of the array. This code might > not perform as well as fill_words (unrolled), but it requires one fewer > register, and the code size is smaller as well. > The final code is like this: > > 0x0000007f7d3dd4fc: cmp x11, #0x20 > 0x0000007f7d3dd500: b.lt 0x0000007f7d3dd538 > 0x0000007f7d3dd504: neg x8, x10 > 0x0000007f7d3dd508: and x8, x8, #0x3f > 0x0000007f7d3dd50c: cbz x8, 0x0000007f7d3dd520 > 0x0000007f7d3dd510: sub x11, x11, x8, asr #3 > 0x0000007f7d3dd514: sub x8, x8, #0x8 > 0x0000007f7d3dd518: str xzr, [x10],#8 > 0x0000007f7d3dd51c: cbnz x8, 0x0000007f7d3dd514 > 0x0000007f7d3dd520: sub x11, x11, #0x8 > 0x0000007f7d3dd524: dc zva, x10 > 0x0000007f7d3dd528: subs x11, x11, #0x8 > 0x0000007f7d3dd52c: add x10, x10, #0x40 > 0x0000007f7d3dd530: b.ge 0x0000007f7d3dd524 > 0x0000007f7d3dd534: add x11, x11, #0x8 > 0x0000007f7d3dd538: tbz w11, #0, 0x0000007f7d3dd544 > 0x0000007f7d3dd53c: str xzr, [x10],#8 > 0x0000007f7d3dd540: sub x11, x11, #0x1 > 0x0000007f7d3dd544: cbz x11, 0x0000007f7d3dd554 > 0x0000007f7d3dd548: sub x11, x11, #0x2 > 0x0000007f7d3dd54c: stp xzr, xzr, [x10],#16 > 0x0000007f7d3dd550: cbnz x11, 0x0000007f7d3dd548 > > Would this be fine? It might well be. I'd like Ed to do a few measurements of large and small block zeroing. My guess is that a reasonably small unrolled loop doing STP ZR, ZR will work better than anything else, but we'll see. Thanks, Andrew.
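The control flow of the disassembly in Long Chen's message can be restated in C++ to make its three phases easier to follow: single-word stores up to a block boundary, whole-block zeroing, then an odd word and pairs. This is a model, not the patch. `DC ZVA` is emulated with ordinary stores. The 64-byte block size is the one implied by `and x8, x8, #0x3f` and `add x10, x10, #0x40` (real code would read the size from the `DCZID_EL0` register), and the 32-word threshold comes from `cmp x11, #0x20`. `block_zero` and `zero_block` here are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kZvaBlockBytes = 64;  // assumed DC ZVA block size
constexpr size_t kZvaBlockWords = kZvaBlockBytes / 8;
constexpr size_t kThresholdWords = 32;  // from "cmp x11, #0x20"

// Stand-in for "dc zva": zero one block-aligned 64-byte block.
static void zero_block(uint64_t* p) {
  for (size_t i = 0; i < kZvaBlockWords; i++) p[i] = 0;
}

// Zero cnt 8-byte words starting at base, following the disassembly.
void block_zero(uint64_t* base, size_t cnt) {
  if (cnt >= kThresholdWords) {
    // Phase 1: single-word stores until base is block aligned (str xzr).
    while (reinterpret_cast<uintptr_t>(base) % kZvaBlockBytes != 0) {
      *base++ = 0;
      cnt--;
    }
    // Phase 2: whole blocks while at least one full block remains (dc zva).
    while (cnt >= kZvaBlockWords) {
      zero_block(base);
      base += kZvaBlockWords;
      cnt -= kZvaBlockWords;
    }
  }
  // Phase 3: one word if the count is odd, then pairs (stp xzr, xzr).
  if (cnt & 1) {
    *base++ = 0;
    cnt--;
  }
  while (cnt != 0) {
    base[0] = 0;
    base[1] = 0;
    base += 2;
    cnt -= 2;
  }
}
```

Phase 3 is the small tail loop Long Chen chose over falling through to `fill_words`; Andrew's suggestion of a short unrolled `STP ZR, ZR` loop would replace only that phase.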
From vladimir.kozlov at oracle.com Tue Apr 19 16:06:36 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Apr 2016 09:06:36 -0700 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <5716261F.1070205@oracle.com> References: <5716261F.1070205@oracle.com> Message-ID: <5716578C.5080902@oracle.com> Very good. Go with basic. We can do CPU-specific improvements later if needed. "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." We do have arraycopy code for it, but by default we don't use it: product(uintx, ArraycopySrcPrefetchDistance, 0, product(uintx, ArraycopyDstPrefetchDistance, 0, Experiments back then did not show improvement on JBB benchmarks, but some workloads may benefit. That is why we keep the code. Thanks, Vladimir On 4/19/16 5:35 AM, Tobias Hartmann wrote: > Hi, > > please review the following enhancement: > https://bugs.openjdk.java.net/browse/JDK-6941938 > > MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). > > I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. > > We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). > > Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7.
I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. > > I evaluated the following three versions of the patch. > > -- Basic -- > http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ > The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png > > I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. > > There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png > Version "small" tries to improve this. > > -- Prefetching -- > http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ > This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png > > However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. 
I also experimented with adding incremental prefetching to the loop body but this does not improve performance. > > -- Small -- > http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ > This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). > > The numbers can be found here: > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx > > I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. > > What do you think? > > Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java > [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip > [3] Microbenchmark results for the "basic" implementation > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png > [4] Microbenchmark results for the "prefetching" implementation > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png > From vladimir.kozlov at oracle.com Tue Apr 19 15:55:19 2016 From: vladimir.kozlov 
at oracle.com (Vladimir Kozlov) Date: Tue, 19 Apr 2016 08:55:19 -0700 (PDT) Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57161A23.3050807@redhat.com> References: <57161A23.3050807@redhat.com> Message-ID: <571654E7.5020404@oracle.com> Looks good. thanks, Vladimir On 4/19/16 4:44 AM, Roland Westrelin wrote: > (src >>> shift) | (src << (32 - shift)) and (src >>> shift) | (src << > -shift) with src an int have some support in the aarch64.ad ad file: > rorI_rReg_Var_C_32 and rorI_rReg_Var_C0 but their definitions are broken > and never match any ideal graph subtree. > > http://cr.openjdk.java.net/~roland/8154537/webrev.00/ > > Roland. > From vivek.r.deshpande at intel.com Tue Apr 19 17:27:42 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 19 Apr 2016 17:27:42 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A8118A@ORSMSX106.amr.corp.intel.com> Hi Volker, Thanks for your review and comments. I will take care of the things you mentioned. I am using mathfunc() in the methods that call SharedRuntime::d(exp, pow, sin, cos, tan, log, log10) as the fallback when DisableIntrinsic is used to turn off the LIBM intrinsics. Thanks and regards, Vivek -----Original Message----- From: Volker Simonis [mailto:volker.simonis at gmail.com] Sent: Tuesday, April 19, 2016 1:47 AM To: Deshpande, Vivek R Cc: hotspot compiler; Vladimir Kozlov; Christian Thalinger Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, you introduce the new method TemplateInterpreterGenerator::mathfunc() but only implement it on x86_64. Shouldn't we have at least empty implementations of this method for all architectures?
Also the description in the bug sounds quite general but you only seem to implement it for certain math-intrinsics on x64. Another minor nit: in vmSymbols.hpp I don't think we need the const qualifier on the ID argument because it is only an enum anyway: + static bool is_disabled_by_flags(const vmIntrinsics::ID id); It makes sense on: static bool is_disabled_by_flags(const methodHandle& method); because here we are passing the method by reference and the const qualifier guarantees that is_disabled_by_flags will not change the method. Regards, Volker On Mon, Apr 18, 2016 at 7:38 PM, Deshpande, Vivek R wrote: > Hi all > > > > I would like to contribute a patch which helps to control the > intrinsics in interpreter, c1 and c2 by disabling the stub generation. > > This uses -XX:DisableIntrinsic option to achieve the same. > > Could you please review and sponsor this patch. > > > > Bug-id: > > https://bugs.openjdk.java.net/browse/JDK-8154473 > webrev: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ > > > > Thanks and regards, > > Vivek > > From nils.eliasson at oracle.com Tue Apr 19 17:13:12 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 19 Apr 2016 10:13:12 -0700 (PDT) Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <5714B5CB.70705@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> Message-ID: <57166728.4060906@oracle.com> On 2016-04-18 12:24, Nils Eliasson wrote: > Hi, > > On 2016-04-15 22:43, Christian Thalinger wrote: >> >>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson >>> wrote: >>> >>> Hi, >>> >>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>> >>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson >>>>> wrote: >>>>> >>>>> I moved the reasons to
CompileTask.hpp and put it together with >>>>> the names list. Also changed the type from int to CompileReason as >>>>> Igor suggested. >>>>> >>>>> It gets verbose in the method declarations in compileBroker >>>> >>>> Don't worry about this. >>>> >>>>> and sometimes I think CompileReason should be declared in >>>>> CompileBroker because it is mostly used there. On the other hand, >>>>> CompileTask is the keeper of the CompileReason so it makes sense too. >>>> >>>> Yes, that's the right place. >>>> >>>>> >>>>> New webrev: >>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>> >>>> >>>> + bool can_become_stale() const { >>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>>> + } >>>> I'm not a fan of implicit contracts just defined by comments. This >>>> method doesn't seem to be performance critical so I would suggest >>>> to use a switch-case. An attribute on the enum would be much >>>> better but we all know this isn't Java. >>> >>> As you suggested: >>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >> >> Thanks. A space is missing and the closing } indent is wrong: >> + bool can_become_stale() const { >> + switch(_compile_reason) { >> + case Reason_BackedgeCount: >> + case Reason_InvocationCount: >> + case Reason_Tiered: >> + return !_is_blocking; >> + } >> + return false; >> + } And I fixed the indentation. Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ Thanks! Nils >> Also, what about: >> + Reason_None, >> + Reason_CTW, // Compile the world >> + Reason_Replay, // ciReplay >> These were covered before. > Reason_None - is only used for bounds checking together with Reason_Count. > Reason_Replay - if these compilations can get stale we can get > indeterminism in replay. > Reason_CTW - CTW could silently drop compiles -> more indeterminism. > > Regards, > Nils > >> >>> >>> Also made reasons CTW and Replay not stale-able. >>> >>> Thanks! >>> Nils >>> >>>> >>>>> >>>>> Thanks!
>>>>> Nils >>>>> >>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>> Very nice, I like it. >>>>>> >>>>>> One note. CompileReason (and its names) should be in the CompileTask >>>>>> class where it is recorded. Then CompileTask::can_become_stale() >>>>>> can be in the header file so it is inlined on all platforms. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>> Hi, >>>>>>> >>>>>>> New webrev: >>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>> >>>>>>> >>>>>>> Summary >>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>> variants, and a table containing all the unchanged strings. I >>>>>>> see the >>>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>>> have chosen not to do so in this change. >>>>>>> >>>>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>>>> >>>>>>> Testing: >>>>>>> Running Testset hotspot on all platforms and hotspot_all on one >>>>>>> platform >>>>>>> >>>>>>> Regards, >>>>>>> Nils Eliasson >>>>>>> >>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>> Tasks get evicted from the compile_queue if their invocation >>>>>>>>> counter >>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>> >>>>>>>>> I'll do a proper fix; it is the right thing to do and should >>>>>>>>> be pretty >>>>>>>>> quick. I'll change the comment to an enum that represents who >>>>>>>>> submitted >>>>>>>>> the compile, and add a table for the comments. This could be >>>>>>>>> useful in >>>>>>>>> other settings too. >>>>>>>> >>>>>>>> Sounds good. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nils >>>>>>>>> >>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>> What do you mean "stale"?
>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid >>>>>>>>>> removing >>>>>>>>>> WB comp tasks from the queue. Adding a timeout is not reliable. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Please review this small fix for the BlockingCompilation test. >>>>>>>>>>> >>>>>>>>>>> Summary: >>>>>>>>>>> A method enqueued for compilation with the WB API may be >>>>>>>>>>> removed from >>>>>>>>>>> the compile queue as stale. >>>>>>>>>>> >>>>>>>>>>> Solution: >>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>> checks that may spare us from waiting until timeout for >>>>>>>>>>> failing.) >>>>>>>>>>> >>>>>>>>>>> This is a workaround but we should consider fixing something >>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>> task with info about the origin of the compile. The comment >>>>>>>>>>> field has >>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>> converted to an enum. >>>>>>>>>>> >>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> Nils Eliasson >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> >>> >> >
From christian.thalinger at oracle.com Tue Apr 19 17:37:09 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 19 Apr 2016 07:37:09 -1000 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <57166728.4060906@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com> Message-ID: <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> > On Apr 19, 2016, at 7:13 AM, Nils Eliasson wrote: > > > > On 2016-04-18 12:24, Nils Eliasson wrote: >> Hi, >> >> On 2016-04-15 22:43, Christian Thalinger wrote: >>> >>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson > wrote: >>>> >>>> Hi, >>>> >>>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>>> >>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson > wrote: >>>>>> >>>>>> I moved the reasons to CompileTask.hpp and put it together with the names list. Also changed the type from int to CompileReason as Igor suggested. >>>>>> >>>>>> It gets verbose in the method declarations in compileBroker >>>>> >>>>> Don't worry about this. >>>>> >>>>>> and sometimes I think CompileReason should be declared in CompileBroker because it is mostly used there. On the other hand, CompileTask is the keeper of the CompileReason so it makes sense too. >>>>> >>>>> Yes, that's the right place. >>>>> >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>> >>>>> + bool can_become_stale() const { >>>>> + return !_is_blocking && (_compile_reason < Reason_Whitebox); >>>>> + } >>>>> I'm not a fan of implicit contracts just defined by comments. This method doesn't seem to be performance critical so I would suggest to use a switch-case. An attribute on the enum would be much better but we all know this isn't Java.
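Christian's request for an explicit switch-case instead of an ordering-based check can be illustrated with a self-contained C++ model. This is a sketch, not the webrev: CompileTaskModel, the name strings and the reduced member list are assumptions, limited to the reasons named in this thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical subset of compile reasons, limited to names mentioned in the
// thread; the real enum in CompileTask may have more members.
enum CompileReason {
  Reason_None,             // bounds checking only, together with Reason_Count
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,              // compile the world
  Reason_Replay,           // ciReplay
  Reason_Whitebox,
  Reason_Count
};

// Printable names in enum order (illustrative strings, not HotSpot's).
static const char* const reason_names[] = {
  "none", "invocation_count", "backedge_count", "tiered",
  "ctw", "replay", "whitebox"
};
static_assert(sizeof(reason_names) / sizeof(reason_names[0]) == Reason_Count,
              "reason_names must have one entry per CompileReason");

class CompileTaskModel {
 public:
  CompileTaskModel(CompileReason reason, bool is_blocking)
      : _compile_reason(reason), _is_blocking(is_blocking) {
    assert(reason > Reason_None && reason < Reason_Count);
  }

  // Only counter-driven policy compiles may be dropped as stale. Listing the
  // cases explicitly avoids relying on enum ordering (reason < Reason_Whitebox).
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_InvocationCount:
      case Reason_BackedgeCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;  // CTW, replay, WhiteBox: never silently dropped
    }
  }

  const char* reason_name() const { return reason_names[_compile_reason]; }

 private:
  CompileReason _compile_reason;
  bool _is_blocking;
};
```

The static_assert keeps the name table and the enum in sync at compile time. The explicit cases mean that a newly added reason defaults to "never stale" instead of inheriting behavior from its position in the enum.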
>>>> >>>> As you suggested: >>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >>> >>> Thanks. A space is missing and the closing } indent is wrong: >>> + bool can_become_stale() const { >>> + switch(_compile_reason) { >>> + case Reason_BackedgeCount: >>> + case Reason_InvocationCount: >>> + case Reason_Tiered: >>> + return !_is_blocking; >>> + } >>> + return false; >>> + } > And I fixed the indentation. > > Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ + switch(_compile_reason) { Space after switch. > > Thanks! > Nils >>> Also, what about: >>> + Reason_None, >>> + Reason_CTW, // Compile the world >>> + Reason_Replay, // ciReplay >>> These were covered before. >> Reason_None - is only used for bounds checking together with Reason_Count. >> Reason_Replay - if these compilations can get stale we can get indeterminism in replay. >> Reason_CTW - CTW could silently drop compiles -> more indeterminism. >> >> Regards, >> Nils >> >>> >>>> >>>> Also made reasons CTW and Replay not stale-able. >>>> >>>> Thanks! >>>> Nils >>>> >>>>> >>>>>> >>>>>> Thanks! >>>>>> Nils >>>>>> >>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>>> Very nice, I like it. >>>>>>> >>>>>>> One note. CompileReason (and its names) should be in the CompileTask class where it is recorded. Then CompileTask::can_become_stale() can be in the header file so it is inlined on all platforms. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> New webrev: >>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>>> >>>>>>>> Summary >>>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>>> variants, and a table containing all the unchanged strings. I see the >>>>>>>> possibility of removing/changing/simplifying some CompileReasons but >>>>>>>> have chosen not to do so in this change. >>>>>>>> >>>>>>>> Only new logic is the CompileTask::can_become_stale() method.
>>>>>>>> >>>>>>>> Testing: >>>>>>>> Running Testset hotspot on all platforms and hotspot_all on one platform >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nils Eliasson >>>>>>>> >>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>>> Tasks get evicted from the compile_queue if their invocation counter >>>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>>> >>>>>>>>>> I'll do a proper fix; it is the right thing to do and should be pretty >>>>>>>>>> quick. I'll change the comment to an enum that represents who submitted >>>>>>>>>> the compile, and add a table for the comments. This could be useful in >>>>>>>>>> other settings too. >>>>>>>>> >>>>>>>>> Sounds good. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nils >>>>>>>>>> >>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>>> What do you mean "stale"? >>>>>>>>>>> I would prefer to see the real fix as you suggested to avoid removing >>>>>>>>>>> WB comp tasks from the queue. Adding a timeout is not reliable. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Please review this small fix for the BlockingCompilation test. >>>>>>>>>>>> >>>>>>>>>>>> Summary: >>>>>>>>>>>> A method enqueued for compilation with the WB API may be removed from >>>>>>>>>>>> the compile queue as stale. >>>>>>>>>>>> >>>>>>>>>>>> Solution: >>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure nothing gets >>>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>>> checks that may spare us from waiting until timeout for failing.)
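The eviction rule Nils describes (tasks whose invocation counter has not increased within TieredCompileTaskTimeout are dropped from the compile queue) can be modeled as below. This is a simplified, hypothetical sketch rather than AdvancedThresholdPolicy::is_stale(): QueuedTask, is_stale and purge_stale are illustrative names, and the can_become_stale field stands in for the per-reason check introduced later in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a queued compile task; field names are illustrative.
struct QueuedTask {
  int id;
  bool can_become_stale;         // false for WhiteBox, CTW, replay, blocking
  int64_t enqueue_time_ms;       // when the counter snapshot was taken
  uint32_t counter_at_enqueue;   // invocation counter at that time
  uint32_t current_counter;      // invocation counter now
};

// Simplified staleness rule: the task may be dropped if it is allowed to become
// stale, its counter has not increased, and it has waited longer than timeout_ms.
inline bool is_stale(const QueuedTask& t, int64_t now_ms, int64_t timeout_ms) {
  return t.can_become_stale &&
         t.current_counter <= t.counter_at_enqueue &&
         now_ms - t.enqueue_time_ms > timeout_ms;
}

// Removes stale tasks, preserving the order of the remaining ones.
inline void purge_stale(std::vector<QueuedTask>& queue, int64_t now_ms,
                        int64_t timeout_ms) {
  queue.erase(std::remove_if(queue.begin(), queue.end(),
                             [&](const QueuedTask& t) {
                               return is_stale(t, now_ms, timeout_ms);
                             }),
              queue.end());
}
```

Raising timeout_ms, as the -XX:TieredCompileTaskTimeout=60000 workaround does, only postpones eviction. Clearing can_become_stale for WhiteBox requests removes them from consideration entirely, which matches the more reliable fix Vladimir asked for.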
>>>>>>>>>>>> >>>>>>>>>>>> This is a workaround but we should consider fixing something >>>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>>> task with info about the origin of the compile. The comment field has >>>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>>> converted to an enum. >>>>>>>>>>>> >>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> Nils Eliasson >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> > From tobias.hartmann at oracle.com Tue Apr 19 17:13:05 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 19 Apr 2016 10:13:05 -0700 (PDT) Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <5716578C.5080902@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> Message-ID: <57166721.5010208@oracle.com> Thanks, Vladimir! On 19.04.2016 18:06, Vladimir Kozlov wrote: > Very good. Go with basic. We can do SPU special improvements later if needed. Okay, I'll push the basic version. For reference, here are the results on a SPARC T4: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png > "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC."
> We do have arraycopy code for it but by default we don't use it: > product(uintx, ArraycopySrcPrefetchDistance, 0, > product(uintx, ArraycopyDstPrefetchDistance, 0, > > Experiments back then did not show improvement on JBB benchmarks but some workloads may benefit. That is why we keep the code. Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" forces the prefetch distance to be 0: java -XX:ArraycopySrcPrefetchDistance=42 -version ArraycopySrcPrefetchDistance (42) must be 0 Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit Thanks, Tobias > > Thanks, > Vladimir > > On 4/19/16 5:35 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following enhancement: >> https://bugs.openjdk.java.net/browse/JDK-6941938 >> >> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >> >> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays. Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >> >> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >> >> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7.
I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >> >> I evaluated the following three versions of the patch. >> >> -- Basic -- >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >> >> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. >> >> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >> Version "small" tries to improve this. >> >> -- Prefetching -- >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >> >> However, prefetching negatively affects overall performance in the other cases (see [4]). 
Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >> >> -- Small -- >> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >> >> The numbers can be found here: >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >> >> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >> >> What do you think? >> >> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
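The tail-handling trick in Tobias's description (read a full word past the end of the arrays, then shift out the garbage bits) can be sketched portably in C++. Assumptions: the function names are hypothetical; bytes are assembled in big-endian order to mirror SPARC; and the caller guarantees that both buffers are readable up to the length rounded up to a multiple of 8, standing in for HotSpot's 8-byte object alignment. The sketch does not model the 4-byte fallback for word-aligned arrays or the actual SPARC instruction sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble 8 bytes in big-endian order (SPARC is big-endian), so the byte at
// the lowest address lands in the most significant position.
inline uint64_t load_be64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
  return v;
}

// Compares len bytes of a and b, 8 bytes at a time. Both buffers must be
// readable up to len rounded up to a multiple of 8 (caller-provided padding).
inline bool array_equals_words(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) return false;
  }
  const size_t rem = len - i;  // 0..7 trailing bytes
  if (rem == 0) return true;
  // Read a full word past the end, then shift out the (8 - rem) garbage
  // bytes, which occupy the low-order end in big-endian order.
  const uint64_t diff = load_be64(a + i) ^ load_be64(b + i);
  return (diff >> ((8 - rem) * 8)) == 0;
}
```

Because the garbage bytes sit at the least significant end of a big-endian word, a single right shift of the XOR discards them, so no separate byte-by-byte tail loop is needed.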
>> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >> [3] Microbenchmark results for the "basic" implementation >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >> [4] Microbenchmark results for the "prefetching" implementation >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >> From rwestrel at redhat.com Tue Apr 19 18:54:04 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 19 Apr 2016 20:54:04 +0200 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <571654E7.5020404@oracle.com> References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com> Message-ID: <57167ECC.8080304@redhat.com> Thanks everyone for the review. I need a sponsor for that one given it touches a shared code test case. Roland. From vladimir.kozlov at oracle.com Tue Apr 19 22:12:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Apr 2016 15:12:21 -0700 (PDT) Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <57167ECC.8080304@redhat.com> References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com> <57167ECC.8080304@redhat.com> Message-ID: <5716AD45.1060302@oracle.com> In JPRT. Vladimir On 4/19/16 11:54 AM, Roland Westrelin wrote: > Thanks everyone for the review. 
I need a sponsor for that one given it > touches a shared code test case. > > Roland. > From vivek.r.deshpande at intel.com Wed Apr 20 00:44:43 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 20 Apr 2016 00:44:43 +0000 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <57162A88.7030608@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> Hi Nils, Yes, you are right: the function accesses the command line flag DisableIntrinsic and the changes are static. Could you point me to the right location for the function? Also, I have updated the webrev with the rest of the comments here: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ Regards, Vivek From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Nils Eliasson Sent: Tuesday, April 19, 2016 5:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics Hi Vivek, The changes in is_intrinsic_disabled in compilerDirectives.* are static and only access the command line flag DisableIntrinsic. As long as stubs are only generated during startup and don't have a method context - that is ok - but it doesn't belong in the compilerDirectives-files if it doesn't use directives. Regards, Nils On 2016-04-18 19:38, Deshpande, Vivek R wrote: Hi all I would like to contribute a patch which helps to control the intrinsics in interpreter, c1 and c2 by disabling the stub generation. This uses the -XX:DisableIntrinsic option to achieve the same. Could you please review and sponsor this patch.
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8154473 webrev: http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/ Thanks and regards, Vivek From john.r.rose at oracle.com Wed Apr 20 01:46:21 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Apr 2016 18:46:21 -0700 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: <57166721.5010208@oracle.com> References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> Message-ID: On Apr 19, 2016, at 10:13 AM, Tobias Hartmann wrote: > > Okay, I'll push the basic version. > So I started looking at your code and my inner SPARC junkie took over. This is what happened: http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ Perhaps there are some ideas that might be helpful: - The rampdown logic can lose a couple of instructions by using xorcc and movr. - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? On the other hand, what you wrote is nice and simple. HTH -- John P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more versions of misalignment, still with vectorization, as with the arraycopy stubs. But that's neither nice nor simple. > On Apr 19, 2016, at 10:13 AM, Tobias Hartmann wrote: > > Thanks, Vladimir! > > On 19.04.2016 18:06, Vladimir Kozlov wrote: >> Very good. Go with basic. We can do SPU special improvements later if needed. > > Okay, I'll push the basic version.
> > For reference, here are the results on a SPARC T4: > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png > http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png > >> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >> We do have arraycopy code for it but by default we don't use it: >> product(uintx, ArraycopySrcPrefetchDistance, 0, >> product(uintx, ArraycopyDstPrefetchDistance, 0, >> >> Experiments back then did not show improvement on JBB benchmarks but some workloads may benefit. That is why we keep the code. > > Right, but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" forces the prefetch distance to be 0: > > java -XX:ArraycopySrcPrefetchDistance=42 -version > ArraycopySrcPrefetchDistance (42) must be 0 > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit > > Thanks, > Tobias > >> >> Thanks, >> Vladimir >> >> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following enhancement: >>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>> >>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do an 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>> >>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reach the end of the arrays.
Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>> >>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>> >>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>> >>> I evaluated the following three versions of the patch. >>> >>> -- Basic -- >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>> >>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. 
>>> >>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>> Version "small" tries to improve this. >>> >>> -- Prefetching -- >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>> >>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>> >>> -- Small -- >>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>> >>> The numbers can be found here: >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>> >>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>> >>> What do you think? >>> >>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>> [3] Microbenchmark results for the "basic" implementation >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>> [4] Microbenchmark results for the "prefetching" implementation >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>> From long.chen at linaro.org Tue Apr 19 12:54:55 2016 From: long.chen at linaro.org (Long Chen) Date: Tue, 19 Apr 2016 20:54:55 +0800 Subject: aarch64: RFR: Block zeroing by 'DC ZVA' In-Reply-To: <5714D930.4090804@redhat.com> References: <5714D930.4090804@redhat.com> Message-ID: Thanks for all these nice comments. Here is a revised version: http://people.linaro.org/~long.chen/block_zeroing/block_zeroing.v02.patch Changes: 1. Are DC and IC really synonyms? The DC and IC instructions were supposed to be distinguished by different cache_maintenance parameters. I created two enums, 'icache_maintenance' and 'dcache_maintenance', in the revised patch to make this clearer.
+  enum icache_maintenance {IVAU = 0b0101};
+  enum dcache_maintenance {CVAC = 0b1010, CVAU = 0b1011, CIVAC = 0b1110, ZVA = 0b100};
+  void dc(dcache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
+  }
+
+  void ic(icache_maintenance cm, Register Rt) {
+    sys(0b011, 0b0111, cm, 0b001, Rt);
   }
2. I'm not convinced of the value of this. We already know that a simple while (count-- > 0) { *to++ = v; } turns into a call to memset() which does DC ZVA.
OK. I reverted this change and left it to the compiler. The patch becomes simpler :)
3. Block_zeroing -> block_zero, 8-byte unit -> HeapWords
4. I don't think this CBZ does anything useful: 0x0000007fa880f630: cbz x8, 0x0000007fa880f670
Removed.
5. To avoid using another scratch register, I wrote a small piece of code after the dc zva loop in block_zero, so that block_zero doesn't need to fall through to fill_words to zero the small remaining part of the array. This code might not perform as well as fill_words (unrolled), but it requires one register fewer, and the code size becomes smaller as well.
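The revised block_zero sequence (single-word stores until the base reaches a DC ZVA block boundary, whole blocks cleared with DC ZVA, then one STR for an odd word and STP pairs for the rest) can be modelled in portable C++. This is a sketch under stated assumptions, not the patch itself. block_zero_sketch is a hypothetical name, and memset stands in for DC ZVA. The 64-byte block size and the 32-word threshold are taken from the disassembly that follows; real code has to read the block size from DCZID_EL0 rather than hard-coding it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed parameters, taken from the v02 disassembly (not read from DCZID_EL0).
constexpr std::size_t kZvaBytes = 64;                       // and x8, x8, #0x3f
constexpr std::size_t kWordBytes = sizeof(std::uint64_t);   // one HeapWord
constexpr std::size_t kZvaWords = kZvaBytes / kWordBytes;
constexpr std::size_t kLargeWords = 32;                     // cmp x11, #0x20

// Zeroes cnt 8-byte words starting at base (hypothetical helper).
void block_zero_sketch(std::uint64_t* base, std::size_t cnt) {
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(base);
  assert(addr % kWordBytes == 0);  // HeapWord aligned
  if (cnt >= kLargeWords) {
    // 1. Single-word stores until base sits on a ZVA block boundary.
    std::size_t pad = ((kZvaBytes - addr % kZvaBytes) % kZvaBytes) / kWordBytes;
    cnt -= pad;
    for (std::size_t i = 0; i < pad; ++i) *base++ = 0;
    // 2. Whole blocks; memset stands in for "dc zva, base".
    while (cnt >= kZvaWords) {
      std::memset(base, 0, kZvaBytes);
      base += kZvaWords;
      cnt -= kZvaWords;
    }
  }
  // 3. Tail (the whole job for small counts): odd word first, then pairs.
  if (cnt & 1) {
    *base++ = 0;
    --cnt;
  }
  while (cnt > 0) {
    base[0] = 0;  // models stp xzr, xzr, [base], #16
    base[1] = 0;
    base += 2;
    cnt -= 2;
  }
}
```

Storing the odd word before the pairs mirrors the order of the v02 tail shown in the disassembly.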
The final code is like this:
0x0000007f7d3dd4fc: cmp x11, #0x20
0x0000007f7d3dd500: b.lt 0x0000007f7d3dd538
0x0000007f7d3dd504: neg x8, x10
0x0000007f7d3dd508: and x8, x8, #0x3f
0x0000007f7d3dd50c: cbz x8, 0x0000007f7d3dd520
0x0000007f7d3dd510: sub x11, x11, x8, asr #3
0x0000007f7d3dd514: sub x8, x8, #0x8
0x0000007f7d3dd518: str xzr, [x10],#8
0x0000007f7d3dd51c: cbnz x8, 0x0000007f7d3dd514
0x0000007f7d3dd520: sub x11, x11, #0x8
0x0000007f7d3dd524: dc zva, x10
0x0000007f7d3dd528: subs x11, x11, #0x8
0x0000007f7d3dd52c: add x10, x10, #0x40
0x0000007f7d3dd530: b.ge 0x0000007f7d3dd524
0x0000007f7d3dd534: add x11, x11, #0x8
0x0000007f7d3dd538: tbz w11, #0, 0x0000007f7d3dd544
0x0000007f7d3dd53c: str xzr, [x10],#8
0x0000007f7d3dd540: sub x11, x11, #0x1
0x0000007f7d3dd544: cbz x11, 0x0000007f7d3dd554
0x0000007f7d3dd548: sub x11, x11, #0x2
0x0000007f7d3dd54c: stp xzr, xzr, [x10],#16
0x0000007f7d3dd550: cbnz x11, 0x0000007f7d3dd548
Would this be fine?
Regards
Long
On 18 April 2016 at 20:55, Andrew Haley wrote:
> One other thing. This is rather a lot of code to emit every time an
> array is created:
>
> ;; zero_words {
> 0x0000007fa880f5f0: cmp x11, #0x20
> 0x0000007fa880f5f4: b.lt 0x0000007fa880f62c
>
> 0x0000007fa880f5f8: neg x8, x10
> 0x0000007fa880f5fc: and x8, x8, #0x7f
> 0x0000007fa880f600: cbz x8, 0x0000007fa880f614
> 0x0000007fa880f604: sub x11, x11, x8, asr #3
> 0x0000007fa880f608: sub x8, x8, #0x8
> 0x0000007fa880f60c: str xzr, [x10],#8
> 0x0000007fa880f610: cbnz x8, 0x0000007fa880f608
> 0x0000007fa880f614: sub x11, x11, #0x10
> 0x0000007fa880f618: dc zva, x10
> 0x0000007fa880f61c: subs x11, x11, #0x10
> 0x0000007fa880f620: add x10, x10, #0x80
> 0x0000007fa880f624: b.ge 0x0000007fa880f618
> 0x0000007fa880f628: add x11, x11, #0x10
>
> 0x0000007fa880f62c: and x8, x11, #0x7
>
> I don't think this CBZ does anything useful:
>
> 0x0000007fa880f630: cbz x8, 0x0000007fa880f670
>
> (I'm assuming that the 0-7 cases are uniformly distributed.)
>
> 0x0000007fa880f634: sub x11, x11, x8
> 0x0000007fa880f638: add x10, x10, x8, lsl #3
> 0x0000007fa880f63c: adr x9, 0x0000007fa880f670
> 0x0000007fa880f640: sub x9, x9, x8, lsl #2
> 0x0000007fa880f644: br x9
> 0x0000007fa880f648: add x10, x10, #0x40
> 0x0000007fa880f64c: sub x11, x11, #0x8
> 0x0000007fa880f650: stur xzr, [x10,#-64]
> 0x0000007fa880f654: stur xzr, [x10,#-56]
> 0x0000007fa880f658: stur xzr, [x10,#-48]
> 0x0000007fa880f65c: stur xzr, [x10,#-40]
> 0x0000007fa880f660: stur xzr, [x10,#-32]
> 0x0000007fa880f664: stur xzr, [x10,#-24]
> 0x0000007fa880f668: stur xzr, [x10,#-16]
> 0x0000007fa880f66c: stur xzr, [x10,#-8]
> 0x0000007fa880f670: cbnz x11, 0x0000007fa880f648
> ;; } zero_words
>
> We could think about moving the large block case into a stub which is
> emitted after the main body of the method, or even into a shared stub.
> A shared stub would require the args to be in fixed registers, though.
>
> Andrew.
>
From rwestrel at redhat.com Wed Apr 20 06:30:53 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 20 Apr 2016 08:30:53 +0200 Subject: RFR(S): 8154537: AArch64: some integer rotate instructions are never emitted In-Reply-To: <5716AD45.1060302@oracle.com> References: <57161A23.3050807@redhat.com> <571654E7.5020404@oracle.com> <57167ECC.8080304@redhat.com> <5716AD45.1060302@oracle.com> Message-ID: <5717221D.2010107@redhat.com> > In JPRT. Thanks! Roland.
From nils.eliasson at oracle.com Wed Apr 20 07:46:17 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Apr 2016 09:46:17 +0200 Subject: RFR(S): 8153013: BlockingCompilation test times out In-Reply-To: <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> References: <5707A3AF.3040807@oracle.com> <5707E5C7.3000000@oracle.com> <570CF86B.3050804@oracle.com> <570D2869.5030206@oracle.com> <570E42B2.2090306@oracle.com> <570EBB79.7060805@oracle.com> <570F905A.4050202@oracle.com> <5710E772.5050801@oracle.com> <5714B5CB.70705@oracle.com> <57166728.4060906@oracle.com> <31AA6615-9D85-4E67-A12F-DB3A2196CBC4@oracle.com> Message-ID: <571733C9.5090302@oracle.com> On 2016-04-19 19:37, Christian Thalinger wrote: > >> On Apr 19, 2016, at 7:13 AM, Nils Eliasson > > wrote: >> >> >> >> On 2016-04-18 12:24, Nils Eliasson wrote: >>> Hi, >>> >>> On 2016-04-15 22:43, Christian Thalinger wrote: >>>> >>>>> On Apr 15, 2016, at 3:06 AM, Nils Eliasson >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> On 2016-04-14 20:45, Christian Thalinger wrote: >>>>>> >>>>>>> On Apr 14, 2016, at 2:43 AM, Nils Eliasson >>>>>>> wrote: >>>>>>> >>>>>>> I moved the reasons to CompileTask.hpp and put it together with >>>>>>> the names list. Also changed the type from int to CompileReason >>>>>>> as Igor suggested. >>>>>>> >>>>>>> It gets verbose in the method declarations in compileBroker >>>>>> >>>>>> Don't worry about this. >>>>>> >>>>>>> and sometimes I think CompileReason should be declared in >>>>>>> CompileBroker because it is mostly used there. On the other >>>>>>> hand, CompileTask is the keeper of the CompileReason so it makes >>>>>>> sense too. >>>>>> >>>>>> Yes, that's the right place. >>>>>> >>>>>>> >>>>>>> New webrev: >>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.03/ >>>>>>> >>>>>> >>>>>> *+ bool can_become_stale() const {* >>>>>> *+ return !_is_blocking && (_compile_reason < Reason_Whitebox);* >>>>>> *+ }* >>>>>> I'm not a fan of implicit contracts just defined by comments.
>>>>>> This method doesn't seem to be performance critical so I would >>>>>> suggest to use a switch-case. An attribute on the enum would be >>>>>> much better but we all know this isn't Java. >>>>> >>>>> As you suggested: >>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.04 >>>> >>>> Thanks. A space is missing and the closing } indent is wrong: >>>> *+ bool can_become_stale() const {* >>>> *+ switch(_compile_reason) {* >>>> *+ case Reason_BackedgeCount:* >>>> *+ case Reason_InvocationCount:* >>>> *+ case Reason_Tiered:* >>>> *+ return !_is_blocking;* >>>> *+ }* >>>> *+ return false;* >>>> *+ }* >> And I fixed the indentation. >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.05/ > > *+ switch(_compile_reason) {* > Space after switch. New webrev: http://cr.openjdk.java.net/~neliasso/8153013/webrev.06/ Thanks, Nils > >> >> Thanks! >> Nils >>>> Also, what about: >>>> *+ Reason_None,* >>>> *+ Reason_CTW, // Compile the world* >>>> *+ Reason_Replay, // ciReplay* >>>> These were covered before. >>> Reason_None - is only used for bounds checking together with >>> Reason_Count. >>> Reason_Replay - if these compilations can get stale we can get >>> indeterminism in replay. >>> Reason_CTW - CTW could silently drop compiles -> more indeterminism. >>> >>> Regards, >>> Nils >>> >>>> >>>>> >>>>> Also made reasons CTW and Replay not stale-able. >>>>> >>>>> Thanks! >>>>> Nils >>>>> >>>>>> >>>>>>> >>>>>>> Thanks! >>>>>>> Nils >>>>>>> >>>>>>> On 2016-04-13 23:34, Vladimir Kozlov wrote: >>>>>>>> Very nice, I like it. >>>>>>>> >>>>>>>> One note. CompileReason (and its names) should be CompileTask >>>>>>>> class where it is recorded. Then >>>>>>>> CompileTask::can_become_stale() can be in header file so it is >>>>>>>> inlinined on all platforms.
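The final shape of can_become_stale() from this thread can be tried out in isolation. The sketch below assumes a reduced, hypothetical CompileReason enum that contains only the reasons named in the discussion. The real enum in compileTask.hpp has more members, and its ordering is not claimed here. The sketch also uses a default label in place of the fall-out "return false;".

```cpp
#include <cassert>

// Hypothetical, reduced enum: only the reasons named in this thread.
enum CompileReason {
  Reason_None,             // bounds checking only, together with Reason_Count
  Reason_InvocationCount,
  Reason_BackedgeCount,
  Reason_Tiered,
  Reason_CTW,              // Compile the world
  Reason_Replay,           // ciReplay
  Reason_Whitebox,
  Reason_Count
};

struct CompileTaskSketch {
  CompileReason _compile_reason;
  bool _is_blocking;

  // Only counter-driven, non-blocking compiles may be evicted as stale;
  // Whitebox, CTW and Replay requests are never dropped silently.
  bool can_become_stale() const {
    switch (_compile_reason) {
      case Reason_BackedgeCount:
      case Reason_InvocationCount:
      case Reason_Tiered:
        return !_is_blocking;
      default:
        return false;
    }
  }
};
```

Listing the stale-able reasons explicitly, rather than relying on "_compile_reason < Reason_Whitebox", means that a reason added later defaults to never being evicted.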
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 4/13/16 5:59 AM, Nils Eliasson wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> New webrev: >>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.02/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Summary >>>>>>>>> Introduced an enum CompileReason with members matching all the old >>>>>>>>> variants, and a table containing all the unchanged strings. I >>>>>>>>> see the >>>>>>>>> possibility of removing/changing/simplifying some >>>>>>>>> CompileReasons but >>>>>>>>> have choosen not to do so in this change. >>>>>>>>> >>>>>>>>> Only new logic is the CompileTask::can_become_stale() method. >>>>>>>>> >>>>>>>>> Testing: >>>>>>>>> Running Testset hotspot on all platforms and hotspot_all on >>>>>>>>> one platform >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nils Eliawsson >>>>>>>>> >>>>>>>>> On 2016-04-12 18:55, Vladimir Kozlov wrote: >>>>>>>>>> On 4/12/16 6:30 AM, Nils Eliasson wrote: >>>>>>>>>>> Tasks get evicted from the compile_queue if their invocation >>>>>>>>>>> counter >>>>>>>>>>> hasn't increased during TieredCompileTaskTimeout. >>>>>>>>>>> (AdvancedThresholdPolicy::is_stale(...)). >>>>>>>>>>> >>>>>>>>>>> I'll do a proper fix, it is the right thing to do and should >>>>>>>>>>> be pretty >>>>>>>>>>> quick. I'll change the comment to an enum that represent who >>>>>>>>>>> submitted >>>>>>>>>>> the compile, and add a table for the comments. This could be >>>>>>>>>>> useful in >>>>>>>>>>> other settings to. >>>>>>>>>> >>>>>>>>>> Sounds good. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nils >>>>>>>>>>> >>>>>>>>>>> On 2016-04-08 19:09, Vladimir Kozlov wrote: >>>>>>>>>>>> What do you mean "stale"? >>>>>>>>>>>> I would prefer to see the real fix as you suggested to >>>>>>>>>>>> avoid removing >>>>>>>>>>>> WB comp tasks from queue. Adding timeout is not reliable. 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 4/8/16 5:27 AM, Nils Eliasson wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> Please review this small fix of the BlockingCompilation test. >>>>>>>>>>>>> >>>>>>>>>>>>> Summary: >>>>>>>>>>>>> Add method enqueued for compilation with WB API may be >>>>>>>>>>>>> removed from >>>>>>>>>>>>> the compile queue as stale. >>>>>>>>>>>>> >>>>>>>>>>>>> Solution: >>>>>>>>>>>>> Add -XX:TieredCompileTaskTimeout=60000 to make sure >>>>>>>>>>>>> nothing gets >>>>>>>>>>>>> stale while the test is running. (Also added some extra >>>>>>>>>>>>> checks that may spare us from waiting until timeout for >>>>>>>>>>>>> failing.) >>>>>>>>>>>>> >>>>>>>>>>>>> This is an workaround but we should consider fixing something >>>>>>>>>>>>> permanent for WB API compiles - like tagging the compile >>>>>>>>>>>>> task with info about the origin of the compile. The >>>>>>>>>>>>> comment field has >>>>>>>>>>>>> this information - but then it needs to be >>>>>>>>>>>>> converted to an enum. >>>>>>>>>>>>> >>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153013 >>>>>>>>>>>>> Webrev: >>>>>>>>>>>>> http://cr.openjdk.java.net/~neliasso/8153013/webrev.01/ >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Nils Eliasson >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.kempik at oracle.com Tue Apr 19 16:38:33 2016 From: vladimir.kempik at oracle.com (Vladimir Kempik) Date: Tue, 19 Apr 2016 09:38:33 -0700 (PDT) Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <570B9803.2030509@oracle.com> References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com> Message-ID: <57165F09.5050404@oracle.com> Hello Can I get some jdk8u reviewer to take a look at it as well? Thanks, Vladimir. On 11.04.2016 15:26, Tobias Hartmann wrote: > Hi Vladimir, > > On 11.04.2016 14:00, Vladimir Kempik wrote: >> Hello >> >> Please review this backport of 8130309 to jdk8u. >> >> Small changes for jdk8 were applied. AArch64 changes were moved out of openjdk scope. >> >> Testing: jprt, failing test. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 >> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ > Looks good to me. Thanks for backporting this! > > Best regards, > Tobias > >> Thanks >> -Vladimir >> From tobias.hartmann at oracle.com Wed Apr 20 08:05:47 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 10:05:47 +0200 Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <57165F09.5050404@oracle.com> References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com> <57165F09.5050404@oracle.com> Message-ID: <5717385B.7040505@oracle.com> Hi Vladimir, I think this should go to jdk8u-dev (CC'ed) as well. Best regards, Tobias On 19.04.2016 18:38, Vladimir Kempik wrote: > Hello > > Can I get some jdk8u reviewer to take a look at it as well? > > Thanks, Vladimir. > > On 11.04.2016 15:26, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 11.04.2016 14:00, Vladimir Kempik wrote: >>> Hello >>> >>> Please review this backport of 8130309 to jdk8u. >>> >>> Small changes for jdk8 were applied. 
AArch64 changes were moved out of openjdk scope. >>> >>> Testing: jprt, failing test. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309 >>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/ >> Looks good to me. Thanks for backporting this! >> >> Best regards, >> Tobias >> >>> Thanks >>> -Vladimir >>> > From jan.civlin at intel.com Wed Apr 20 10:11:52 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Wed, 20 Apr 2016 10:11:52 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <57157726.4030701@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> Vladimir, Please look at the updated patch at http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
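Building the duplicated table from k256 at stub-generation time, as suggested in the review, amounts to emitting each row of four 32-bit constants twice. This is a minimal sketch under stated assumptions. It assumes that a "line" means four constants (one 128-bit lane). It uses a hypothetical helper, make_k256_W, that returns a std::vector instead of filling a stub-owned array. The duplication presumably lets both 128-bit lanes of a YMM register see the same constants when two blocks are hashed side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a table twice the size of k256 in which
// every row of four constants appears twice in a row.
std::vector<std::uint32_t> make_k256_W(const std::uint32_t* k256, std::size_t n) {
  assert(n % 4 == 0);
  std::vector<std::uint32_t> w;
  w.reserve(2 * n);
  for (std::size_t row = 0; row < n; row += 4) {
    for (int copy = 0; copy < 2; ++copy) {       // low lane, then high lane
      for (std::size_t j = 0; j < 4; ++j) {
        w.push_back(k256[row + j]);
      }
    }
  }
  return w;
}
```

The full table would pass all 64 SHA-256 round constants and produce 128 entries.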
Thank you, J [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA throughput = 356.09558280340946 MB/s [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA throughput = 354.1696071938408 MB/s [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA throughput = 349.01408678325697 MB/s -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, April 18, 2016 5:09 PM To: Civlin, Jan; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Hi Jan, The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. 
Please, move new code in macroAssembler_x86_sha.cpp to the end of file. _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: StubRoutines::x86::_k256_W_adr = generate_k256_W(); What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. Thanks, Vladimir On 4/18/16 2:44 PM, Civlin, Jan wrote: > == Correction in the subject line === > > We would like to contribute the SHA256 AVX2 intrinsic. > > This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. > > The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. > > Contributor: Jan Civlin. > > > bug: https://bugs.openjdk.java.net/browse/JDK-8154495 > webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ > From tobias.hartmann at oracle.com Wed Apr 20 13:31:51 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 15:31:51 +0200 Subject: [9] RFR(S): 6941938: Improve array equals intrinsic on SPARC In-Reply-To: References: <5716261F.1070205@oracle.com> <5716578C.5080902@oracle.com> <57166721.5010208@oracle.com> Message-ID: <571784C7.6020304@oracle.com> Hi John, On 20.04.2016 03:46, John Rose wrote: > So I started looking at your code and my inner SPARC junkie took over. > > This is what happened: > http://cr.openjdk.java.net/~jrose/draft/sparc/6941938/ Thanks a lot for having a look! > Perhaps there are some ideas that might be helpful: > - The rampdown logic can lose a couple of instructions by using xorcc and movr. Right, this simplifies the code a bit: http://cr.openjdk.java.net/~thartmann/6941938/webrev.00/ I did some experiments but surprisingly it seems that this does not improve but slightly degrade performance. See benchmark results on page "webrev.00" of [1]. Any idea why that is? 
> - Perhaps the 32-bit version only makes sense for sizes of 16 bytes or less? I tried this already with an explicit check and branch (see webrev.small [2]) and the benchmarks showed a regression for small array sizes because of the additional check. I also evaluated the "shared check" you proposed: http://cr.openjdk.java.net/~thartmann/6941938/webrev.01 Unfortunately, this leads to a regression as well. See page "webrev.01" of [1]. > - It's possible to work with 64-bit loads in more cases (both-odd and one-odd). Yes, I thought about this when implementing the intrinsic but assumed that those cases are rare and it's sufficient to fall back to the 4-byte loop. I added runtime checks for misalignment [3] to the intrinsic and executed some tests (Microbenchmarks, JPRT and Nashorn + Octane). It seems that the arrays are always 8 byte aligned and misalignment is really rare. I would therefore like to avoid the additional complexity of the skewed loop. What do you think?
Thanks,
Tobias
[1] http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938_T4.xlsx
[2] http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/
[3] Runtime alignment checks:
  bind(Lunaligned);
  Label next;
  xor3(ary1, ary2, tmp);
  and3(tmp, 7, tmp);
  br_null_short(tmp, Assembler::pn, next);
  STOP("One array is unaligned!");
  should_not_reach_here();
  bind(next);
  STOP("Both arrays are unaligned!");
> On the other hand, what you wrote is nice and simple. > > HTH > -- John > > P.S. On the otherest hand, I wish we could cover the mismatch intrinsic, and more > versions of misalignment, still with vectorization, as with the arraycopy stubs. > But that's neither nice nor simple. > >> On Apr 19, 2016, at 10:13 AM, Tobias Hartmann > wrote: >> >> Thanks, Vladimir! >> >> On 19.04.2016 18:06, Vladimir Kozlov wrote: >>> Very good. Go with basic. We can do SPU special improvements later if needed. >> >> Okay, I'll push the basic version.
>> >> For reference, here are the results on a SPARC T4: >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic_T4.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic_T4.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic_T4.png >> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic_T4.png >> >>> "I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC." >>> We do have arraycopy code for it but by default we don't use it: >>> product(uintx, ArraycopySrcPrefetchDistance, 0, >>> product(uintx, ArraycopyDstPrefetchDistance, 0, >>> >>> Experiments back then did not show improvement on JBB benchmarks but some workloads may have benefit. that is why we keep the code. >> >> Right but currently it's not possible to enable prefetching, because "ArraycopySrcPrefetchDistanceConstraintFunc" enforces the prefetch distance to be 0: >> >> java -XX:ArraycopySrcPrefetchDistance=42 -version >> ArraycopySrcPrefetchDistance (42) must be 0 >> Error: Could not create the Java Virtual Machine. >> Error: A fatal exception has occurred. Program will exit >> >> Thanks, >> Tobias >> >>> >>> Thanks, >>> Vladimir >>> >>> On 4/19/16 5:35 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following enhancement: >>>> https://bugs.openjdk.java.net/browse/JDK-6941938 >>>> >>>> MacroAssembler::array_equals() currently emits a 4-byte array comparison loop on SPARC but we can do a 8-byte comparison if the array addresses are doubleword aligned. The intrinsic is used for byte/char Arrays.equals() and String.equals(). >>>> >>>> I added the method MacroAssembler::array_equals_loop() that loads and compares array elements of size 'byte_width' until the elements are not equal or we reached the end of the arrays. 
Instead of first comparing trailing characters if the array sizes are not a multiple of 'byte_width', we simply read over the end of the arrays, bail out and compare the remaining elements by shifting out the garbage bits. >>>> >>>> We use a width of 8 bytes if the array addresses are doubleword aligned, or fall back to 4 bytes if they are only word aligned (which is always guaranteed). I also refactored MacroAssembler::string_compare() to use load_sized_value(). >>>> >>>> Performance evaluation showed an improvement of up to 65% with the microbenchmarks on a SPARC M7. I used a slightly modified version of the "EqualsBench" [1] from Aleksey's compact string microbenchmark suite [2] that exercises the intrinsic on two equal/unequal char/byte arrays. Larger benchmarks (SPECjbb, SPECjvm) show no significant difference in performance. >>>> >>>> I evaluated the following three versions of the patch. >>>> >>>> -- Basic -- >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.basic/ >>>> The basic version implements the algorithm described above. Microbenchmark evaluation [3] shows a great improvement for all array sizes except for size 16 where the performance suddenly drops: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>> >>>> I figured out that this is due to caching. It seems that with 8 byte steps, the processor notices later in the loop that we are going to access the subsequent array elements, causing the reads to trigger costly cache misses that degrade overall performance. With 4 byte reads, fetching is triggered earlier and the subsequent reads are very fast. I executed the same benchmark on a SPARC T4 and there is no performance problem. Version "prefetching" tries to improve this. 
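The tail handling described here (read past the end, then shift the garbage bytes out) can be illustrated in portable C++. This sketch makes several assumptions. It emulates big-endian word loads byte by byte, as on SPARC. It uses 8-byte steps only. It requires both buffers to be readable up to the next multiple of 8 bytes. array_equals_sketch and load_be64 are hypothetical names, not HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Big-endian 8-byte load, emulating a SPARC ldx.
static std::uint64_t load_be64(const unsigned char* p) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
  return v;
}

// Compares n bytes, 8 at a time. The last load may read up to 7 bytes past
// n; the caller must guarantee that this padding is readable.
bool array_equals_sketch(const unsigned char* a, const unsigned char* b, std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    if (load_be64(a + i) != load_be64(b + i)) return false;
  }
  std::size_t rem = n - i;  // 0..7 valid bytes in the last word
  if (rem == 0) return true;
  std::uint64_t diff = load_be64(a + i) ^ load_be64(b + i);
  // Valid bytes sit at the high end of a big-endian word; shifting right
  // drops the bytes that lie beyond the end of the arrays.
  diff >>= (8 - rem) * 8;
  return diff == 0;
}
```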
>>>> >>>> There is also a regression for sizes 1 and 2 if the arrays are not equal because the baseline version explicitly checks those sizes before the loop: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>> Version "small" tries to improve this. >>>> >>>> -- Prefetching -- >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.prefetch/ >>>> This version adds explicit prefetching before the loop to make sure the first array elements are always in the cache. This helps to avoid the cache misses with 16 bytes: http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>> >>>> However, prefetching negatively affects overall performance in the other cases (see [4]). Zoltan suggested that this may be because software prefetching temporarily disables or affects the hardware prefetcher. I also experimented with adding incremental prefetching to the loop body but this does not improve performance. >>>> >>>> -- Small -- >>>> http://cr.openjdk.java.net/~thartmann/6941938/webrev.small/ >>>> This version includes special handling of arrays of size <= 4 by emitting a single load without a loop and the corresponding checks. Performance evaluation showed that this does not pay off because the additional check induces an overhead for small arrays > 4 elements (see sheet "small arrays"). >>>> >>>> The numbers can be found here: >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/microbenchmark_results_6941938.xlsx >>>> >>>> I would prefer to use the basic version because explicit prefetching is highly dependent on the processor and behavior may change. I also noticed that we don't use prefetching for arraycopy and other intrinsics on SPARC. >>>> >>>> What do you think? >>>> >>>> Tested with all hotspot tests on RBT (-Xmixed/-Xcomp). Links are attached to the bug. 
>>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/secure/attachment/58697/EqualsBench.java >>>> [2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip >>>> [3] Microbenchmark results for the "basic" implementation >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_basic.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_basic.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_basic.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_basic.png >>>> [4] Microbenchmark results for the "prefetching" implementation >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_equal_prefetch.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/byte_unequal_prefetch.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_equal_prefetch.png >>>> http://cr.openjdk.java.net/~thartmann/6941938/benchmarks/char_unequal_prefetch.png >>>> From tobias.hartmann at oracle.com Wed Apr 20 13:46:53 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 15:46:53 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options Message-ID: <5717884D.2020108@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8086068 http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. Tested with regression test and RBT (running). 
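The consistency check described above can be sketched as follows. VmFlagsSketch, Mode and check_interpreter_only are hypothetical stand-ins for Arguments::mode() and the UseCompiler flag, and the warning text is invented for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for Arguments::mode() and the UseCompiler flag.
enum class Mode { Int, Mixed, Comp };

struct VmFlagsSketch {
  Mode mode;          // -Xint selects Mode::Int
  bool use_compiler;  // may have been re-enabled by -XX:+UseCompiler
};

// Resets UseCompiler when it contradicts interpreter-only mode.
// Returns true if the flags had to be adjusted.
bool check_interpreter_only(VmFlagsSketch& flags) {
  if (flags.mode == Mode::Int && flags.use_compiler) {
    std::fprintf(stderr, "warning: -XX:+UseCompiler ignored with -Xint\n");
    flags.use_compiler = false;
    return true;
  }
  return false;
}
```

Running the check after all command-line flags are parsed guarantees that code depending on "mode == Int implies no compilation", such as the code-heap sizing, never sees the contradictory combination.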
Thanks, Tobias
From zoltan.majo at oracle.com Wed Apr 20 14:02:43 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 20 Apr 2016 16:02:43 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <5717884D.2020108@oracle.com> References: <5717884D.2020108@oracle.com> Message-ID: <57178C03.1010902@oracle.com> Hi Tobias, thank you for taking care of this issue. There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement). Do you know if re-enabling any of those causes problems? Otherwise it looks good to me. Best regards, Zoltan On 04/20/2016 03:46 PM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8086068 > http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ > > The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. > > The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. > > Tested with regression test and RBT (running). > > Thanks, > Tobias
From tobias.hartmann at oracle.com Wed Apr 20 14:22:23 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 20 Apr 2016 16:22:23 +0200 Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options In-Reply-To: <57178C03.1010902@oracle.com> References: <5717884D.2020108@oracle.com> <57178C03.1010902@oracle.com> Message-ID: <5717909F.5040004@oracle.com> Hi Zoltan, On 20.04.2016 16:02, Zoltán Majó wrote: > There are some other flags that are set to 'false' with -Xint (UseLoopCounter, AlwaysCompileLoopMethods, and UseOnStackReplacement).
Do you know if re-enabling any of those causes problems? I checked and combining them with -Xint does not cause any problems because they are guarded by UseCompiler. > Otherwise it looks good to me. Thanks for the review! Best regards, Tobias > Best regards, > > > Zoltan > > On 04/20/2016 03:46 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8086068 >> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/ >> >> The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler is set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line. >> >> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag. >> >> Tested with regression test and RBT (running). >> >> Thanks, >> Tobias >
From zoltan.majo at oracle.com Wed Apr 20 15:01:18 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 20 Apr 2016 17:01:18 +0200 Subject: [9] RFR(XS): 8153292: AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger out-of-heap prefetching Message-ID: <571799BE.1030203@oracle.com> Hi, please review the patch for 8153292. https://bugs.openjdk.java.net/browse/JDK-8153292 Problem: To avoid out-of-heap accesses by instructions prefetching data, TLABs have a reserved area. The size of that area is supposed to be large enough to accommodate possible prefetching. The amount of prefetched data is controlled separately for instance and array allocations (by the AllocateInstancePrefetchLines and AllocatePrefetchLines flags). The size of the reserved area in the TLAB is, however, determined only based on AllocatePrefetchLines. As a result, AllocateInstancePrefetchLines > AllocatePrefetchLines can trigger out-of-heap memory accesses.
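Sizing the reserve from the larger of the two line counts can be sketched as follows. For illustration only, the sketch assumes that the reserve equals the prefetch distance plus one prefetch step per line, rounded up to 8-byte HeapWords. tlab_prefetch_reserve_words is a hypothetical helper, not the HotSpot function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t kHeapWordSize = 8;  // assumed 64-bit HeapWord

// Hypothetical sizing helper: words reserved at the end of a TLAB so that
// prefetches issued by either allocation path stay inside the buffer.
std::size_t tlab_prefetch_reserve_words(std::size_t prefetch_distance,
                                        std::size_t step_size,
                                        std::size_t array_lines,      // AllocatePrefetchLines
                                        std::size_t instance_lines) { // AllocateInstancePrefetchLines
  std::size_t lines = std::max(array_lines, instance_lines);
  std::size_t bytes = prefetch_distance + step_size * lines;
  return (bytes + kHeapWordSize - 1) / kHeapWordSize;  // round up to whole words
}
```

Using only array_lines in place of the max() would undersize the reserve whenever instance_lines is the larger value, which is exactly the case described in the problem statement.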
Solution: Set the size of the reserved TLAB area to the MAX of both flags.

Webrev:
http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/

Testing:
- JPRT;
- local testing on a solaris_sparc machine.

Thank you!

Best regards,


Zoltan

From edward.nevill at gmail.com  Wed Apr 20 17:08:30 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 20 Apr 2016 18:08:30 +0100
Subject: [aarch64-port-dev ] aarch64: RFR: Block zeroing by 'DC ZVA'
In-Reply-To: <57163063.3020506@redhat.com>
References: <5714D930.4090804@redhat.com> <57163063.3020506@redhat.com>
Message-ID: <1461172110.2941.63.camel@mylittlepony.linaroharston>

On Tue, 2016-04-19 at 14:19 +0100, Andrew Haley wrote:
> On 04/19/2016 01:54 PM, Long Chen wrote:
> > Would this be fine?
>
> It might well be. I'd like Ed to do a few measurements of large and
> small block zeroing. My guess is that a reasonably small unrolled loop
> doing STP ZR, ZR will work better than anything else, but we'll see.

OK. So I started by doing some basic measurements of how long it takes to clear a cache line on 3 different partners' HW using 3 different methods:

1) A sequence of str zr, [base, #N] instructions
2) A sequence of stp zr, zr, [base, #N] instructions
3) Using dc zva

Each test was repeated for 3 different memory sizes (100 cache lines, 10000 cache lines and 1E7 cache lines) to simulate the cases where we are hitting L1, L2 and main memory respectively.

The results are here. I have normalised the time for the 100 cache line str to 100 for each partner to avoid disclosing any absolute performance figures.
http://people.linaro.org/~edward.nevill/block_zero/zva.pdf

From this I get the following conclusions:

Partner X:
- Significant improvement using stp vs str across all block zero sizes
- Significant improvement using dc zva over stp across all sizes

Partner Y:
- Virtually no performance improvement using stp vs str at all sizes
- Significant improvement using dc zva

Partner Z:
- Small improvement using stp vs str on L2-sized clears
- Small improvement using dc zva on L1/L2-sized clears
- Large block zeros show no performance improvement with str/stp/dc zva (this is probably a feature of the external memory system on the partner Z board)

So, guided by this I modified the block zeroing patch as follows

if (!small) { }

Here is the webrev for this:
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v03/

I also made a minor modification to Long Chen's v02 patch. In the following code

+ tbz(cnt, 0, store_pair);
+ str(zr, Address(post(base, 8)));
+ sub(cnt, cnt, 1);
+ bind(store_pair);
+ cbz(cnt, done);
+ bind(loop_store_pair);
+ sub(cnt, cnt, 2);
+ stp(zr, zr, Address(post(base, 16)));
+ cbnz(cnt, loop_store_pair);
+ bind(done);

it unnecessarily misaligns the base before continuing to do the stps. We know the base is aligned in the large case because it has just finished clearing cache lines. I moved the single word zero to the end. The number of instructions is the same. The webrev for this is here:
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04

For completeness I also implemented a version using stp only and not using dc zva at all. Webrev here:
http://people.linaro.org/~edward.nevill/block_zero/stp

I have tested all of these, including Long Chen's v01 and v02 patches, using jmh as before (http://people.linaro.org/~edward.nevill/jmh/test/src/main/java/org/sample/JMHTest_00_StringConcatTest.java).

Results are here. I have normalised the original value in each case to 1E7uS to avoid disclosing any absolute performance figures.
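The tail reordering in the v04 change can be modelled in portable C++. This is a simulation, not MacroAssembler code. Each pair store stands in for one `stp zr, zr`, each single store stands in for one `str zr`, and the function names are hypothetical. The model counts how many pair stores land off a 16-byte boundary when base starts 16-byte aligned, which is the situation in the large case after the cache-line loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct StoreStats {
  std::size_t pair_stores = 0;
  std::size_t misaligned_pair_stores = 0;  // pair stores not on a 16-byte boundary
  std::size_t single_stores = 0;
};

// Models one "stp zr, zr, [p]" and records whether p is 16-byte aligned.
static void store_pair(std::uint64_t* p, StoreStats& s) {
  p[0] = 0;
  p[1] = 0;
  ++s.pair_stores;
  if (reinterpret_cast<std::uintptr_t>(p) % 16 != 0) ++s.misaligned_pair_stores;
}

// Odd word first, then pairs (the ordering in the quoted v02 tail).
StoreStats zero_odd_first(std::uint64_t* base, std::size_t cnt) {
  StoreStats s;
  if (cnt & 1) { *base++ = 0; ++s.single_stores; --cnt; }
  for (; cnt != 0; cnt -= 2, base += 2) store_pair(base, s);
  return s;
}

// Pairs first, odd word last (the reordering described for v04).
StoreStats zero_odd_last(std::uint64_t* base, std::size_t cnt) {
  StoreStats s;
  for (; cnt >= 2; cnt -= 2, base += 2) store_pair(base, s);
  if (cnt != 0) { *base = 0; ++s.single_stores; }
  return s;
}
```

Take 7 words starting at an aligned address. The odd-word-first order issues all 3 pair stores at 8 mod 16. The odd-word-last order keeps all 3 aligned. Both orders use one single store, which matches the observation that the instruction count is unchanged.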
http://people.linaro.org/~edward.nevill/block_zero/zero.pdf

In this:
- orig is a clean jdk9/hs-comp build (results normalised to 1E7uS)
- stp is the stp patch above using only stps (no dc zva)
- bzero1 is Long Chen's v01 patch
- bzero2 is Long Chen's v02 patch
- bzero3 is my patch above
- bzero4 is Long Chen's v02 patch with the minor mod to avoid misaligning the stps

From this it looks like bzero3 or bzero4 would be the preferred options, and I would suggest bzero4 as bzero3 is significantly larger.

If people are happy, could I prepare a final changeset for review based on bzero4 (i.e. this one)?
http://people.linaro.org/~edward.nevill/block_zero/block_zeroing.v04

All the best,
Ed.

From vladimir.kozlov at oracle.com  Wed Apr 20 17:58:45 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 10:58:45 -0700
Subject: [8u] RFR 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <57165F09.5050404@oracle.com>
References: <570B91DC.2040904@oracle.com> <570B9803.2030509@oracle.com> <57165F09.5050404@oracle.com>
Message-ID: <5717C355.9080509@oracle.com>

Reviewed. Looks good.

Thanks,
Vladimir

On 4/19/16 9:38 AM, Vladimir Kempik wrote:
> Hello
>
> Can I get some jdk8u reviewer to take a look at it as well?
>
> Thanks, Vladimir.
>
> On 11.04.2016 15:26, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> On 11.04.2016 14:00, Vladimir Kempik wrote:
>>> Hello
>>>
>>> Please review this backport of 8130309 to jdk8u.
>>>
>>> Small changes for jdk8 were applied. AArch64 changes were moved out
>>> of openjdk scope.
>>>
>>> Testing: jprt, failing test.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130309
>>> Webrev: http://cr.openjdk.java.net/~vkempik/8130309/webrev.00/
>> Looks good to me. Thanks for backporting this!
>>
>> Best regards,
>> Tobias
>>
>>> Thanks
>>> -Vladimir
>>>
>

From vladimir.kozlov at oracle.com  Wed Apr 20 18:01:21 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Apr 2016 11:01:21 -0700
Subject: [9] RFR(XS): 8153292: AllocateInstancePrefetchLines>AllocatePrefetchLines can trigger out-of-heap prefetching
In-Reply-To: <571799BE.1030203@oracle.com>
References: <571799BE.1030203@oracle.com>
Message-ID: <5717C3F1.6030809@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/20/16 8:01 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the patch for 8153292.
>
> https://bugs.openjdk.java.net/browse/JDK-8153292
>
>
> Problem: To avoid out-of-heap accesses by instructions prefetching data,
> TLABs have a reserved area. The size of that area is supposed to be
> large enough to accommodate possible prefetching.
>
> The amount of prefetched data is controlled separately for instance and
> array allocations (by the AllocateInstancePrefetchLines and
> AllocatePrefetchLines flags). The size of the reserved area in the TLAB
> is, however, determined only based on AllocatePrefetchLines. As a
> result, AllocateInstancePrefetchLines > AllocatePrefetchLines can
> trigger out-of-heap memory accesses.
>
>
> Solution: Set the size of the reserved TLAB area to the MAX of both flags.
>
> Webrev:
> http://cr.openjdk.java.net/~zmajo/8153292/webrev.00/
>
> Testing:
> - JPRT;
> - local testing on a solaris_sparc machine.
>
> Thank you!
> > Best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Wed Apr 20 18:37:55 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 11:37:55 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> Message-ID: <5717CC83.3070401@oracle.com> Looks good to me. I submitted testing on all platforms before integrating. Thanks, Vladimir On 4/20/16 3:11 AM, Civlin, Jan wrote: > Vladimir, > > Please look at the updated patch at > http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ > > I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). > > The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. > > The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. 
> > Thank you, > > J > > [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 28.756324129 seconds > TestSHA throughput = 356.09558280340946 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 28.912701124 seconds > TestSHA throughput = 354.1696071938408 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000 > provider = SUN > algorithm = SHA-256 > msgSize = 1024 bytes > offset = 0 > iters = 10000000 > warmupIters = 20000 > hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9 > TestSHA runtime = 29.339789962 seconds > TestSHA throughput = 349.01408678325697 MB/s > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, April 18, 2016 5:09 PM > To: Civlin, Jan; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > Hi Jan, > > The patch was generated on Windows and have ^M at the end of lines so I can't apply it to 
our sources. > > I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. > > Please, move new code in macroAssembler_x86_sha.cpp to the end of file. > > _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: > > StubRoutines::x86::_k256_W_adr = generate_k256_W(); > > What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. > > Thanks, > Vladimir > > On 4/18/16 2:44 PM, Civlin, Jan wrote: >> == Correction in the subject line === >> >> We would like to contribute the SHA256 AVX2 intrinsic. >> >> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >> >> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >> >> Contributor: Jan Civlin. >> >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >> From jan.civlin at intel.com Wed Apr 20 19:07:28 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Wed, 20 Apr 2016 19:07:28 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <5717CC83.3070401@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> Thank you! -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 20, 2016 11:38 AM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) Looks good to me. I submitted testing on all platforms before integrating. 
Thanks, Vladimir On 4/20/16 3:11 AM, Civlin, Jan wrote: > Vladimir, > > Please look at the updated patch at > http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ > > I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). > > The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. > > The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. > > Thank you, > > J > > [jcivlin at HSW-EP02 TestSHA]$ > ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java > -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics > -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar > 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes > offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 > fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 > 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA > throughput = 356.09558280340946 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ > ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/ja > va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics > -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar > 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes > offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 > fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 > 9c 24 2c 26 c9 TestSHA runtime = 28.912701124 seconds TestSHA > throughput = 354.1696071938408 MB/s > > [jcivlin at HSW-EP02 TestSHA]$ > ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/ja > va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics > -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar > 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes > offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 > fc 2c 53 dc 14 a4 ce 3d 80 0e 69 
ef 9c e1 00 9e b3 27 cc f4 58 af e0 > 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA > throughput = 349.01408678325697 MB/s > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, April 18, 2016 5:09 PM > To: Civlin, Jan; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no > supports_sha() available) > > Hi Jan, > > The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. > > I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. > > Please, move new code in macroAssembler_x86_sha.cpp to the end of file. > > _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: > > StubRoutines::x86::_k256_W_adr = generate_k256_W(); > > What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. > > Thanks, > Vladimir > > On 4/18/16 2:44 PM, Civlin, Jan wrote: >> == Correction in the subject line === >> >> We would like to contribute the SHA256 AVX2 intrinsic. >> >> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >> >> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >> >> Contributor: Jan Civlin. 
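The reported TestSHA throughputs are consistent with throughput = msgSize × iters / runtime, with MB taken as 10^6 bytes. For the release run, 1024 × 10,000,000 / 28.756324129 s ≈ 356.1 MB/s. The helper below recomputes such figures under that assumption. TestSHA's source is not shown in the thread, so the formula is inferred from the logged numbers.

```cpp
#include <cassert>
#include <cmath>

// Recomputes a TestSHA-style throughput figure. Assumptions (not confirmed by
// the thread): MB means 10^6 bytes, and only the timed iterations count, not
// the warmup iterations.
double throughput_mb_per_s(double msg_size_bytes, double iters, double runtime_s) {
  assert(runtime_s > 0.0);
  return msg_size_bytes * iters / runtime_s / 1e6;
}

// Compares a recomputed figure with a logged one within an absolute tolerance.
bool approx_equal(double a, double b, double tol) {
  return std::fabs(a - b) <= tol;
}
```

By this measure, the fastdebug and slowdebug runs come within about 2% of the release run (354.2 and 349.0 MB/s vs 356.1 MB/s).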
>> >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >> From nils.eliasson at oracle.com Wed Apr 20 19:26:32 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Apr 2016 21:26:32 +0200 Subject: RFR (S): 8154473: Update for CompilerDirectives to control stub generation and intrinsics In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56A800C7@ORSMSX106.amr.corp.intel.com> <57162A88.7030608@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A56A828A2@ORSMSX106.amr.corp.intel.com> Message-ID: <5717D7E8.5000108@oracle.com> In vmSymbols.cpp together with the other flag checks. Regards, Nils On 2016-04-20 02:44, Deshpande, Vivek R wrote: > > HI Nils > > Yes you are right the function accesses the command line flag > DisableIntrinsic and changes are static. > > Could you point me the right location for the function ? > > Also I have updated the webrev with rest of the comments here: > > http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.01/ > > Regards, > > Vivek > > *From:*hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of > *Nils Eliasson > *Sent:* Tuesday, April 19, 2016 5:55 AM > *To:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR (S): 8154473: Update for CompilerDirectives to > control stub generation and intrinsics > > Hi Vivek, > > The changes in is_intrinsic_disabled in compilerDirectives.* are > static and only access the command line flag DisableIntrinsics. As > long as stubs are only generated during startup and don't have a > method context - that is ok - but it doesn't belong in the > compilerDirectives-files if it doens't use directives. 
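Nils's point is that the check depends only on the command-line flag and needs no per-method directive context. The sketch below shows such a flag-only check, consulted once at stub-generation time. The class and function names are hypothetical. The comma-separated flag format is an assumption about how -XX:DisableIntrinsic is parsed, and the intrinsic names are only examples.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical model of a DisableIntrinsic-style check. It is built once from
// the flag value and has no per-method (directive) context.
class DisabledIntrinsics {
 public:
  explicit DisabledIntrinsics(const std::string& flag_value) {
    std::istringstream in(flag_value);
    std::string name;
    // Assumes a comma-separated list of intrinsic names.
    while (std::getline(in, name, ',')) {
      if (!name.empty()) names_.insert(name);
    }
  }

  bool is_disabled(const std::string& intrinsic) const {
    return names_.count(intrinsic) != 0;
  }

 private:
  std::unordered_set<std::string> names_;
};

// Hypothetical stub generator: skips disabled intrinsics and returns false so
// that callers fall back to the non-intrinsic path.
bool maybe_generate_stub(const DisabledIntrinsics& disabled, const std::string& intrinsic) {
  if (disabled.is_disabled(intrinsic)) return false;
  // A real generator would emit the stub here.
  return true;
}
```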
>
> Regards,
> Nils
>
> On 2016-04-18 19:38, Deshpande, Vivek R wrote:
>
> Hi all
>
> I would like to contribute a patch which helps to control the
> intrinsics in interpreter, c1 and c2 by disabling the stub generation.
>
> This uses -XX:DisableIntrinsic option to achieve the same.
>
> Could you please review and sponsor this patch.
>
> Bug-id:
>
> https://bugs.openjdk.java.net/browse/JDK-8154473
> webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/CompilerDirectives/8154473/webrev.00/
>
>
> Thanks and regards,
>
> Vivek
>

From dmitry.dmitriev at oracle.com  Wed Apr 20 19:51:24 2016
From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev)
Date: Wed, 20 Apr 2016 22:51:24 +0300
Subject: [9] RFR(S): 8086068: VM crashes with "-Xint -XX:+UseCompiler" options
In-Reply-To: <5717884D.2020108@oracle.com>
References: <5717884D.2020108@oracle.com>
Message-ID: <5717DDBC.4030909@oracle.com>

Hi Tobias,

I can only comment on the new test: I think that you don't need @library and @modules for this simple test. No need for a new webrev for that.

Thank you!
Dmitry

On 20.04.2016 16:46, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8086068
> http://cr.openjdk.java.net/~thartmann/8086068/webrev.00/
>
> The VM crashes in product or fails with an assert in debug if -Xint and -XX:+UseCompiler are set. This is because CodeCache::heap_available() relies on the fact that if Arguments::mode() == _int, no compilation will be triggered. Although UseCompiler is first disabled by -Xint, the flag may be re-enabled if set via command line.
>
> The solution is to catch such an inconsistent flag combination, issue a warning and reset the flag.
>
> Tested with regression test and RBT (running).
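The fix Tobias describes, validating the flag combination after all options have been parsed, can be sketched as follows. The enum, the struct and the warning text are hypothetical stand-ins for the VM's Arguments handling. What matters is the ordering: -Xint clears UseCompiler first, and a later -XX:+UseCompiler can set it again. The check therefore has to run once parsing is complete.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the execution mode and the UseCompiler flag;
// this is not HotSpot's Arguments code.
enum class Mode { Int, Mixed, Comp };

struct VmFlags {
  Mode mode;
  bool use_compiler;
};

// Runs after all command-line flags are processed. It rejects the
// inconsistent combination (interpreter-only mode with the compiler
// re-enabled): it warns and resets the flag. Returns true if a reset happened.
bool fix_up_use_compiler(VmFlags& flags) {
  if (flags.mode == Mode::Int && flags.use_compiler) {
    std::fprintf(stderr,
                 "warning: UseCompiler is ignored in interpreter-only mode\n");
    flags.use_compiler = false;
    return true;
  }
  return false;
}
```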
> > Thanks, > Tobias From nils.eliasson at oracle.com Wed Apr 20 19:56:54 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Apr 2016 21:56:54 +0200 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <571519BE.605@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> Message-ID: <5717DF06.3090305@oracle.com> Hi, Thanks for the help, I got it to work, and added NoSafePointVerifiers to make sure I hadn't missed anything. Then after many test iterations it failed again. It didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to the temporary buffer too, or relax the tag checks in the xml and accept that the output may need to be sorted by writer-thread before use. The output looks like: ... releases tty when blocking on a safepoint ... // back again after safepoint writing without ttylock now. // Here we fail on an assert today when we expect a closing print_nmethod tag This is malformed xml but has enough information to be reconstructed. Would this be an acceptable output? Regards, Nils On 2016-04-18 19:30, Vladimir Kozlov wrote: > tty would have the same problem but it use C_HEAP to allocate: > > defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) > defaultStream(); > > Please, look if you can do something similar. > > Thanks, > Vladimir > > On 4/18/16 4:24 AM, Nils Eliasson wrote: >> Resizeable is better, but then we assert on expanding the stringbuffer >> while being under a different ResourceMark. >> >> Regards, >> Nils >> >> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>> Use resizable stream: >>> >>> stringStream(size_t initial_bufsize = 256); >>> >>> 1024 may not be enough. 
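The pattern under discussion is to format into a private growable buffer and hold the shared lock only for the final copy. The sketch below shows it with `std::ostringstream`, which allocates on the C++ heap, roughly in the spirit of the C_HEAP suggestion. The lock, the output string and the function name are hypothetical, and the attribute names in the tag are illustrative. The sketch does not model HotSpot's ResourceMark or safepoint rules, which were the actual complications in this thread.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

std::mutex tty_lock;     // hypothetical stand-in for the VM's tty lock
std::string tty_output;  // hypothetical stand-in for the shared log stream

// Formats everything first into a private buffer that grows as needed (so there
// is no fixed 1024-byte limit). It then takes the lock only to append the result.
void print_method_block(const std::string& method_name, int code_size) {
  std::ostringstream buf;
  buf << "<print_nmethod method='" << method_name
      << "' size='" << code_size << "'>\n";
  buf << "</print_nmethod>\n";
  std::lock_guard<std::mutex> guard(tty_lock);  // held only while copying
  tty_output += buf.str();
}
```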
>>> >>> Thanks, >>> Vladimir >>> >>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this fix of print opto_assembly. >>>> >>>> Summary: >>>> The compilelog can get corrupted and the VM may assert on "failed: >>>> bad tag in log". >>>> >>>> When printing assembly in output.cpp we first take the ttylock, print >>>> the head and then the method metadata. However the >>>> metadata printing makes a vm entry and may block for a safepoint and >>>> will then release the lock >>>> (break_tty_lock_for_safepoint). After that some of the other compiler >>>> thread that haven't safepointed will take the lock >>>> and the broken log will be a fact when the safepoint is over and the >>>> first thread starts logging again. >>>> >>>> Solution: >>>> Print the method metadata to a temporary buffer, then take the tty >>>> lock. >>>> >>>> Testing: >>>> Repro from bug stops failing. >>>> Running :hotspot_all >>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>> >>>> >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>> >>>> Regards, >>>> Nils Eliasson >> From vladimir.kozlov at oracle.com Wed Apr 20 20:04:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 13:04:20 -0700 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> Message-ID: <5717E0C4.8050006@oracle.com> One thing was caught during build is ',' at the last line of enum: + STACK_SIZE = _RSP + _RSP_SIZE, +}; Compiler complains about it so I removed it 
in my local repo. Vladimir On 4/20/16 12:07 PM, Civlin, Jan wrote: > Thank you! > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 20, 2016 11:38 AM > To: Civlin, Jan ; hotspot compiler > Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) > > Looks good to me. I submitted testing on all platforms before integrating. > > Thanks, > Vladimir > > On 4/20/16 3:11 AM, Civlin, Jan wrote: >> Vladimir, >> >> Please look at the updated patch at >> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/ >> >> I removed the definitions of unused [v]movdqa(), vpsrldq(), vpslldq(). >> >> The k256_W is actually a table of the size of two k256 - each line of k256 is repeated twice. As you have suggested I made changes to generate k256_W from k256. >> >> The patch was tested in three configurations: slowdebug, release and fastdebug in Win/Linux 64. >> >> Thank you, >> >> J >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java >> -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 >> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 >> 9c 24 2c 26 c9 TestSHA runtime = 28.756324129 seconds TestSHA >> throughput = 356.09558280340946 MB/s >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/ja >> va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 >> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 >> 9c 24 2c 26 c9 TestSHA runtime = 
28.912701124 seconds TestSHA >> throughput = 354.1696071938408 MB/s >> >> [jcivlin at HSW-EP02 TestSHA]$ >> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/ja >> va -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics >> -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar >> 10000000 provider = SUN algorithm = SHA-256 msgSize = 1024 bytes >> offset = 0 iters = 10000000 warmupIters = 20000 hash [32]: 78 5b 07 51 >> fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 >> 9c 24 2c 26 c9 TestSHA runtime = 29.339789962 seconds TestSHA >> throughput = 349.01408678325697 MB/s >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Monday, April 18, 2016 5:09 PM >> To: Civlin, Jan; hotspot compiler >> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no >> supports_sha() available) >> >> Hi Jan, >> >> The patch was generated on Windows and have ^M at the end of lines so I can't apply it to our sources. >> >> I don't see usage of new [v]movdqa(), vpsrldq(), vpslldq(), instructions. >> >> Please, move new code in macroAssembler_x86_sha.cpp to the end of file. >> >> _k256_W[] is the same as _k256[] with repeated 4 values. I would suggest to generated it dynamically in stubGenerator_x86_64.cpp based on _k256: >> >> StubRoutines::x86::_k256_W_adr = generate_k256_W(); >> >> What testing was done? Did you ran with fastdebug build? I am concern about size of new stub and current code_size2 is enough. >> >> Thanks, >> Vladimir >> >> On 4/18/16 2:44 PM, Civlin, Jan wrote: >>> == Correction in the subject line === >>> >>> We would like to contribute the SHA256 AVX2 intrinsic. >>> >>> This intrinsic is for x86 AVX2 architecture when no supports_sha() is available. It is L64 code only. >>> >>> The patch delivers x2 performance gain for the latency and throughput - both are measured on an average message. >>> >>> Contributor: Jan Civlin. 
>>> >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8154495 >>> webrev: http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.00/ >>> From vladimir.kozlov at oracle.com Wed Apr 20 20:07:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Apr 2016 13:07:44 -0700 Subject: [9] RFR(S): 8153527: break_tty_lock_for_safepoint causes "assert(false) failed: bad tag in log" and broken compile log In-Reply-To: <5717DF06.3090305@oracle.com> References: <5711046B.9080808@oracle.com> <57112880.1010204@oracle.com> <5714C3D0.2070804@oracle.com> <571519BE.605@oracle.com> <5717DF06.3090305@oracle.com> Message-ID: <5717E190.5070107@oracle.com> On 4/20/16 12:56 PM, Nils Eliasson wrote: > Hi, > > Thanks for the help, > > I got it to work, and added NoSafePointVerifiers to make sure I hadn't > missed anything. Then after many test iterations it failed again. It > didn't fail on the NSPV, but in dump_asm we blocked on a VM entry to get > a ciSymbol->as_utf8. Now I am considering if I should direct dump_asm to > the temporary buffer too, or relax the tag checks in the xml and accept > that the output may need to be sorted by writer-thread before use. The > output looks like: > > > > ... > releases tty when blocking on a safepoint > > > ... > // back again after safepoint writing without > ttylock now. > // Here we fail on an assert today when we expect > a closing print_nmethod tag > > > > This is malformed xml but has enough information to be reconstructed. > Would this be an acceptable output? Yes, I think it is acceptable - we don't loose information. And it is not worse than it was before. Thanks, Vladimir > > Regards, > Nils > > > On 2016-04-18 19:30, Vladimir Kozlov wrote: >> tty would have the same problem but it use C_HEAP to allocate: >> >> defaultStream::instance = new(ResourceObj::C_HEAP, mtInternal) >> defaultStream(); >> >> Please, look if you can do something similar. 
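The interleaved output still identifies its writer, so it can be regrouped after the fact. The post-processing sketch below assumes that each record has already been tagged with the id of the thread that wrote it. The `Record` type and `split_by_writer` are hypothetical, and the real log format is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One line of log output together with the id of the thread that wrote it.
struct Record {
  long writer;
  std::string text;
};

// Groups records by writer and keeps each writer's lines in their original
// order, so that each thread's output can be read or XML-checked on its own.
std::map<long, std::vector<std::string>> split_by_writer(const std::vector<Record>& entries) {
  std::map<long, std::vector<std::string>> out;
  for (const Record& r : entries) out[r.writer].push_back(r.text);
  return out;
}
```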
>> >> Thanks, >> Vladimir >> >> On 4/18/16 4:24 AM, Nils Eliasson wrote: >>> Resizeable is better, but then we assert on expanding the stringbuffer >>> while being under a different ResourceMark. >>> >>> Regards, >>> Nils >>> >>> On 2016-04-15 19:44, Vladimir Kozlov wrote: >>>> Use resizable stream: >>>> >>>> stringStream(size_t initial_bufsize = 256); >>>> >>>> 1024 may not be enough. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 4/15/16 8:10 AM, Nils Eliasson wrote: >>>>> Hi, >>>>> >>>>> Please review this fix of print opto_assembly. >>>>> >>>>> Summary: >>>>> The compilelog can get corrupted and the VM may assert on "failed: >>>>> bad tag in log". >>>>> >>>>> When printing assembly in output.cpp we first take the ttylock, print >>>>> the head and then the method metadata. However the >>>>> metadata printing makes a vm entry and may block for a safepoint and >>>>> will then release the lock >>>>> (break_tty_lock_for_safepoint). After that some of the other compiler >>>>> thread that haven't safepointed will take the lock >>>>> and the broken log will be a fact when the safepoint is over and the >>>>> first thread starts logging again. >>>>> >>>>> Solution: >>>>> Print the method metadata to a temporary buffer, then take the tty >>>>> lock. >>>>> >>>>> Testing: >>>>> Repro from bug stops failing. 
>>>>> Running :hotspot_all >>>>> (http://jdash.se.oracle.com/rbt/rbt-nils.eliasson-compiler_control-20160415-1508-10854) >>>>> >>>>> >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153527 >>>>> Webrev: http://cr.openjdk.java.net/~neliasso/8153527/webrev.01/ >>>>> >>>>> Regards, >>>>> Nils Eliasson >>> > From jan.civlin at intel.com Wed Apr 20 20:13:37 2016 From: jan.civlin at intel.com (Civlin, Jan) Date: Wed, 20 Apr 2016 20:13:37 +0000 Subject: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) In-Reply-To: <5717E0C4.8050006@oracle.com> References: <39F83597C33E5F408096702907E6C4500F16BF24@ORSMSX104.amr.corp.intel.com> <57157726.4030701@oracle.com> <39F83597C33E5F408096702907E6C4500F16C4B3@ORSMSX104.amr.corp.intel.com> <5717CC83.3070401@oracle.com> <39F83597C33E5F408096702907E6C4500F16C55B@ORSMSX104.amr.corp.intel.com> <5717E0C4.8050006@oracle.com> Message-ID: <39F83597C33E5F408096702907E6C4500F16C598@ORSMSX104.amr.corp.intel.com> Thank you, Vladimir. I guess it was a warning. I usually keep a comma in the last line of enum so I will not need to change the existing lines if I add new. Section 6.7.2.2 of C99 lists the syntax as: enum-specifier: enum identifieropt { enumerator-list } enum identifieropt { enumerator-list , } enum identifier enumerator-list: enumerator enumerator-list , enumerator enumerator: enumeration-constant enumeration-constant = constant-expression -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 20, 2016 1:04 PM To: Civlin, Jan ; hotspot compiler Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no supports_sha() available) One thing was caught during build is ',' at the last line of enum: + STACK_SIZE = _RSP + _RSP_SIZE, +}; Compiler complains about it so I removed it in my local repo. Vladimir On 4/20/16 12:07 PM, Civlin, Jan wrote: > Thank you! 
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 20, 2016 11:38 AM
> To: Civlin, Jan ; hotspot compiler
> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
> supports_sha() available)
>
> Looks good to me. I submitted testing on all platforms before integrating.
>
> Thanks,
> Vladimir
>
> On 4/20/16 3:11 AM, Civlin, Jan wrote:
>> Vladimir,
>>
>> Please look at the updated patch at
>> http://cr.openjdk.java.net/~vdeshpande/8154495/webrev.01/
>>
>> I removed the definitions of the unused [v]movdqa(), vpsrldq(), and vpslldq().
>>
>> k256_W is actually twice the size of k256: each line of k256 is repeated
>> twice. As you suggested, I changed the code to generate k256_W from k256.
>>
>> The patch was tested in three configurations (slowdebug, release, and
>> fastdebug) on 64-bit Windows and Linux.
>>
>> Thank you,
>>
>> J
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-release/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>> provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000
>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>> TestSHA runtime = 28.756324129 seconds
>> TestSHA throughput = 356.09558280340946 MB/s
>>
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-fastdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>> provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000
>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>> TestSHA runtime = 28.912701124 seconds
>> TestSHA throughput = 354.1696071938408 MB/s
>>
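The reported throughput appears to follow directly from the printed parameters: bytes hashed in the timed loop (msgSize × iters, apparently excluding the 20000 warm-up iterations) divided by the runtime, with MB taken as 10^6 bytes. For the release run:

```latex
\text{throughput}
  = \frac{\text{msgSize} \times \text{iters}}{\text{runtime}}
  = \frac{1024\,\text{B} \times 10^{7}}{28.756324129\,\text{s}}
  \approx 3.561 \times 10^{8}\,\text{B/s}
  = 356.1\,\text{MB/s} \quad (1\,\text{MB} = 10^{6}\,\text{B})
```

The same formula gives 354.17 MB/s for the fastdebug run (28.912701124 s) and 349.01 MB/s for the slowdebug run (29.339789962 s), matching the printed values.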
>> [jcivlin at HSW-EP02 TestSHA]$
>> ../../sha-041116/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -Xbatch -XX:+UseSHA -XX:+UseSHA256Intrinsics -XX:+ShowMessageBoxOnError -Dalgorithm=SHA-256 -jar TestSHA.jar 10000000
>> provider = SUN algorithm = SHA-256 msgSize = 1024 bytes offset = 0 iters = 10000000 warmupIters = 20000
>> hash [32]: 78 5b 07 51 fc 2c 53 dc 14 a4 ce 3d 80 0e 69 ef 9c e1 00 9e b3 27 cc f4 58 af e0 9c 24 2c 26 c9
>> TestSHA runtime = 29.339789962 seconds
>> TestSHA throughput = 349.01408678325697 MB/s
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, April 18, 2016 5:09 PM
>> To: Civlin, Jan; hotspot compiler
>> Subject: Re: RFR: 8154495: SHA256 AVX2 intrinsic (when no
>> supports_sha() available)
>>
>> Hi Jan,
>>
>> The patch was generated on Windows and has ^M (CR) at the end of lines, so I can't apply it to our sources.
>>
>> I don't see any uses of the new [v]movdqa(), vpsrldq(), and vpslldq() instructions.
>>
>> Please move the new code in macroAssembler_x86_sha.cpp to the end of the file.
>>
>> _k256_W[] is the same as _k256[] with every 4 values repeated. I would suggest generating it dynamically in stubGenerator_x86_64.cpp based on _k256:
>>
>> StubRoutines::x86::_k256_W_adr = generate_k256_W();
>>
>> What testing was done? Did you run with a fastdebug build? I am concerned about the size of the new stub and whether the current code_size2 is enough.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/18/16 2:44 PM, Civlin, Jan wrote:
>>> == Correction in the subject line ===
>>>
>>> We would like to contribute the SHA256 AVX2 intrinsic.
>>>
>>> This intrinsic is for the x86 AVX2 architecture when supports_sha() is not available. It is LP64 (64-bit) code only.
>>>
>>> The patch delivers a 2x performance gain in both latency and throughput, each measured on an average message.
>>>
>>> Contributor: Jan Civlin.
>>>
>
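Vladimir's suggestion to derive `_k256_W` from `_k256` at stub-generation time, rather than keeping a second hand-written table, can be sketched in portable C++. This assumes the layout described in the thread: each row of four 32-bit constants is written twice in a row, doubling the table from 64 to 128 entries. The stated reason (that a 256-bit load then sees the same four constants in both 128-bit lanes) is an assumption about the AVX2 kernel, not something the thread says. `make_k256_W`, `K256` and `K256W` are hypothetical names; the real `generate_k256_W()` in the patch may be written differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type names: the 64 SHA-256 round constants and the doubled table.
using K256  = std::array<std::uint32_t, 64>;
using K256W = std::array<std::uint32_t, 128>;

// Copies each row of four constants twice in a row: row r of k256 fills
// w[8r .. 8r+3] and again w[8r+4 .. 8r+7]. Assumption: the AVX2 kernel loads
// 8 constants (256 bits) at a time and needs the same 4 in both 128-bit lanes.
K256W make_k256_W(const K256& k256) {
  K256W w{};
  for (std::size_t row = 0; row < k256.size() / 4; ++row) {  // 16 rows
    for (std::size_t j = 0; j < 4; ++j) {
      const std::uint32_t k = k256[4 * row + j];
      w[8 * row + j] = k;      // low 128-bit lane
      w[8 * row + 4 + j] = k;  // high 128-bit lane (duplicate)
    }
  }
  assert(w.size() == 2 * k256.size());
  return w;
}
```

Generating the table from `_k256` keeps a single source of truth for the round constants, so the two tables cannot drift apart; with the real constants filled in, the result would correspond to the table that `StubRoutines::x86::_k256_W_adr` points to in Vladimir's suggestion.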