From tobias.hartmann at oracle.com Mon Jan 4 08:15:28 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 4 Jan 2016 09:15:28 +0100 Subject: [8u] Request for approval: Backport of 8144487 and 8145754 Message-ID: <568A2A20.7030601@oracle.com> Hi, please approve and review the following backports to 8u. 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true https://bugs.openjdk.java.net/browse/JDK-8144487 http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI https://bugs.openjdk.java.net/browse/JDK-8145754 http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 Nightly testing showed no problems and the changes apply cleanly to 8u-dev. Thanks, Tobias From tobias.hartmann at oracle.com Mon Jan 4 09:30:33 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 4 Jan 2016 10:30:33 +0100 Subject: [8u] Request for approval: Backport of 8144487 and 8145754 In-Reply-To: <568A2EE7.4030600@oracle.com> References: <568A2A20.7030601@oracle.com> <568A2EE7.4030600@oracle.com> Message-ID: <568A3BB9.1010501@oracle.com> Hi David, sure, I included the links to the code review: 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true https://bugs.openjdk.java.net/browse/JDK-8144487 http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020503.html http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI https://bugs.openjdk.java.net/browse/JDK-8145754 http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020502.html http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 Thanks, Tobias On 04.01.2016 09:35, david buck wrote: > Hi Tobias! > > Would you please include links to the code review threads on mail.openjdk.java.net? 
> > [ JDK 8 Updates: Push Approval Request Template ] > http://openjdk.java.net/projects/jdk8u/approval-template.html > > Cheers, > -Buck > > On 2016/01/04 17:15, Tobias Hartmann wrote: >> Hi, >> >> please approve and review the following backports to 8u. >> >> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true >> https://bugs.openjdk.java.net/browse/JDK-8144487 >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 >> >> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI >> https://bugs.openjdk.java.net/browse/JDK-8145754 >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 >> >> Nightly testing showed no problems and the changes apply cleanly to 8u-dev. >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Mon Jan 4 10:04:06 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 4 Jan 2016 11:04:06 +0100 Subject: [8u] Request for approval: Backport of 8144487 and 8145754 In-Reply-To: <568A41D7.2030503@oracle.com> References: <568A2A20.7030601@oracle.com> <568A2EE7.4030600@oracle.com> <568A3BB9.1010501@oracle.com> <568A41D7.2030503@oracle.com> Message-ID: <568A4396.7070301@oracle.com> Thanks, David! I will push this to 8u-dev as soon as I get a peer review for the backport. Best, Tobias On 04.01.2016 10:56, david buck wrote: > approved for backport to 8u-dev > > Thank you for adding the review links. 
> > Cheers, > -Buck > > On 2016/01/04 18:30, Tobias Hartmann wrote: >> Hi David, >> >> sure, I included the links to the code review: >> >> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true >> https://bugs.openjdk.java.net/browse/JDK-8144487 >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020503.html >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 >> >> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI >> https://bugs.openjdk.java.net/browse/JDK-8145754 >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020502.html >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 >> >> Thanks, >> Tobias >> >> On 04.01.2016 09:35, david buck wrote: >>> Hi Tobias! >>> >>> Would you please include links to the code review threads on mail.openjdk.java.net? >>> >>> [ JDK 8 Updates: Push Approval Request Template ] >>> http://openjdk.java.net/projects/jdk8u/approval-template.html >>> >>> Cheers, >>> -Buck >>> >>> On 2016/01/04 17:15, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please approve and review the following backports to 8u. >>>> >>>> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true >>>> https://bugs.openjdk.java.net/browse/JDK-8144487 >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 >>>> >>>> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI >>>> https://bugs.openjdk.java.net/browse/JDK-8145754 >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 >>>> >>>> Nightly testing showed no problems and the changes apply cleanly to 8u-dev. 
>>>>
>>>> Thanks,
>>>> Tobias
>>>>

From tobias.hartmann at oracle.com Mon Jan 4 11:35:43 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 4 Jan 2016 12:35:43 +0100
Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes
In-Reply-To:
References: <55FBDFEC.4060405@oracle.com> <56139149.5080906@oracle.com>
Message-ID: <568A590F.6030104@oracle.com>

Hi Roland,

sorry for the delay.

On 07.10.2015 11:06, Roland Westrelin wrote:
>>> Maybe we could add an IfProjNode::Ideal method that disconnects the other branch of the If when this branch is always taken and that does so even during parsing. Given Ideal is called before Identity, that would guarantee the next call to Identity optimizes the If out.
>>
>> As you suggested, I added an IfProjNode::Ideal that disconnects the never taken branch from the IfNode. The subsequent call to Identity then removes the IfNode:
>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.03/
>>
>> However, I wondered if this is "legal" because the comment in Node::ideal says:
>>
>> // The Ideal call almost arbitrarily reshape the graph rooted at the 'this'
>> // pointer.
>>
>> But we are changing the graph "above" the this pointer. I executed tests with -XX:+VerifyIterativeGVN and everything seems to work fine.
>> Another solution would be to cut the *current* branch if it is never taken:
>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.02/
>>
>> But this solution depends on the assumption that we execute the identity() of the other ProjNode which is not guaranteed by GVN (I think).
>>
>> Therefore I would like to go for webrev.03. I verified that this solves the problem and tested the fix with JPRT.
>
> I thought about this more and I don't think either work ok.
>
> The problem with webrev.02 is that depending on the order the projection nodes are allocated and transformed, the optimization may not happen:
>
> Node* never_taken = new IfTrueNode(..);
> Node* always_taken = new IfFalseNode(..);
> always_taken = gvn.transform(always_taken);
> never_taken = gvn.transform(never_taken);
>
> The problem with webrev.03 is that we may change a node that is not yet transformed (never_taken changed by call to gvn.transform(always_taken)). Not sure if it could break existing code but it's clearly an unexpected behavior.

Right, that could be a problem.

> Another way would be to remove the in(0)->outcnt() == 1 check from IfProjNode::Identity() and in an IfProjNode::Ideal method do what you do in webrev.03 but when can_reshape is true only.

Here is the new webrev:
http://cr.openjdk.java.net/~thartmann/8136469/webrev.04/

However, I'm afraid that this re-introduces JDK-8027626. If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Right?

Thanks,
Tobias

From paul.sandoz at oracle.com Mon Jan 4 11:42:15 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Mon, 4 Jan 2016 12:42:15 +0100
Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES
In-Reply-To: <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com>
References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com>
Message-ID: <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com>

Hi,

> On 31 Dec 2015, at 22:33, John Rose wrote:
>
> When performing explicit range checks in pre-intrinsic code,
> let's try to use the new intrinsic functions in java.util.Objects,
> called checkIndex, checkFromToIndex, and checkFromIndexSize.

At the moment only checkIndex is a C2 intrinsic, we could revisit making the others intrinsic as well based on use-cases.
> These are simpler, safer, and more maintainable than our previous
> practice of using hand-written "random logic", such as in this bug:
> http://hg.openjdk.java.net/jdk9/hs-comp/jdk/rev/cb31a76eecd1#l1.52
>

Yes, in this case I believe the calls to cryptBlockCheck

    cryptBlockCheck(in, inOff, len);
    cryptBlockCheck(out, outOff, len);
    return implCrypt(in, inOff, len, out, outOff);

could be replaced with:

    Objects.checkFromIndexSize(inOff, len, in.length, );
    Objects.checkFromIndexSize(outOff, len, out.length, );
    return implCrypt(in, inOff, len, out, outOff);

Paul.

> Depending on the documented API, it is usually enough that the
> thrown exception be a RuntimeException of any sort. By default,
> the methods throw a generic IndexOutOfBoundsException.
> In cases where a particular exception must be thrown, the Objects
> methods provide an optional "hook" for building the desired exception.
>
> In this case, since the code is already pushed, we should clean it
> up as part of this bug:
> https://bugs.openjdk.java.net/browse/JDK-8135250
>
> -- John
>
> On Dec 29, 2015, at 9:33 AM, Kharbas, Kishor wrote:
>>
>> That's great.. Thank you!
>>
>> I will keep the jcheck tip in mind for next time :)
>>
>> - Kishor
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Tuesday, December 29, 2015 12:47 AM
>> To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net
>> Cc: Anthony Scarpino
>> Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES
>>
>> Hi Kishor,
>>
>> There were coding style problems which I fixed. Please, do cleanup in a future (use jcheck).
>> >> src/cpu/x86/vm/stubGenerator_x86_32.cpp:2144: Trailing whitespace >> src/cpu/x86/vm/stubGenerator_x86_64.cpp:3061: Trailing whitespace >> src/cpu/x86/vm/stubRoutines_x86.hpp:36: Trailing whitespace >> src/cpu/x86/vm/vm_version_x86.cpp:709: Trailing whitespace >> src/share/vm/opto/library_call.cpp:702: Trailing whitespace >> src/share/vm/opto/runtime.hpp:317: Trailing whitespace >> >> src/share/vm/opto/library_call.cpp:5789: Carriage return (^M) >> >> I submitted push job. Lets see how it will go. >> >> Regards, >> Vladimir >> >> On 12/28/15 8:48 PM, Kharbas, Kishor wrote: >>> Vladimir, sorry that file was added accidentally. >>> Here is an updated patch - >>> http://cr.openjdk.java.net/~vdeshpande/8143925/webrev.01/ >>> >>> This patch includes, >>> 1. Changes to some comments. >>> 2. Small correction in vm_version_x86.cpp. >>> 3. Removal of version.rc file. >>> >>> Thanks for reviewing the code. >>> >>> Kishor >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, December 24, 2015 4:36 PM >>> To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net >>> Cc: Anthony Scarpino >>> Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES >>> >>> What are the changes in src/os/windows/vm/version.rc? >>> >>> Otherwise this looks good. >>> >>> Thanks, >>> Vladimir >>> >>> On 12/24/15 2:26 PM, Kharbas, Kishor wrote: >>>> Hello all, >>>> >>>> Thank you Vladimir and Anthony for your inputs so far. >>>> I have updated the hotspot based on the suggestions and also added CTR mode to jtreg test. >>>> >>>> During testing I also noticed that the Java code for CounterMode.crypt() uses the partially used encrypted counter from previous invocation and also saves the last encryptedCounter for next invocation. >>>> This case was not handled by the intrinsic. I have fixed this in the latest patch. >>>> >>>> Summary of changes: >>>> 1. 
Proper disabling of UseAESCTRIntrinsic flag based on hardware >>>> support 2. Adding the missing support explained above. >>>> 3. Added CTR mode in jtreg test 7184394 4. Added and changed some >>>> encodings (pextr and pinsr) in assembler_x86.cpp >>>> >>>> The updated hotspot webrev is at : >>>> http://cr.openjdk.java.net/~vdeshpande/8143925/webrev.00/ >>>> There is no update to jdk webrev posted earlier which is >>>> http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.02/ >>>> Bug id : https://bugs.openjdk.java.net/browse/JDK-8143925 >>>> >>>> Much appreciated! >>>> >>>> Happy holidays! >>>> Kishor >>>> >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Friday, December 04, 2015 3:59 PM >>>> To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net >>>> Cc: Anthony Scarpino >>>> Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES >>>> >>>> jdk: http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.02/ >>>> >>>> JDK changes looks good to me. >>>> >>>> hotspot: >>>> http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.04/ >>>> >>>> Please, set flag to 'false' on platforms which does not support this >>>> intrinsic: >>>> >>>> if (UseAESCTRIntrinsics) { >>>> warning("AES/CTR intrinsics are not available on this CPU"); >>>> FLAG_SET_DEFAULT(UseAESCTRIntrinsics, false); >>>> } >>>> >>>> Also Anthony asked to add test for this intrinsic. Please do it: >>>> >>>> "2) It would be good to add CTR to the TestAES tests. It's in hotspot/test/compiler/codegen/7184394/. The test currently has CBC, ECB, and GCM in it, so it should be easy. It's also the only test I know of that tests the intrinsic. None of the tests in the jdk repo that I know of loop enough to trigger the intrinsic." >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 12/4/15 1:40 PM, Kharbas, Kishor wrote: >>>>> Thanks Vladimir for the feedback! >>>>> >>>>> I have updated the jbs entry with the new patch. 
>>>>> >>>>> JDK changes : added range checks in the JDK using additional methods. >>>>> Hotspot changes : renamed the UseCTRAESIntrinsics flag to >>>>> UseAESCTRIntrinsics >>>>> >>>>> Further review and feedback is appreciated! >>>>> >>>>> - Kishor >>>>> >>>>> -----Original Message----- >>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>> Sent: Tuesday, December 01, 2015 5:32 PM >>>>> To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES >>>>> >>>>> Hotspot changes seems fine. But JDK changes should have additional method for range checks - this is new requirement for intrinsics which access arrays. See, for example, cryptBlockCheck() in AESCrypt.java. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 11/24/15 2:33 PM, Kharbas, Kishor wrote: >>>>>> Hello all, >>>>>> >>>>>> I request the community to review a patch for enhancing >>>>>> CounterMode.crypt() for AES. This patch defines intrinsic for >>>>>> CounterMode.crypt() to leverage the parallel nature of AES in >>>>>> Counter >>>>>> (CTR) Mode. >>>>>> >>>>>> This is achieved by operating on 6 blocks in parallel to issue >>>>>> independent x86 AES-NI instructions and keep the CPU pipeline full. >>>>>> >>>>>> Testing on micro-benchmark has shown a speedup of 4x-6x. >>>>>> >>>>>> Bug id: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8143925 >>>>>> >>>>>> Webrev: >>>>>> >>>>>> hotspot: >>>>>> http://cr.openjdk.java.net/~mcberg/8143925/hotspot/webrev.02/ >>>>>> >>>>>> jdk: >>>>>> http://cr.openjdk.java.net/~mcberg/8143925/jdk/webrev.01/ >>>>>> >>>>>> Much appreciated! >>>>>> >>>>>> Kishor Kharbas >>>>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From vladimir.kozlov at oracle.com Mon Jan 4 15:48:08 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Jan 2016 07:48:08 -0800 Subject: [8u] Request for approval: Backport of 8144487 and 8145754 In-Reply-To: <568A3BB9.1010501@oracle.com> References: <568A2A20.7030601@oracle.com> <568A2EE7.4030600@oracle.com> <568A3BB9.1010501@oracle.com> Message-ID: <568A9438.3010400@oracle.com> Looks good. Thanks, Vladimir On 1/4/16 1:30 AM, Tobias Hartmann wrote: > Hi David, > > sure, I included the links to the code review: > > 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true > https://bugs.openjdk.java.net/browse/JDK-8144487 > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020503.html > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 > > 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI > https://bugs.openjdk.java.net/browse/JDK-8145754 > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020502.html > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 > > Thanks, > Tobias > > On 04.01.2016 09:35, david buck wrote: >> Hi Tobias! >> >> Would you please include links to the code review threads on mail.openjdk.java.net? >> >> [ JDK 8 Updates: Push Approval Request Template ] >> http://openjdk.java.net/projects/jdk8u/approval-template.html >> >> Cheers, >> -Buck >> >> On 2016/01/04 17:15, Tobias Hartmann wrote: >>> Hi, >>> >>> please approve and review the following backports to 8u. 
>>>
>>> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true
>>> https://bugs.openjdk.java.net/browse/JDK-8144487
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407
>>>
>>> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI
>>> https://bugs.openjdk.java.net/browse/JDK-8145754
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522
>>>
>>> Nightly testing showed no problems and the changes apply cleanly to 8u-dev.
>>>
>>> Thanks,
>>> Tobias
>>>

From christian.thalinger at oracle.com Mon Jan 4 17:16:59 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Jan 2016 07:16:59 -1000
Subject: RFR: 8146001: Remove support for command line options from JVMCI
In-Reply-To:
References:
Message-ID: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com>

> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote:
>
> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal:
>
> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support.
> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line.
>
> This change removes the JVMCI command line option support.
>
> https://bugs.openjdk.java.net/browse/JDK-8146001
> http://cr.openjdk.java.net/~dnsimon/8146001/

+ private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true);

+ private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true);

We should either use the jvmci. prefix or not.
src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java:

- @Option(help = "", type = OptionType.Debug)
- public static final OptionValue UseProfilingInformation = new OptionValue<>(true);

We are using this flag so we need to keep it.

>
> -Doug

From christian.thalinger at oracle.com Mon Jan 4 17:19:32 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Jan 2016 07:19:32 -1000
Subject: RFR: 8146001: Remove support for command line options from JVMCI
In-Reply-To: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com>
References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com>
Message-ID:

> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote:
>
>>
>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote:
>>
>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal:
>>
>> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support.
>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line.
>>
>> This change removes the JVMCI command line option support.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8146001
>> http://cr.openjdk.java.net/~dnsimon/8146001/
>
> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true);
>
> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true);
>
> We should either use the jvmci. prefix or not.

Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix.
>
> src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java:
>
> - @Option(help = "", type = OptionType.Debug)
> - public static final OptionValue UseProfilingInformation = new OptionValue<>(true);
>
> We are using this flag so we need to keep it.
>
>>
>> -Doug

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From christian.thalinger at oracle.com Mon Jan 4 17:41:39 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 4 Jan 2016 07:41:39 -1000
Subject: RFR: 8146001: Remove support for command line options from JVMCI
In-Reply-To:
References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com>
Message-ID: <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com>

> On Jan 4, 2016, at 7:19 AM, Christian Thalinger wrote:
>
>>
>> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote:
>>
>>>
>>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote:
>>>
>>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal:
>>>
>>> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support.
>>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line.
>>>
>>> This change removes the JVMCI command line option support.
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8146001 >>> http://cr.openjdk.java.net/~dnsimon/8146001/ >> >> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true); >> >> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); >> >> We should either use the jvmci. prefix or not. > > Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix. I think we should prefix the property name in getBooleanProperty: + public static boolean getBooleanProperty(String name, boolean def) { + String value = VM.getSavedProperty("jvmci." + name); and I put UseProfilingInformation back: diff -r 0fcfe4b07f7e src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Tue Dec 29 18:30:51 2015 +0100 +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 04 07:40:46 2016 -1000 @@ -24,7 +24,6 @@ package jdk.vm.ci.hotspot; import static jdk.vm.ci.hotspot.CompilerToVM.compilerToVM; import static jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime; -import static jdk.vm.ci.hotspot.HotSpotResolvedJavaMethod.Options.UseProfilingInformation; import static jdk.vm.ci.hotspot.HotSpotVMConfig.config; import static jdk.vm.ci.hotspot.UnsafeAccess.UNSAFE; @@ -65,6 +64,11 @@ import jdk.vm.ci.meta.TriState; final class HotSpotResolvedJavaMethodImpl extends HotSpotMethod implements HotSpotResolvedJavaMethod, HotSpotProxified, MetaspaceWrapperObject { /** + * Whether to use profiling information. + */ + private static final boolean UseProfilingInformation = HotSpotJVMCIRuntime.getBooleanProperty("UseProfilingInformation", true); + + /** * Reference to metaspace Method object. 
*/ private final long metaspaceMethod; @@ -424,7 +428,7 @@ final class HotSpotResolvedJavaMethodImp public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { ProfilingInfo info; - if (UseProfilingInformation.getValue() && methodData == null) { + if (UseProfilingInformation && methodData == null) { long metaspaceMethodData = UNSAFE.getAddress(metaspaceMethod + config().methodDataOffset); if (metaspaceMethodData != 0) { methodData = new HotSpotMethodData(metaspaceMethodData, this); > >> >> src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java: >> >> - @Option(help = "", type = OptionType.Debug) >> - public static final OptionValue UseProfilingInformation = new OptionValue<>(true); >> >> We are using this flag so we need to keep it. >> >>> >>> -Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Mon Jan 4 20:12:28 2016 From: john.r.rose at oracle.com (John Rose) Date: Mon, 4 Jan 2016 12:12:28 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> Message-ID: On Jan 4, 2016, at 3:42 AM, Paul Sandoz wrote: > > Hi, > >> On 31 Dec 2015, at 22:33, John Rose > wrote: >> >> When performing explicit range checks in pre-intrinsic code, >> let's try to use the new intrinsic functions in java.util.Objects, >> called checkIndex, checkFromToIndex, and checkFromIndexSize. > > At the moment only checkIndex is a C2 intrinsic, we could revisit making the others intrinsic as well based on use-cases. Corrected, thanks. They don't need to be intrinsics if they optimize well. 
The point is that the library functions have code shapes which work well with the JIT. For example, the multi-index checks might (as in Kishor's code) be implemented on top of the single-index check, without themselves being intrinsics.

>
>> These are simpler, safer, and more maintainable than our previous
>> practice of using hand-written "random logic", such as in this bug:
>> http://hg.openjdk.java.net/jdk9/hs-comp/jdk/rev/cb31a76eecd1#l1.52
>>
>
> Yes, in this case I believe the calls to cryptBlockCheck
>
>     cryptBlockCheck(in, inOff, len);
>     cryptBlockCheck(out, outOff, len);
>     return implCrypt(in, inOff, len, out, outOff);
> could be replaced with:
>
>     Objects.checkFromIndexSize(inOff, len, in.length, );
>     Objects.checkFromIndexSize(outOff, len, out.length, );
>     return implCrypt(in, inOff, len, out, outOff);

Yes. And if that doesn't produce clean code, it's a JIT bug, not a JDK bug.

One caveat: If the BiFunction must produce a message with the index, it is not a constant and we might have potential capture costs. The correct trade-off here is to either simplify the message, or ask the JIT to scalarize (EA-away) the closure node on the hot path, or even add another entry point (with a TriFunction, perhaps). As I pointed out below, we can just simplify the message.

-- John

> Paul.
>
>
>> Depending on the documented API, it is usually enough that the
>> thrown exception be a RuntimeException of any sort. By default,
>> the methods throw a generic IndexOutOfBoundsException.
>> In cases where a particular exception must be thrown, the Objects
>> methods provide an optional "hook" for building the desired exception.
>>
>> In this case, since the code is already pushed, we should clean it
>> up as part of this bug:
>> https://bugs.openjdk.java.net/browse/JDK-8135250
>>
>> -- John

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From doug.simon at oracle.com Mon Jan 4 22:31:14 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 4 Jan 2016 23:31:14 +0100
Subject: RFR: 8146001: Remove support for command line options from JVMCI
In-Reply-To: <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com>
References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com> <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com>
Message-ID: <1297DA97-3C65-403D-AB46-16E203A74F26@oracle.com>

> On 04 Jan 2016, at 18:41, Christian Thalinger wrote:
>
>>
>> On Jan 4, 2016, at 7:19 AM, Christian Thalinger wrote:
>>
>>>
>>> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote:
>>>
>>>>
>>>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote:
>>>>
>>>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal:
>>>>
>>>> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support.
>>>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line.
>>>>
>>>> This change removes the JVMCI command line option support.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8146001
>>>> http://cr.openjdk.java.net/~dnsimon/8146001/
>>>
>>> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true);
>>>
>>> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true);
>>>
>>> We should either use the jvmci. prefix or not.
>>
>> Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix.
> > I think we should prefix the property name in getBooleanProperty: > > + public static boolean getBooleanProperty(String name, boolean def) { > + String value = VM.getSavedProperty("jvmci." + name); Ok, sounds reasonable. > > and I put UseProfilingInformation back: > > diff -r 0fcfe4b07f7e src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java > --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Tue Dec 29 18:30:51 2015 +0100 > +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 04 07:40:46 2016 -1000 > @@ -24,7 +24,6 @@ package jdk.vm.ci.hotspot; > > import static jdk.vm.ci.hotspot.CompilerToVM.compilerToVM; > import static jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime; > -import static jdk.vm.ci.hotspot.HotSpotResolvedJavaMethod.Options.UseProfilingInformation; > import static jdk.vm.ci.hotspot.HotSpotVMConfig.config; > import static jdk.vm.ci.hotspot.UnsafeAccess.UNSAFE; > > @@ -65,6 +64,11 @@ import jdk.vm.ci.meta.TriState; > final class HotSpotResolvedJavaMethodImpl extends HotSpotMethod implements HotSpotResolvedJavaMethod, HotSpotProxified, MetaspaceWrapperObject { > > /** > + * Whether to use profiling information. > + */ > + private static final boolean UseProfilingInformation = HotSpotJVMCIRuntime.getBooleanProperty("UseProfilingInformation", true); > + > + /** > * Reference to metaspace Method object. 
> */ > private final long metaspaceMethod; > @@ -424,7 +428,7 @@ final class HotSpotResolvedJavaMethodImp > public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { > ProfilingInfo info; > > - if (UseProfilingInformation.getValue() && methodData == null) { > + if (UseProfilingInformation && methodData == null) { > long metaspaceMethodData = UNSAFE.getAddress(metaspaceMethod + config().methodDataOffset); > if (metaspaceMethodData != 0) { > methodData = new HotSpotMethodData(metaspaceMethodData, this); JVMCI should unconditionally return available profiling information. It's up to the compiler whether or not to use it. For example, this is now compilation local in Graal: http://hg.openjdk.java.net/graal/graal-compiler/rev/f35e653aa876#l16.16 -Doug From vladimir.kozlov at oracle.com Mon Jan 4 22:46:48 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Jan 2016 14:46:48 -0800 Subject: RFR(S): 8139771: Eliminating CastPP nodes at Phis when they all come from a unique input may cause crash In-Reply-To: References: <56623E4A.9040504@oracle.com> <56663B29.7050508@oracle.com> <5021FF7F-DA52-44D0-A7E5-DAEFFC5992C1@oracle.com> <566F45F9.5000304@oracle.com> <5670BF83.7060907@oracle.com> Message-ID: <568AF658.8020607@oracle.com> The comment is wrong. I added it for next changes: https://bugs.openjdk.java.net/browse/JDK-7004535 But later removed the change due to next bug but I did not updated the comment: https://bugs.openjdk.java.net/browse/JDK-7068051 "The code added in 7004535 changes does not take into account that cloning/moving predicates below merge points invalidate jvm states recorded in corresponding uncommon traps. Phi nodes should be created for values referenced by predicate's uncommon traps when a predicate is cloned." "Remove predicate cloning from loop peeling optimization and from split fall-in paths. Leave it in loop unswitching code which is safe. 
Don't allow split loop entry path in IGVN optimization for Phi nodes. And do not clone predicates below merge points in split-if optimization. Remove move_loop_predicate() and eliminate_loop_predicates() unused methods." Thanks, Vladimir On 12/16/15 12:49 AM, Roland Westrelin wrote: >>> For reference, current webrev: >>> >>> http://cr.openjdk.java.net/~roland/8139771/webrev.01/ >>> >>>>> As you suggested I made CheckCastPP inherit from ConstraintCast. I also hit the following bug: one iteration of a loop is peeled which causes a CastPP to be pinned between the loop and the predicates. When a predicate that depends on the CastPP is moved out of the loop, it is moved above the CastPP. I fixed by marking all nodes that depend on a node pinned between a loop and the predicates as non loop invariant. I don't think fixing it by moving the cast up above the predicates is a safe fix in general. >>>> >>>> Hmm. The test which depends on CastPP should be also peeled and it will dominate the test in main loop. If a test/predicate could be moved from main loop then it should be possible to use peeled one. What do you think? >>> >>> Let me take another look at this. >>> Independently: so we never apply loop predication before peeling? Otherwise moving the peeled body before the loop predicate could be incorrect, right (predicates could have been moved out of the body before it's peeled)? >> >> We never peel before predicates. Peeling does not know about them. The peeled iteration is placed between predicates and peeled loop head. > > The comment in PhaseIdealLoop::do_peeling() implies that the peeled iteration is above the predicates. We can apply loop predication then peeling. If the peeled iteration is above the predicates, isn't there a risk the peeled iteration is executed before a predicate it depends on for correctness? > > Roland. > >> >> Vladimir >> >>> >>> Roland.
> From christian.thalinger at oracle.com Mon Jan 4 22:47:34 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 4 Jan 2016 12:47:34 -1000 Subject: RFR: 8146001: Remove support for command line options from JVMCI In-Reply-To: <1297DA97-3C65-403D-AB46-16E203A74F26@oracle.com> References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com> <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com> <1297DA97-3C65-403D-AB46-16E203A74F26@oracle.com> Message-ID: <6C07E8DD-50D4-4B2E-BD8E-B131579A9664@oracle.com> > On Jan 4, 2016, at 12:31 PM, Doug Simon wrote: > >> >> On 04 Jan 2016, at 18:41, Christian Thalinger wrote: >> >>> >>> On Jan 4, 2016, at 7:19 AM, Christian Thalinger wrote: >>> >>>> >>>> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote: >>>> >>>>> >>>>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote: >>>>> >>>>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal: >>>>> >>>>> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support. >>>>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line. >>>>> >>>>> This change removes the JVMCI command line option support. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8146001 >>>>> http://cr.openjdk.java.net/~dnsimon/8146001/ >>>> >>>> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true); >>>> >>>> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); >>>> >>>> We should either use the jvmci. prefix or not.
>>> >>> Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix. >> >> I think we should prefix the property name in getBooleanProperty: >> >> + public static boolean getBooleanProperty(String name, boolean def) { >> + String value = VM.getSavedProperty("jvmci." + name); > > Ok, sounds reasonable. > >> >> and I put UseProfilingInformation back: >> >> diff -r 0fcfe4b07f7e src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Tue Dec 29 18:30:51 2015 +0100 >> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 04 07:40:46 2016 -1000 >> @@ -24,7 +24,6 @@ package jdk.vm.ci.hotspot; >> >> import static jdk.vm.ci.hotspot.CompilerToVM.compilerToVM; >> import static jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime; >> -import static jdk.vm.ci.hotspot.HotSpotResolvedJavaMethod.Options.UseProfilingInformation; >> import static jdk.vm.ci.hotspot.HotSpotVMConfig.config; >> import static jdk.vm.ci.hotspot.UnsafeAccess.UNSAFE; >> >> @@ -65,6 +64,11 @@ import jdk.vm.ci.meta.TriState; >> final class HotSpotResolvedJavaMethodImpl extends HotSpotMethod implements HotSpotResolvedJavaMethod, HotSpotProxified, MetaspaceWrapperObject { >> >> /** >> + * Whether to use profiling information. >> + */ >> + private static final boolean UseProfilingInformation = HotSpotJVMCIRuntime.getBooleanProperty("UseProfilingInformation", true); >> + >> + /** >> * Reference to metaspace Method object. 
>> */ >> private final long metaspaceMethod; >> @@ -424,7 +428,7 @@ final class HotSpotResolvedJavaMethodImp >> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >> ProfilingInfo info; >> >> - if (UseProfilingInformation.getValue() && methodData == null) { >> + if (UseProfilingInformation && methodData == null) { >> long metaspaceMethodData = UNSAFE.getAddress(metaspaceMethod + config().methodDataOffset); >> if (metaspaceMethodData != 0) { >> methodData = new HotSpotMethodData(metaspaceMethodData, this); > > JVMCI should unconditionally return available profiling information. It's up to the compiler whether or not to use it. For example, this is now compilation local in Graal: > > http://hg.openjdk.java.net/graal/graal-compiler/rev/f35e653aa876#l16.16 Oh, I missed that. Yes, that works for us as well. Thanks for pointing that out. > > -Doug From vladimir.kozlov at oracle.com Mon Jan 4 23:52:14 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 4 Jan 2016 15:52:14 -0800 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568A590F.6030104@oracle.com> References: <55FBDFEC.4060405@oracle.com> <56139149.5080906@oracle.com> <568A590F.6030104@oracle.com> Message-ID: <568B05AE.7020700@oracle.com> On 1/4/16 3:35 AM, Tobias Hartmann wrote: > Hi Roland, > > sorry for the delay. > > On 07.10.2015 11:06, Roland Westrelin wrote: >>>> Maybe we could add an IfProjNode::Ideal method that disconnects the other branch of the If when this branch is always taken and that does so even during parsing. Given Ideal is called before Identity, that would guarantee the next call to Identity optimizes the If out. >>> >>> As you suggested, I added an IfProjNode::Ideal that disconnects the never taken branch from the IfNode.
The subsequent call to Identity then removes the IfNode: >>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.03/ >>> >>> However, I wondered if this is "legal" because the comment in Node::ideal says: >>> >>> // The Ideal call almost arbitrarily reshape the graph rooted at the 'this' >>> // pointer. >>> >>> But we are changing the graph "above" the this pointer. I executed tests with -XX:+VerifyIterativeGVN and everything seems to work fine. >>> Another solution would be to cut the *current* branch if it is never taken: >>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.02/ >>> >>> But this solution depends on the assumption that we execute the identity() of the other ProjNode which is not guaranteed by GVN (I think). >>> >>> Therefore I would like to go for webrev.03. I verified that this solves the problem and tested the fix with JPRT. >> >> I thought about this more and I don't think either work ok. >> >> The problem with webrev.02 is that depending on the order the projection nodes are allocated and transformed, the optimization may not happened: >> >> Node* never_taken = new IfTrueNode(..); >> Node* always_taken = new IfFalseNode(..); >> always_taken = gvn.transform(always_taken); >> never_taken = gvn.transform(never_taken); >> >> The problem with webrev.03 is that we may change a node that is not yet transformed (never_taken changed by call to gvn.transform(always_taken)). Not sure if it could break existing code but it's clearly an unexpected behavior. > > Right, that could be a problem. I don't see a problem.
But IfProjNode::Ideal() should have additional checks for that: // Check for dead control input if (in(0) && remove_dead_region(phase, can_reshape)) { return this; } // Don't bother trying to transform a dead node if (in(0) && in(0)->is_top()) { return NULL; } Also instead of set_req() use: PhaseIterGVN* igvn = phase->is_IterGVN(); igvn->replace_input_of(other, 0, phase->C->top()); This way following gvn.transform(never_taken); will work fine. Thanks, Vladimir > >> An other way would be to remove the in(0)->outcnt() == 1 check from IfProjNode::Identity() and in an IfProjNode::Ideal method do what you do in webrev.03 but when can_reshape is true only. > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8136469/webrev.04/ > > However, I'm afraid that this re-introduces JDK-8027626. If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Right? > > Thanks, > Tobias > From tobias.hartmann at oracle.com Tue Jan 5 06:03:14 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Jan 2016 07:03:14 +0100 Subject: [8u] Request for approval: Backport of 8144487 and 8145754 In-Reply-To: <568A9438.3010400@oracle.com> References: <568A2A20.7030601@oracle.com> <568A2EE7.4030600@oracle.com> <568A3BB9.1010501@oracle.com> <568A9438.3010400@oracle.com> Message-ID: <568B5CA2.4090003@oracle.com> Thanks, Vladimir. Best, Tobias On 04.01.2016 16:48, Vladimir Kozlov wrote: > Looks good. 
> > Thanks, > Vladimir > > On 1/4/16 1:30 AM, Tobias Hartmann wrote: >> Hi David, >> >> sure, I included the links to the code review: >> >> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true >> https://bugs.openjdk.java.net/browse/JDK-8144487 >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020503.html >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 >> >> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI >> https://bugs.openjdk.java.net/browse/JDK-8145754 >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020502.html >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 >> >> Thanks, >> Tobias >> >> On 04.01.2016 09:35, david buck wrote: >>> Hi Tobias! >>> >>> Would you please include links to the code review threads on mail.openjdk.java.net? >>> >>> [ JDK 8 Updates: Push Approval Request Template ] >>> http://openjdk.java.net/projects/jdk8u/approval-template.html >>> >>> Cheers, >>> -Buck >>> >>> On 2016/01/04 17:15, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please approve and review the following backports to 8u. >>>> >>>> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true >>>> https://bugs.openjdk.java.net/browse/JDK-8144487 >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 >>>> >>>> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI >>>> https://bugs.openjdk.java.net/browse/JDK-8145754 >>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 >>>> >>>> Nightly testing showed no problems and the changes apply cleanly to 8u-dev. 
>>>> >>>> Thanks, >>>> Tobias >>>> From tobias.hartmann at oracle.com Tue Jan 5 07:58:01 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Jan 2016 08:58:01 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568B05AE.7020700@oracle.com> References: <55FBDFEC.4060405@oracle.com> <56139149.5080906@oracle.com> <568A590F.6030104@oracle.com> <568B05AE.7020700@oracle.com> Message-ID: <568B7789.5010600@oracle.com> Hi Vladimir, thanks for the review. On 05.01.2016 00:52, Vladimir Kozlov wrote: > On 1/4/16 3:35 AM, Tobias Hartmann wrote: >> Hi Roland, >> >> sorry for the delay. >> >> On 07.10.2015 11:06, Roland Westrelin wrote: >>>>> Maybe we could add an IfProjNode::Ideal method that disconnects the other branch of the If when this branch is always taken and that does so even during parsing. Given Ideal is called before Identity, that would guarantee the next call to Identity optimizes the If out. >>>> >>>> As you suggested, I added an IfProjNode::Ideal that disconnects the never taken branch from the IfNode. The subsequent call to Identity then removes the IfNode: >>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.03/ >>>> >>>> However, I wondered if this is "legal" because the comment in Node::ideal says: >>>> >>>> // The Ideal call almost arbitrarily reshape the graph rooted at the 'this' >>>> // pointer. >>>> >>>> But we are changing the graph "above" the this pointer. I executed tests with -XX:+VerifyIterativeGVN and everything seems to work fine. >>>> Another solution would be to cut the *current* branch if it is never taken: >>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.02/ >>>> >>>> But this solution depends on the assumption that we execute the identity() of the other ProjNode which is not guaranteed by GVN (I think). >>>> >>>> Therefore I would like to go for webrev.03. I verified that this solves the problem and tested the fix with JPRT. 
>>> >>> I thought about this more and I don't think either work ok. >>> >>> The problem with webrev.02 is that depending on the order the projection nodes are allocated and transformed, the optimization may not happened: >>> >>> Node* never_taken = new IfTrueNode(..); >>> Node* always_taken = new IfFalseNode(..); >>> always_taken = gvn.transform(always_taken); >>> never_taken = gvn.transform(never_taken); >>> >>> The problem with webrev.03 is that we may change a node that is not yet transformed (never_taken changed by call to gvn.transform(always_taken)). Not sure if it could break existing code but it's clearly an unexpected behavior. >> >> Right, that could be a problem. > > I don't see a problem. But IfProjNode::Ideal() should have additional checks for that: > > // Check for dead control input > if (in(0) && remove_dead_region(phase, can_reshape)) { > return this; > } > // Don't bother trying to transform a dead node > if (in(0) && in(0)->is_top()) { > return NULL; > } Right, I'll add those. > Also instead of set_req() use: > > PhaseIterGVN* igvn = phase->is_IterGVN(); > igvn->replace_input_of(other, 0, phase->C->top()); > > This way following gvn.transform(never_taken); will work fine. But this assumes that we are only executing the code with IGVN but we also want to cut off the dead branch with GVN. Or am I missing something? Thanks, Tobias > > Thanks, > Vladimir > >> >>> An other way would be to remove the in(0)->outcnt() == 1 check from IfProjNode::Identity() and in an IfProjNode::Ideal method do what you do in webrev.03 but when can_reshape is true only. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8136469/webrev.04/ >> >> However, I'm afraid that this re-introduces JDK-8027626. If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Right?
>> >> Thanks, >> Tobias >> From aph at redhat.com Tue Jan 5 09:48:56 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 5 Jan 2016 09:48:56 +0000 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> Message-ID: <568B9188.6000506@redhat.com> On 04/01/16 20:12, John Rose wrote: > Corrected, thanks. They don't need to be intrinsics if they optimize well. > The point is that the library functions have code shapes which work well > with the JIT. For example, the multi-index checks might (as in Kishor's code) > be implemented on top of the single-index check, without themselves being > intrinsics. We seem to be missing the opportunity to convert i >= 0 && i < size into (unsigned)i < (unsigned)size and this is, as far as I can see, the only real code-quality advantage of the checkIndex intrinsic. Could we not do this optimization and then drop the C2 checkIndex intrinsic? Andrew. From paul.sandoz at oracle.com Tue Jan 5 10:23:19 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 5 Jan 2016 11:23:19 +0100 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568B9188.6000506@redhat.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> Message-ID: > On 5 Jan 2016, at 10:48, Andrew Haley wrote: > > On 04/01/16 20:12, John Rose wrote: >> Corrected, thanks. They don't need to be intrinsics if they optimize well. >> The point is that the library functions have code shapes which work well >> with the JIT. 
For example, the multi-index checks might (as in Kishor's code) >> be implemented on top of the single-index check, without themselves being >> intrinsics. > > We seem to be missing the opportunity to convert > > i >= 0 && i < size > > into > > (unsigned)i < (unsigned)size > > and this is, as far as I can see, the only real code-quality advantage of > the checkIndex intrinsic. Could we not do this optimization and then > drop the C2 checkIndex intrinsic? > My understanding is that the pattern matching can sometimes be fragile, hence a "belts and braces" approach. It was motivated by the VarHandle work where it was observed that explicit bounds checks plus Unsafe array access produced more generated bounds checks [*] than direct array access (which does what you propose). The VarHandle array access implementations call this method before Unsafe access. If the pattern matching gets (or is now) sufficiently reliable we could remove the intrinsic, but i would like to carefully verify before doing that. Paul. [*] Another case was identified for viewed indexed ByteBuffer access, where use of the Objects.checkIndex method in the following method on Buffer also reduced generated checks: final int checkIndex(int i, int nb) { // package-private if ((i < 0) || (nb > limit - i)) throw new IndexOutOfBoundsException(); return i; } (Note that this code assumes that limit is always non-negative.) I need to go back and revisit from last time i checked in September. From paul.sandoz at oracle.com Tue Jan 5 11:51:45 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 5 Jan 2016 12:51:45 +0100 Subject: Conditional moves vs.
branching in unrolled loops Message-ID: Hi, Recent investigation comparing for loops with streams exposed what appears to be an issue with Math.max and generated code in unrolled loops. Namely this: @Benchmark public int forTest_if() { int[] a = ints; int e = ints.length; int m = Integer.MIN_VALUE; for (int i = 0; i < e; i++) if (a[i] >= m) m = a[i]; return m; } is faster than this: @Benchmark public int forTest_MathMax() { int[] a = ints; int e = ints.length; int m = Integer.MIN_VALUE; for (int i = 0; i < e; i++) m = Math.max(m, a[i]); return m; } Or this: Arrays.stream(ints).reduce(Integer.MIN_VALUE, (a, b) -> a >= b ? a : b); is faster than this: Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max); at least on an x86 i5 processor. See the following links for more details: https://bugs.openjdk.java.net/browse/JDK-8146071 https://bugs.openjdk.java.net/browse/JDK-8146071?focusedCommentId=13883495&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13883495 For generated code in the for loop cases above see: https://bugs.openjdk.java.net/secure/attachment/56221/mathMax.perfasm.txt I am not familiar enough with the x86 architecture to fully explain why, but i presume branch prediction is trumping the conditional moves, which suggests that on certain processors the generated code for the Math.max intrinsic (and others) in unrolled loops should not use conditional moves. Thanks, Paul. From vitalyd at gmail.com Tue Jan 5 12:00:33 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 5 Jan 2016 07:00:33 -0500 Subject: Conditional moves vs.
branching in unrolled loops In-Reply-To: References: Message-ID: This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104 On Tuesday, January 5, 2016, Paul Sandoz wrote: > Hi, > > Recent investigation comparing for loops with streams exposed what appears > to be an issue with Math.max and generated code in unrolled loops. > > Namely this: > > @Benchmark > public int forTest_if() { > int[] a = ints; > int e = ints.length; > int m = Integer.MIN_VALUE; > for (int i = 0; i < e; i++) > if (a[i] >= m) > m = a[i]; > return m; > } > > is faster than this: > > @Benchmark > public int forTest_MathMax() { > int[] a = ints; > int e = ints.length; > int m = Integer.MIN_VALUE; > for (int i = 0; i < e; i++) > m = Math.max(m, a[i]); > return m; > } > > Or this: > > Arrays.stream(ints).reduce(Integer.MIN_VALUE, (a, b) -> a >= b ? a : b); > > is faster than this: > > Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max); > > at least on an x86 i5 processor. > > See the following links for more details: > > https://bugs.openjdk.java.net/browse/JDK-8146071 > > https://bugs.openjdk.java.net/browse/JDK-8146071?focusedCommentId=13883495&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13883495 > > For generated code in the for loop cases above see: > > > https://bugs.openjdk.java.net/secure/attachment/56221/mathMax.perfasm.txt > > I am not familiar enough with the x86 architecture to fully explain why, > but i presume branch prediction is trumping the conditional moves, which > suggests that on certain processors the generated code for the Math.max > intrinsic (and others) in unrolled loops should not use conditional moves. > > Thanks, > Paul. > -- Sent from my phone From paul.sandoz at oracle.com Tue Jan 5 12:47:20 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 5 Jan 2016 13:47:20 +0100 Subject: Conditional moves vs.
branching in unrolled loops In-Reply-To: References: Message-ID: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> > On 5 Jan 2016, at 13:00, Vitaly Davidovich wrote: > > This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104 > Many thanks, i closed JDK-8146071 as a dup of JDK-8039104. Paul. From vladimir.kozlov at oracle.com Tue Jan 5 17:05:23 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Jan 2016 09:05:23 -0800 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568B7789.5010600@oracle.com> References: <55FBDFEC.4060405@oracle.com> <56139149.5080906@oracle.com> <568A590F.6030104@oracle.com> <568B05AE.7020700@oracle.com> <568B7789.5010600@oracle.com> Message-ID: <568BF7D3.7000403@oracle.com> On 1/4/16 11:58 PM, Tobias Hartmann wrote: > Hi Vladimir, > > thanks for the review. > > On 05.01.2016 00:52, Vladimir Kozlov wrote: >> On 1/4/16 3:35 AM, Tobias Hartmann wrote: >>> Hi Roland, >>> >>> sorry for the delay. >>> >>> On 07.10.2015 11:06, Roland Westrelin wrote: >>>>>> Maybe we could add an IfProjNode::Ideal method that disconnects the other branch of the If when this branch is always taken and that does so even during parsing. Given Ideal is called before Identity, that would guarantee the next call to Identity optimizes the If out. >>>>> >>>>> As you suggested, I added an IfProjNode::Ideal that disconnects the never taken branch from the IfNode. The subsequent call to Identity then removes the IfNode: >>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.03/ >>>>> >>>>> However, I wondered if this is "legal" because the comment in Node::ideal says: >>>>> >>>>> // The Ideal call almost arbitrarily reshape the graph rooted at the 'this' >>>>> // pointer.
>>>>> >>>>> But we are changing the graph "above" the this pointer. I executed tests with -XX:+VerifyIterativeGVN and everything seems to work fine. >>>>> Another solution would be to cut the *current* branch if it is never taken: >>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.02/ >>>>> >>>>> But this solution depends on the assumption that we execute the identity() of the other ProjNode which is not guaranteed by GVN (I think). >>>>> >>>>> Therefore I would like to go for webrev.03. I verified that this solves the problem and tested the fix with JPRT. >>>> >>>> I thought about this more and I don't think either work ok. >>>> >>>> The problem with webrev.02 is that depending on the order the projection nodes are allocated and transformed, the optimization may not happened: >>>> >>>> Node* never_taken = new IfTrueNode(..); >>>> Node* always_taken = new IfFalseNode(..); >>>> always_taken = gvn.transform(always_taken); >>>> never_taken = gvn.transform(never_taken); >>>> >>>> The problem with webrev.03 is that we may change a node that is not yet transformed (never_taken changed by call to gvn.transform(always_taken)). Not sure if it could break existing code but it's clearly an unexpected behavior. >>> >>> Right, that could be a problem. >> >> I don't see a problem. But IfProjNode::Ideal() should have additional checks for that: >> >> // Check for dead control input >> if (in(0) && remove_dead_region(phase, can_reshape)) { >> return this; >> } >> // Don't bother trying to transform a dead node >> if (in(0) && in(0)->is_top()) { >> return NULL; >> } > > Right, I'll add those. > >> Also instead of set_req() use: >> >> PhaseIterGVN* igvn = phase->is_IterGVN(); >> igvn->replace_input_of(other, 0, phase->C->top()); >> >> This way following gvn.transform(never_taken); will work fine. > > But this assumes that we are only executing the code with IGVN but we also want to cut off the dead branch with GVN. Or am I missing something?
webrev.04 checks can_reshape which is true only with IGVN. For GVN you can do it by hand: bool is_in_table = C->initial_gvn()->hash_delete(other); other->set_req(0, phase->C->top()); if (is_in_table) { C->initial_gvn()->hash_find_insert(other); } C->record_for_igvn(other); Note, during Parse (GVN) we don't remove dead code aggressively. Vladimir > > Thanks, > Tobias > >> >> Thanks, >> Vladimir >> >>> >>>> An other way would be to remove the in(0)->outcnt() == 1 check from IfProjNode::Identity() and in an IfProjNode::Ideal method do what you do in webrev.03 but when can_reshape is true only. >>> >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.04/ >>> >>> However, I'm afraid that this re-introduces JDK-8027626. If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Right? >>> >>> Thanks, >>> Tobias >>> From tobias.hartmann at oracle.com Tue Jan 5 17:13:15 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 5 Jan 2016 18:13:15 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568BF7D3.7000403@oracle.com> References: <55FBDFEC.4060405@oracle.com> <56139149.5080906@oracle.com> <568A590F.6030104@oracle.com> <568B05AE.7020700@oracle.com> <568B7789.5010600@oracle.com> <568BF7D3.7000403@oracle.com> Message-ID: <568BF9AB.3010408@oracle.com> On 05.01.2016 18:05, Vladimir Kozlov wrote: > On 1/4/16 11:58 PM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> thanks for the review. >> >> On 05.01.2016 00:52, Vladimir Kozlov wrote: >>> On 1/4/16 3:35 AM, Tobias Hartmann wrote: >>>> Hi Roland, >>>> >>>> sorry for the delay. >>>> >>>> On 07.10.2015 11:06, Roland Westrelin wrote: >>>>>>> Maybe we could add an IfProjNode::Ideal method that disconnects the other branch of the If when this branch is always taken and that does so even during parsing. 
Given Ideal is called before Identity, that would guarantee the next call to Identity optimizes the If out. >>>>>> >>>>>> As you suggested, I added an IfProjNode::Ideal that disconnects the never taken branch from the IfNode. The subsequent call to Identity then removes the IfNode: >>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.03/ >>>>>> >>>>>> However, I wondered if this is "legal" because the comment in Node::ideal says: >>>>>> >>>>>> // The Ideal call almost arbitrarily reshape the graph rooted at the 'this' >>>>>> // pointer. >>>>>> >>>>>> But we are changing the graph "above" the this pointer. I executed tests with -XX:+VerifyIterativeGVN and everything seems to work fine. >>>>>> Another solution would be to cut the *current* branch if it is never taken: >>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.02/ >>>>>> >>>>>> But this solution depends on the assumption that we execute the identity() of the other ProjNode which is not guaranteed by GVN (I think). >>>>>> >>>>>> Therefore I would like to go for webrev.03. I verified that this solves the problem and tested the fix with JPRT. >>>>> >>>>> I thought about this more and I don't think either work ok. >>>>> >>>>> The problem with webrev.02 is that depending on the order the projection nodes are allocated and transformed, the optimization may not happened: >>>>> >>>>> Node* never_taken = new IfTrueNode(..); >>>>> Node* always_taken = new IfFalseNode(..); >>>>> always_taken = gvn.transform(always_taken); >>>>> never_taken = gvn.transform(never_taken); >>>>> >>>>> The problem with webrev.03 is that we may change a node that is not yet transformed (never_taken changed by call to gvn.transform(always_taken)). Not sure if it could break existing code but it's clearly an unexpected behavior. >>>> >>>> Right, that could be a problem. >>> >>> I don't see a problem.
But IfProjNode::Ideal() should have additional checks for that: >>> >>> // Check for dead control input >>> if (in(0) && remove_dead_region(phase, can_reshape)) { >>> return this; >>> } >>> // Don't bother trying to transform a dead node >>> if (in(0) && in(0)->is_top()) { >>> return NULL; >>> } >> >> Right, I'll add those. >> >>> Also instead of set_req() use: >>> >>> PhaseIterGVN* igvn = phase->is_IterGVN(); >>> igvn->replace_input_of(other, 0, phase->C->top()); >>> >>> This way following gvn.transform(never_taken); will work fine. >> >> But this assumes that we are only executing the code with IGVN but we also want to cut off the dead branch with GVN. Or am I missing something? > > webrev.04 checks can_reshape which is true only with IGVN. > For GVN you can do it by hand: > > bool is_in_table = C->initial_gvn()->hash_delete(other); > other->set_req(0, phase->C->top()); > if (is_in_table) { > C->initial_gvn()->hash_find_insert(other); > } > C->record_for_igvn(other); > > Note, during Parse (GVN) we don't remove dead code aggressively. Right, I thought you were referring to Roland's comment about webrev.03. As I wrote in a previous email, I'm afraid that the webrev.04 solution re-introduces JDK-8027626. If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Do you think that isn't a problem? Thanks, Tobias > > Vladimir > >> >> Thanks, >> Tobias >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>>> An other way would be to remove the in(0)->outcnt() == 1 check from IfProjNode::Identity() and in an IfProjNode::Ideal method do what you do in webrev.03 but when can_reshape is true only. >>>> >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.04/ >>>> >>>> However, I'm afraid that this re-introduces JDK-8027626. 
If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Right? >>>> >>>> Thanks, >>>> Tobias >>>> From vladimir.kozlov at oracle.com Tue Jan 5 17:17:04 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Jan 2016 09:17:04 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568B9188.6000506@redhat.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> Message-ID: <568BFA90.4020807@oracle.com> > On 31 Dec 2015, at 22:33, John Rose wrote: > > When performing explicit range checks in pre-intrinsic code, > let's try to use the new intrinsic functions in java.util.Objects, > called checkIndex, checkFromToIndex, and checkFromIndexSize. Please, don't forget that checks in pre-intrinsic code should match checks generated by javac (bytecode) for intrinsified methods. Otherwise those checks will not be removed (by dominated checks in pre-intrinsic code) when intrinsics are not support on a platform. That is why we currently have such duplicated pre-intrinsic code. On other hand when intrinsics are supported they don't have checks so if they present we can intrinsify pre-intrinsic code as you suggested. Thanks, Vladimir On 1/5/16 1:48 AM, Andrew Haley wrote: > On 04/01/16 20:12, John Rose wrote: >> Corrected, thanks. They don't need to be intrinsics if they optimize well. >> The point is that the library functions have code shapes which work well >> with the JIT. For example, the multi-index checks might (as in Kishor's code) >> be implemented on top of the single-index check, without themselves being >> intrinsics. 
> > We seem to be missing the opportunity to convert > > i >= 0 && i < size > > into > > (unsigned)i < (unsigned)size > > and this is, as far as I can see, the only real code-quality advantage of > the checkIndex intrinsic. Could we not do this optimization and then > drop the C2 checkIndex intrinsic? > > Andrew. > From vladimir.kozlov at oracle.com Tue Jan 5 17:20:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Jan 2016 09:20:12 -0800 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568BF9AB.3010408@oracle.com> References: <55FBDFEC.4060405@oracle.com> <56139149.5080906@oracle.com> <568A590F.6030104@oracle.com> <568B05AE.7020700@oracle.com> <568B7789.5010600@oracle.com> <568BF7D3.7000403@oracle.com> <568BF9AB.3010408@oracle.com> Message-ID: <568BFB4C.40106@oracle.com> Yes, webrev.04 is no go. I was referring webrev.03 but I missed that it does not have can_reshape. Vladimir On 1/5/16 9:13 AM, Tobias Hartmann wrote: > > On 05.01.2016 18:05, Vladimir Kozlov wrote: >> On 1/4/16 11:58 PM, Tobias Hartmann wrote: >>> Hi Vladimir, >>> >>> thanks for the review. >>> >>> On 05.01.2016 00:52, Vladimir Kozlov wrote: >>>> On 1/4/16 3:35 AM, Tobias Hartmann wrote: >>>>> Hi Roland, >>>>> >>>>> sorry for the delay. >>>>> >>>>> On 07.10.2015 11:06, Roland Westrelin wrote: >>>>>>>> Maybe we could add an IfProjNode::Ideal method that disconnects the other branch of the If when this branch is always taken and that does so even during parsing. Given Ideal is called before Identity, that would guarantee the next call to Identity optimizes the If out. >>>>>>> >>>>>>> As you suggested, I added an IfProjNode::Ideal that disconnects the never taken branch from the IfNode. 
The subsequent call to Identity then removes the IfNode:
>>>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.03/
>>>>>>>
>>>>>>> However, I wondered if this is "legal" because the comment in Node::ideal says:
>>>>>>>
>>>>>>> // The Ideal call almost arbitrarily reshape the graph rooted at the 'this'
>>>>>>> // pointer.
>>>>>>>
>>>>>>> But we are changing the graph "above" the this pointer. I executed tests with -XX:+VerifyIterativeGVN and everything seems to work fine.
>>>>>>> Another solution would be to cut the *current* branch if it is never taken:
>>>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.02/
>>>>>>>
>>>>>>> But this solution depends on the assumption that we execute the identity() of the other ProjNode which is not guaranteed by GVN (I think).
>>>>>>>
>>>>>>> Therefore I would like to go for webrev.03. I verified that this solves the problem and tested the fix with JPRT.
>>>>>>
>>>>>> I thought about this more and I don't think either work ok.
>>>>>>
>>>>>> The problem with webrev.02 is that depending on the order the projection nodes are allocated and transformed, the optimization may not happened:
>>>>>>
>>>>>> Node* never_taken = new IfTrueNode(..);
>>>>>> Node* always_taken = new IfFalseNode(..);
>>>>>> always_taken = gvn.transform(always_taken);
>>>>>> never_taken = gvn.transform(never_taken);
>>>>>>
>>>>>> The problem with webrev.03 is that we may change a node that is not yet transformed (never_taken changed by call to gvn.transform(always_taken)). Not sure if it could break existing code but it's clearly an unexpected behavior.
>>>>>
>>>>> Right, that could be a problem.
>>>>
>>>> I don't see a problem.
But IfProjNode::Ideal() should have additional checks for that: >>>> >>>> // Check for dead control input >>>> if (in(0) && remove_dead_region(phase, can_reshape)) { >>>> return this; >>>> } >>>> // Don't bother trying to transform a dead node >>>> if (in(0) && in(0)->is_top()) { >>>> return NULL; >>>> } >>> >>> Right, I'll add those. >>> >>>> Also instead of set_req() use: >>>> >>>> PhaseIterGVN* igvn = phase->is_IterGVN(); >>>> igvn->replace_input_of(other, 0, phase->C->top()); >>>> >>>> This way following gvn.transform(never_taken); will work fine. >>> >>> But this assumes that we are only executing the code with IGVN but we also want to cut off the dead branch with GVN. Or am I missing something? >> >> webrev.04 checks can_reshape which is true only with IGVN. >> For GVN you can do it by hand: >> >> bool is_in_table = C->initial_gvn()->hash_delete(other); >> other->set_req(0, phase->C->top()); >> if (is_in_table) { >> C->initial_gvn()->hash_find_insert(other); >> } >> C->record_for_igvn(other); >> >> Note, during Parse (GVN) we don't remove dead code aggressively. > > Right, I thought you were referring to Roland's comment about webrev.03. > > As I wrote in a previous email, I'm afraid that the webrev.04 solution re-introduces JDK-8027626. If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Do you think that isn't a problem? > > Thanks, > Tobias > >> >> Vladimir >> >>> >>> Thanks, >>> Tobias >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>>> An other way would be to remove the in(0)->outcnt() == 1 check from IfProjNode::Identity() and in an IfProjNode::Ideal method do what you do in webrev.03 but when can_reshape is true only. >>>>> >>>>> Here is the new webrev: >>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.04/ >>>>> >>>>> However, I'm afraid that this re-introduces JDK-8027626. 
If we call IfProjNode::Identity() during GVN and replace the ProjNode by If's input, we end up with a node having two control outputs until we remove the dead branch during IGVN. Right? >>>>> >>>>> Thanks, >>>>> Tobias >>>>> From kishor.kharbas at intel.com Tue Jan 5 21:39:31 2016 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Tue, 5 Jan 2016 21:39:31 +0000 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568BFA90.4020807@oracle.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568BFA90.4020807@oracle.com> Message-ID: Thank you guys for the in detail discussion and review. I have patched the JDK, performing bound checking using Objects.checkFromIndexSize() in CounterMode.crypt() and AESCrypt.encryptBlock(), AESCrypt.decryptBlock() Here is the link - http://cr.openjdk.java.net/~vdeshpande/8135250/webrev.00/ Let me know if it looks correct. -Kishor -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Tuesday, January 05, 2016 9:17 AM To: Andrew Haley; John Rose Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES > On 31 Dec 2015, at 22:33, John Rose wrote: > > When performing explicit range checks in pre-intrinsic code, > let's try to use the new intrinsic functions in java.util.Objects, > called checkIndex, checkFromToIndex, and checkFromIndexSize. Please, don't forget that checks in pre-intrinsic code should match checks generated by javac (bytecode) for intrinsified methods. Otherwise those checks will not be removed (by dominated checks in pre-intrinsic code) when intrinsics are not support on a platform. 
That is why we currently have such duplicated pre-intrinsic code. On other hand when intrinsics are supported they don't have checks so if they present we can intrinsify pre-intrinsic code as you suggested. Thanks, Vladimir On 1/5/16 1:48 AM, Andrew Haley wrote: > On 04/01/16 20:12, John Rose wrote: >> Corrected, thanks. They don't need to be intrinsics if they optimize well. >> The point is that the library functions have code shapes which work >> well with the JIT. For example, the multi-index checks might (as in >> Kishor's code) be implemented on top of the single-index check, >> without themselves being intrinsics. > > We seem to be missing the opportunity to convert > > i >= 0 && i < size > > into > > (unsigned)i < (unsigned)size > > and this is, as far as I can see, the only real code-quality advantage > of the checkIndex intrinsic. Could we not do this optimization and > then drop the C2 checkIndex intrinsic? > > Andrew. > From john.r.rose at oracle.com Tue Jan 5 22:11:23 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 5 Jan 2016 14:11:23 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568B9188.6000506@redhat.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> Message-ID: On Jan 5, 2016, at 1:48 AM, Andrew Haley wrote: > > On 04/01/16 20:12, John Rose wrote: >> Corrected, thanks. They don't need to be intrinsics if they optimize well. >> The point is that the library functions have code shapes which work well >> with the JIT. For example, the multi-index checks might (as in Kishor's code) >> be implemented on top of the single-index check, without themselves being >> intrinsics. 
>
> We seem to be missing the opportunity to convert
>
>   i >= 0 && i < size
>
> into
>
>   (unsigned)i < (unsigned)size
>
> and this is, as far as I can see, the only real code-quality advantage of
> the checkIndex intrinsic. Could we not do this optimization and then
> drop the C2 checkIndex intrinsic?

What Paul already said about belts and braces.

Of course we want the JIT to be "sufficiently smart" (tm) to discover the meaning of all such expressions. But surely, on balance, it's a good thing to encourage programmers to say what they mean. Dropping the intrinsic would prevent them from expressing their intention, forcing them to fall back on Java's expression operators. At that point, they have a variety of ways of indirectly spelling out their intention. There is no direct contract that the JIT will understand them, just a hope. That's not good engineering.

Also, it's not just a matter of micro-optimizing a single expression to use unsigned arithmetic (though that is surprisingly tricky). Range checks are interesting to block-level loop transformations (iteration range reorganization). Do you really want your loop optimizations to be gated on "sufficient smarts" in the JIT's expression pattern matcher?

- John

From sangheon.kim at oracle.com Wed Jan 6 00:31:05 2016
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 5 Jan 2016 16:31:05 -0800
Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode
Message-ID: <568C6049.5020400@oracle.com>

Hi all,

Could I have reviews for the below change to remove size limitation(<4096) of TLABWasteIncrement on SPARC?

Current implementation uses 'add(Register, int, Register)' which has 13bit limitation for 'int' parameter.
I changed to use 'set64' to load the value to register and then call 'add'. 'set64' will run cheap path as the range of TLABWasteIncrement is (0, max_juint).

This assert is only fired on non-G1 mode as G1 is the only GC that returns false from Universe::heap()->supports_inline_contig_alloc() by default option. And this decides to fall that routine. I didn't add a test as current TestOptionsWithRanges.java is enough to test this case with nightly option rotation. CR: https://bugs.openjdk.java.net/browse/JDK-8144573 Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.00/ Testing: JPRT, manual test on SPARC[1] [1]: java -XX:TLABWasteIncrement=4096(and some larger values as well) -XX:+UseConcMarkSweepGC(UseParallelGC and UseSerialGC) -version Thanks, Sangheon From john.r.rose at oracle.com Wed Jan 6 01:05:56 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 5 Jan 2016 17:05:56 -0800 Subject: Conditional moves vs. branching in unrolled loops In-Reply-To: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> Message-ID: <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> Darn, this works against the "say what you mean" story I told for checkIndex. The bug here is very very special but is hit commonly so needs fixing. The special part is that accumulating Math.max values over a long loop almost *always* creates a series of predictable branches, which means cmov will lose on many CPUs places. (Exercise: Try to construct a long series of values for which each value is the largest so far, randomly, with 50% probability. This will not be a series found often in nature.) We need to explicitly detect accumulations on cmov ops in long loops, and convert them to branches. Also, we should continue to recommend using intrinsics instead of random logic. Fun fact: Using your own branch logic makes the JVM manage a branch profile just for you, which can mean performance. Intrinsics, if they have internal branch logic, have polluted profiles. We need better call-site profiles and/or split profiles to overcome this. ? 
John

> On Jan 5, 2016, at 4:47 AM, Paul Sandoz wrote:
>
>
>> On 5 Jan 2016, at 13:00, Vitaly Davidovich wrote:
>>
>> This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104
>
> Many thanks, i closed JDK-8146071 as a dup of JDK-8039104.
>
> Paul.

From forax at univ-mlv.fr Wed Jan 6 02:02:46 2016
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 6 Jan 2016 03:02:46 +0100 (CET)
Subject: Conditional moves vs. branching in unrolled loops
In-Reply-To: <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
Message-ID: <585079517.742162.1452045766579.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Paul Sandoz"
> Cc: "hotspot compiler"
> Sent: Wednesday, 6 January 2016 02:05:56
> Subject: Re: Conditional moves vs. branching in unrolled loops
>
> Darn, this works against the "say what you mean" story I told for checkIndex.
>
> The bug here is very very special but is hit commonly so needs fixing. The
> special part is that accumulating Math.max values over a long loop almost
> *always* creates a series of predictable branches, which means cmov will
> lose on many CPUs places. (Exercise: Try to construct a long series of
> values for which each value is the largest so far, randomly, with 50%
> probability. This will not be a series found often in nature.)
>
> We need to explicitly detect accumulations on cmov ops in long loops, and
> convert them to branches.
>
> Also, we should continue to recommend using intrinsics instead of random
> logic.
>
> Fun fact: Using your own branch logic makes the JVM manage a branch profile
> just for you, which can mean performance. Intrinsics, if they have internal
> branch logic, have polluted profiles. We need better call-site profiles
> and/or split profiles to overcome this.

we already have the first part of a kind of split profiles in tiered mode, if code is first inlined by c1, c2 could use these different profiles, but currently the profiles are shared because you have one profile for one bci.
so in tiered mode, we should have one profile by bci + caller path inside the same inlining blob,
the VM needs to keep the inlining tree created by c1 to send it to c2
(there is maybe enough info in the stackwalk info to recreate the inlining tree).

>
> - John

Rémi

>
> > On Jan 5, 2016, at 4:47 AM, Paul Sandoz wrote:
> >
> >
> >> On 5 Jan 2016, at 13:00, Vitaly Davidovich wrote:
> >>
> >> This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104
> >
> > Many thanks, i closed JDK-8146071 as a dup of JDK-8039104.
> >
> > Paul.
>

From john.r.rose at oracle.com Wed Jan 6 02:52:52 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 5 Jan 2016 18:52:52 -0800
Subject: Conditional moves vs. branching in unrolled loops
In-Reply-To: <585079517.742162.1452045766579.JavaMail.zimbra@u-pem.fr>
References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> <585079517.742162.1452045766579.JavaMail.zimbra@u-pem.fr>
Message-ID:

Yep. It's a matter of data structure to keep track of the splits.

- John

> On Jan 5, 2016, at 6:02 PM, Remi Forax wrote:
>
> so in tiered mode, we should have one profile by bci + caller path inside the same inlining blob,
> the VM needs to keep the inlining tree created by c1 to send it to c2
> (there is maybe enough info in the stackwalk info to recreate the inlining tree).

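A minimal sketch of the "one profile per bci + caller path" idea from the exchange above, written in plain Java rather than as HotSpot code. Everything here is hypothetical and chosen only for illustration: the class SplitProfileSketch, the string key format, the bci value 7 and the caller names A.loop and B.loop. It makes no claim about how the VM actually stores profiles; it only shows why a counter keyed by bci alone looks unbiased when two callers drive the same branch in opposite directions, while a counter keyed by caller path plus bci keeps each caller's bias visible.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not HotSpot data structures.
final class SplitProfileSketch {

    /** Taken / not-taken counters for a single branch. */
    static final class BranchCounts {
        long taken;
        long notTaken;

        void record(boolean wasTaken) {
            if (wasTaken) {
                taken++;
            } else {
                notTaken++;
            }
        }

        double takenRatio() {
            long total = taken + notTaken;
            return total == 0 ? 0.0 : (double) taken / total;
        }
    }

    // Assumed bci of the compare inside a shared max-like helper.
    static final int DEMO_BCI = 7;

    // "Shared" profile: one counter per bci, whoever the caller is.
    private final Map<Integer, BranchCounts> shared = new HashMap<>();
    // "Split" profile: one counter per (caller path, bci) pair.
    private final Map<String, BranchCounts> split = new HashMap<>();

    private static String key(List<String> callerPath, int bci) {
        return String.join(">", callerPath) + "@" + bci;
    }

    void record(List<String> callerPath, int bci, boolean taken) {
        shared.computeIfAbsent(bci, k -> new BranchCounts()).record(taken);
        split.computeIfAbsent(key(callerPath, bci), k -> new BranchCounts()).record(taken);
    }

    double sharedRatio(int bci) {
        BranchCounts c = shared.get(bci);
        return c == null ? 0.0 : c.takenRatio();
    }

    double splitRatio(List<String> callerPath, int bci) {
        BranchCounts c = split.get(key(callerPath, bci));
        return c == null ? 0.0 : c.takenRatio();
    }

    /** Two callers hit the same bci with opposite, fully predictable outcomes. */
    static SplitProfileSketch demo() {
        SplitProfileSketch p = new SplitProfileSketch();
        for (int i = 0; i < 1000; i++) {
            p.record(Arrays.asList("A.loop"), DEMO_BCI, true);
            p.record(Arrays.asList("B.loop"), DEMO_BCI, false);
        }
        return p;
    }

    public static void main(String[] args) {
        SplitProfileSketch p = demo();
        System.out.println("shared taken ratio: " + p.sharedRatio(DEMO_BCI));
        System.out.println("A.loop taken ratio: " + p.splitRatio(Arrays.asList("A.loop"), DEMO_BCI));
        System.out.println("B.loop taken ratio: " + p.splitRatio(Arrays.asList("B.loop"), DEMO_BCI));
    }
}
```

In this toy setup the shared counter reports a 0.5 taken ratio even though each caller is perfectly predictable on its own, which is the "polluted profile" effect mentioned in the thread. The cost of the split form is the extra key material, which matches the remark that it is a matter of data structure to keep track of the splits.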
From igor.veresov at oracle.com Wed Jan 6 04:29:11 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 5 Jan 2016 20:29:11 -0800
Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode
In-Reply-To: <568C6049.5020400@oracle.com>
References: <568C6049.5020400@oracle.com>
Message-ID: <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com>

I'm not sure we care a lot about tiny bits of performance in this instance... But, in case you wanted to keep the original code for the simm13 case you could check the range of the constant and still emit the code that was there before. It also seems suboptimal to do set64 in MacroAssembler::tlab_refill() on all paths - the result of the original add in the delay slot doesn't seem to be used if we jump to discard_tlab, right?

So, maybe you could do something like:

    brx(Assembler::lessEqual, false, Assembler::pt, discard_tlab);
    if (is_simm13(ThreadLocalAllocBuffer::refill_waste_limit_increment())) {
      delayed()->add(t2, ThreadLocalAllocBuffer::refill_waste_limit_increment(), t2);
    } else {
      delayed()->nop();
      set64(ThreadLocalAllocBuffer::refill_waste_limit_increment(), t3, G0);
      add(t2, t3, t2);
    }

Similarly, tighter code can be emitted for the interpreter in templateTable_sparc.cpp.

igor

> On Jan 5, 2016, at 4:31 PM, sangheon wrote:
>
> Hi all,
>
> Could I have reviews for the below change to remove size limitation(<4096) of TLABWasteIncrement on SPARC?
>
> Current implementation uses 'add(Register, int, Register)' which has 13bit limitation for 'int' parameter.
> I changed to use 'set64' to load the value to register and then call 'add'. 'set64' will run cheap path as the range of TLABWasteIncrement is (0, max_juint).
>
> This assert is only fired on non-G1 mode as G1 is the only GC that returns false from Universe::heap()->supports_inline_contig_alloc() by default option. And this decides to fall that routine.
> > I didn't add a test as current TestOptionsWithRanges.java is enough to test this case with nightly option rotation. > > CR: https://bugs.openjdk.java.net/browse/JDK-8144573 > Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.00/ > Testing: JPRT, manual test on SPARC[1] > > [1]: java -XX:TLABWasteIncrement=4096(and some larger values as well) -XX:+UseConcMarkSweepGC(UseParallelGC and UseSerialGC) -version > > Thanks, > Sangheon From aph at redhat.com Wed Jan 6 10:05:29 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Jan 2016 10:05:29 +0000 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568BFA90.4020807@oracle.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568BFA90.4020807@oracle.com> Message-ID: <568CE6E9.1070904@redhat.com> On 05/01/16 17:17, Vladimir Kozlov wrote: > Please, don't forget that checks in pre-intrinsic code should match > checks generated by javac (bytecode) for intrinsified > methods. Otherwise those checks will not be removed (by dominated > checks in pre-intrinsic code) when intrinsics are not support on a > platform. That is why we currently have such duplicated > pre-intrinsic code. > > On other hand when intrinsics are supported they don't have checks > so if they present we can intrinsify pre-intrinsic code as you > suggested. It may be that I'm just being very dim, but I've read this ten times and I still don't know exactly what you mean. Can you give me a pointer to an example of such duplicated pre-intrinsic code? Thanks, Andrew. From paul.sandoz at oracle.com Wed Jan 6 10:12:09 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 6 Jan 2016 11:12:09 +0100 Subject: Conditional moves vs. 
branching in unrolled loops
In-Reply-To: <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
Message-ID:

> On 6 Jan 2016, at 02:05, John Rose wrote:
>
> Darn, this works against the "say what you mean" story I told for checkIndex.
>
> The bug here is very very special but is hit commonly so needs fixing. The special part is that accumulating Math.max values over a long loop almost *always* creates a series of predictable branches, which means cmov will lose on many CPUs places. (Exercise: Try to construct a long series of values for which each value is the largest so far, randomly, with 50% probability. This will not be a series found often in nature.)
>

Here are some results (see benchmark below, and thanks to Aleksey for hints/tips):

Benchmark          (bias)            (dg)  (size)  Mode  Cnt     Score     Error  Units
A.forTest_MathMax     0.1          RANDOM       1  avgt   10     3.698 ±   0.146  ns/op
A.forTest_MathMax     0.1          RANDOM      10  avgt   10     9.474 ±   0.234  ns/op
A.forTest_MathMax     0.1          RANDOM     100  avgt   10    84.363 ±   2.734  ns/op
A.forTest_MathMax     0.1          RANDOM    1000  avgt   10   840.102 ±  22.474  ns/op
A.forTest_MathMax     0.1          RANDOM   10000  avgt   10  8514.794 ± 202.722  ns/op
A.forTest_MathMax     0.1  RANDOM_RAMP_UP       1  avgt   10     3.764 ±   0.166  ns/op
A.forTest_MathMax     0.1  RANDOM_RAMP_UP      10  avgt   10     9.838 ±   0.428  ns/op
A.forTest_MathMax     0.1  RANDOM_RAMP_UP     100  avgt   10    84.650 ±   3.155  ns/op
A.forTest_MathMax     0.1  RANDOM_RAMP_UP    1000  avgt   10   844.412 ±  21.983  ns/op
A.forTest_MathMax     0.1  RANDOM_RAMP_UP   10000  avgt   10  8519.292 ± 295.786  ns/op
A.forTest_MathMax     0.5          RANDOM       1  avgt   10     3.667 ±   0.116  ns/op
A.forTest_MathMax     0.5          RANDOM      10  avgt   10     9.527 ±   0.235  ns/op
A.forTest_MathMax     0.5          RANDOM     100  avgt   10    83.318 ±   2.954  ns/op
A.forTest_MathMax     0.5          RANDOM    1000  avgt   10   843.540 ±  22.051  ns/op
A.forTest_MathMax     0.5          RANDOM   10000  avgt   10  8559.293 ± 333.435  ns/op
A.forTest_MathMax     0.5  RANDOM_RAMP_UP       1  avgt   10     3.712 ±   0.123  ns/op
A.forTest_MathMax     0.5  RANDOM_RAMP_UP      10  avgt   10     9.536 ±   0.195  ns/op
A.forTest_MathMax     0.5  RANDOM_RAMP_UP     100  avgt   10    82.943 ±   2.199  ns/op
A.forTest_MathMax     0.5  RANDOM_RAMP_UP    1000  avgt   10   842.282 ±  19.100  ns/op
A.forTest_MathMax     0.5  RANDOM_RAMP_UP   10000  avgt   10  8454.333 ± 293.222  ns/op
A.forTest_if          0.1          RANDOM       1  avgt   10     3.453 ±   0.106  ns/op
A.forTest_if          0.1          RANDOM      10  avgt   10     9.156 ±   0.555  ns/op
A.forTest_if          0.1          RANDOM     100  avgt   10    39.006 ±   1.575  ns/op
A.forTest_if          0.1          RANDOM    1000  avgt   10   372.999 ±  20.423  ns/op
A.forTest_if          0.1          RANDOM   10000  avgt   10  3613.243 ±  72.343  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP       1  avgt   10     3.410 ±   0.086  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP      10  avgt   10     9.236 ±   0.412  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP     100  avgt   10    49.200 ±   1.642  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP    1000  avgt   10   476.677 ±  16.041  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP   10000  avgt   10  3774.091 ± 131.946  ns/op
A.forTest_if          0.5          RANDOM       1  avgt   10     3.398 ±   0.121  ns/op
A.forTest_if          0.5          RANDOM      10  avgt   10     9.565 ±   0.614  ns/op
A.forTest_if          0.5          RANDOM     100  avgt   10    49.666 ±   2.257  ns/op
A.forTest_if          0.5          RANDOM    1000  avgt   10   383.734 ±  22.051  ns/op
A.forTest_if          0.5          RANDOM   10000  avgt   10  3624.447 ± 204.303  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP       1  avgt   10     3.446 ±   0.135  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP      10  avgt   10     9.330 ±   0.399  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP     100  avgt   10    84.596 ±   4.132  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP    1000  avgt   10   914.982 ±  30.125  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP   10000  avgt   10  8991.088 ± 315.307  ns/op

At least for this set of tests the results indicate conditional moves offer no major advantage over branching. For the worst case branching scenario (the "50 cent" case) conditional moves appear marginally better, but as you say the data pattern is likely rare.

Perhaps for conditional moves data dependency chains are more costly?

Paul.
package oracle.jmh; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.annotations.BenchmarkMode; import org.openjdk.jmh.annotations.Fork; import org.openjdk.jmh.annotations.Measurement; import org.openjdk.jmh.annotations.Mode; import org.openjdk.jmh.annotations.OutputTimeUnit; import org.openjdk.jmh.annotations.Param; import org.openjdk.jmh.annotations.Scope; import org.openjdk.jmh.annotations.Setup; import org.openjdk.jmh.annotations.State; import org.openjdk.jmh.annotations.Warmup; import java.util.Arrays; import java.util.Random; import java.util.concurrent.TimeUnit; import java.util.function.BiConsumer; @State(Scope.Benchmark) @Fork(value = 1, warmups = 0) @Warmup(iterations = 10, time = 100, timeUnit = TimeUnit.MILLISECONDS) @Measurement(iterations = 10, time = 100, timeUnit = TimeUnit.MILLISECONDS) @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) public class A { @Param({"1", "10", "100", "1000", "10000"}) int size; @Param({"0.0", "0.1", "0.2", "0.3", "0.4", "0.5"}) private double bias; @Param({"RANDOM", "RANDOM_RAMP_UP"}) DataGenerator dg; int ints[]; @Setup public void setUp() { ints = dg.generate(bias, size); } public enum DataGenerator { RANDOM((b, vs) -> { Random random = new Random(); for (int i = 0; i < vs.length; i++) if (random.nextFloat() > b) vs[i] = random.nextInt(); }), RANDOM_RAMP_UP((b, vs) -> { Random random = new Random(); for (int i = 0; i < vs.length; i++) { if (random.nextFloat() > b) vs[i] = i; } }); final BiConsumer filler; DataGenerator(BiConsumer filler) { this.filler = filler; } int[] generate(double bias, int size) { int[] vs = new int[size]; filler.accept(bias, vs); return vs; } } @Benchmark public int forTest_if() { int[] a = ints; int e = ints.length; int m = Integer.MIN_VALUE; for (int i = 0; i < e; i++) if (a[i] >= m) m = a[i]; return m; } @Benchmark public int forTest_MathMax() { int[] a = ints; int e = ints.length; int m = Integer.MIN_VALUE; for (int i = 0; i < e; i++) m = Math.max(m, 
a[i]); return m; } @Benchmark public int streamTest_lambda() { return Arrays.stream(ints).reduce(Integer.MIN_VALUE, (a, b) -> a >= b ? a : b); } @Benchmark public int streamTest_MathMax() { return Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max); } } > We need to explicitly detect accumulations on cmov ops in long loops, and convert them to branches. > > Also, we should continue to recommend using intrinsics instead of random logic. > > Fun fact: Using your own branch logic makes the JVM manage a branch profile just for you, which can mean performance. Intrinsics, if they have internal branch logic, have polluted profiles. We need better call-site profiles and/or split profiles to overcome this. > > ? John > >> On Jan 5, 2016, at 4:47 AM, Paul Sandoz wrote: >> >> >>> On 5 Jan 2016, at 13:00, Vitaly Davidovich wrote: >>> >>> This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104 >> >> Many thanks, i closed JDK-8146071 as a dup of JDK-8039104. >> >> Paul. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From john.r.rose at oracle.com Wed Jan 6 10:31:09 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 6 Jan 2016 02:31:09 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568CE6E9.1070904@redhat.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568BFA90.4020807@oracle.com> <568CE6E9.1070904@redhat.com> Message-ID: <5C9816F5-9618-4BBF-A761-E03CCDEDB230@oracle.com> On Jan 6, 2016, at 2:05 AM, Andrew Haley wrote: > > Can you give me a pointer to an example of such duplicated > pre-intrinsic code? > It's another case of belt-and-suspenders. The internal bytecodes of a non-replaced intrinsic candidate perform hardwired range checks as part of iaload, etc. So the JVM defends itself against out-of-range access, as usual. Meanwhile, at a higher level, the intrinsic candidate (whether replaced or not) is dominated by a call to explicit range check logic. if (rangeCheckFail(array, indexes)) goto L_throw_1; /* non-replaced intrinsic, logic gets inlined as follows: */ for (index in indexes?) { if (rangeCheckFail(array, index)) goto L_throw_2; tem = iaload(array, index); ? } In the case of a replaced intrinsic, there is not guaranteed to be a full range check of the array access, so: if (rangeCheckFail(array, indexes)) goto L_throw_1; /* replaced intrinsic */ ?some vectorized assembly code works with array and indexes? In the first case, if the first "rangeCheckFail" logic is similar enough to the second "rangeCheckFail" logic, the JIT can elide the second one. But they are likely *not* to match if the programmer has written something elegant and/or clever for the first set of checks. ? 
John

From aph at redhat.com Wed Jan 6 10:41:31 2016
From: aph at redhat.com (Andrew Haley)
Date: Wed, 6 Jan 2016 10:41:31 +0000
Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES
In-Reply-To:
References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com>
Message-ID: <568CEF5B.5060306@redhat.com>

On 05/01/16 22:11, John Rose wrote:

> Dropping the intrinsic would prevent them from expressing their
> intention, forcing them to fall back on Java's expression operators.

I don't really understand that point: Objects.checkIndex would still
exist, and hopefully people would use it, but it wouldn't need
special-case handling in C2.

> Also, it's not just a matter of micro-optimizing a single expression
> to use unsigned arithmetic (though that is surprisingly tricky).

I accept that point.

> Range checks are interesting to block-level loop transformations
> (iteration range reorganization). Do you really want your loop
> optimizations to be gated on "sufficient smarts" in the JIT's
> expression pattern matcher?

Please forgive me for pushing this: I'm not arguing for the sake of it,
I'm trying to understand your reasoning.

As it stands we recognize a call to Objects.checkIndex and transform
it into a certain pattern. I'm assuming that it's not impossible to
recognize the logic inside Objects.checkIndex and transform it into
the same form that the intrinsic generates. And that would have a
payoff in all the places that the same logic is used in existing
programs, both inside and outside the JDK.

I suppose one downside of this approach is that C2 might decide
not to inline Objects.checkIndex, so it would be called instead
and the optimization would not be done.

Andrew.
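For readers following the checkIndex exchange above, here is a minimal sketch of the three shapes a range check can take in Java source: the two-comparison signed form, the single unsigned comparison John alludes to, and the library call. The class and method names (RangeCheckSketch, inRangeSigned, inRangeUnsigned, loadChecked) are hypothetical and are not taken from the JDK or from the patch under review; only Objects.checkIndex(int, int) (Java 9+) and Integer.compareUnsigned are real APIs. Whether C2 treats these shapes identically is exactly what the thread is debating, so the sketch makes no claim about the generated code.

```java
import java.util.Objects;

final class RangeCheckSketch {
    // Two signed comparisons: the "obvious" shape found in much existing code.
    static boolean inRangeSigned(int index, int length) {
        return index >= 0 && index < length;
    }

    // One unsigned comparison: a negative index reinterpreted as unsigned is
    // larger than any non-negative length, so it fails the same test as an
    // index that is too large. Equivalent to inRangeSigned only if length >= 0.
    static boolean inRangeUnsigned(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    // The library form: states the intent and throws IndexOutOfBoundsException
    // when the index is out of range.
    static int loadChecked(int[] array, int index) {
        return array[Objects.checkIndex(index, array.length)];
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        for (int i : new int[] {-1, 0, 3, 4}) {
            System.out.println(i + " signed=" + inRangeSigned(i, data.length)
                    + " unsigned=" + inRangeUnsigned(i, data.length));
        }
        System.out.println(loadChecked(data, 2));
    }
}
```

The unsigned form is the one that is easy to get subtly wrong by hand (it silently assumes a non-negative length), which may be part of why a named library method is attractive regardless of how the JIT ends up recognizing it.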
From tobias.hartmann at oracle.com Wed Jan 6 11:22:29 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Jan 2016 12:22:29 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <55FBDFEC.4060405@oracle.com> References: <55FBDFEC.4060405@oracle.com> Message-ID: <568CF8F5.5090202@oracle.com> Hi, I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ What do you think? Thanks, Tobias On 18.09.2015 11:57, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8136469 > http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ > > Problem: > When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. 
Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. > > Solution: > I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). > > Testing: > - New test (TestPresizedStringBuilder) > - JPRT > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java > [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png > From tobias.hartmann at oracle.com Wed Jan 6 12:01:45 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 6 Jan 2016 13:01:45 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings Message-ID: <568D0229.60908@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8144212 http://cr.openjdk.java.net/~thartmann/8144212/webrev.00/ An Apache Lucene test fails with Compact Strings enabled because the result of String.getChars() is invalid. The problem is a missing membar after the _inflateString intrinsic, allowing a subsequent load from the destination array to flow above and return a wrong result (see [1]: 210 LoadUS should read the result of 196 StrInflatedCopy). Tested with JPRT and failing Apache Lucene test. During my investigation, I noticed that the StringUTF16.getChars() and StringUTF16.compress/inflate intrinsics use LibraryCallKit::tightly_coupled_allocation() to skip zeroing the array elements. However, the intrinsics do not take care of zeroing remaining array elements not affected by the intrinsic operation. 
Currently, this is not a problem because all (String API internal) usages of the intrinsics that have a tightly coupled allocation make sure that the entire array is initialized. However, we should fix this to avoid potential bugs. I filed JDK-8146547 and will take care of it. Thanks, Tobias [1] https://bugs.openjdk.java.net/secure/attachment/56238/Graph.png From aph at redhat.com Wed Jan 6 12:07:26 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Jan 2016 12:07:26 +0000 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568D0229.60908@oracle.com> References: <568D0229.60908@oracle.com> Message-ID: <568D037E.7000105@redhat.com> On 06/01/16 12:01, Tobias Hartmann wrote: > An Apache Lucene test fails with Compact Strings enabled because the > result of String.getChars() is invalid. The problem is a missing > membar after the _inflateString intrinsic, allowing a subsequent > load from the destination array to flow above and return a wrong > result (see [1]: 210 LoadUS should read the result of 196 > StrInflatedCopy). > > Tested with JPRT and failing Apache Lucene test. Is a MemBarCPUOrder sufficient for machines with relaxed memory ordering? Andrew. From vitalyd at gmail.com Wed Jan 6 12:20:58 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 07:20:58 -0500 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568CEF5B.5060306@redhat.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> Message-ID: I agree with Andrew. 
We had a similarly themed discussion a few months back when someone wanted to make Integer/Long::compareTo an intrinsic; the sentiment there was that there's nothing "special" about compareTo, and instead the JIT can be taught to pick up the pattern used in the bytecode for those methods. Objects::checkIndex seems no different in that regard. I realize that there may always be a user-specified shape that the JIT doesn't understand, but straightforward cases should hopefully Just Work(tm) as those patterns can be picked up elsewhere in code and performance improves without changing a line of code. On Wednesday, January 6, 2016, Andrew Haley wrote: > On 05/01/16 22:11, John Rose wrote: > > > Dropping the intrinsic would prevent them from expressing their > > intention, forcing them to fall back on Java's expression operators. > > I don't really understand that point: Objects.checkIndex would still > exist, and hopefully people would use it, but it wouldn't need > special-case handling in C2. > > > Also, it's not just a matter of micro-optimizing a single expression > > to use unsigned arithmetic (though that is surprisingly tricky). > > I accept that point. > > > Range checks are interesting to block-level loop transformations > > (iteration range reorganization). Do you really want your loop > > optimizations to be gated on "sufficient smarts" in the JIT's > > expression pattern matcher? > > Please forgive me for pushing this: I'm not arguing for the sake of it, > I'm trying to understand your reasoning. > > As it stands we recognize a call to Objects.checkIndex and transform > it into a certain pattern. I'm assuming that it's not impossible to > recognize the logic inside Objects.checkIndex and transform it into > the same form that the intrinsic generates. And that would have a > payoff in all the places that the same logic is used in existing > programs, both inside and outside the JDK. 
>
> I suppose one downside of this approach is that C2 might decide
> not to inline Objects.checkIndex, so it would be called instead
> and the optimization would not be done.
>
> Andrew.
>

-- Sent from my phone

From vitalyd at gmail.com Wed Jan 6 12:38:20 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 6 Jan 2016 07:38:20 -0500
Subject: Conditional moves vs. branching in unrolled loops
In-Reply-To:
References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
Message-ID:

On Wednesday, January 6, 2016, Paul Sandoz wrote:

>
> On 6 Jan 2016, at 02:05, John Rose wrote:
>
> Darn, this works against the "say what you mean" story I told for
> checkIndex.
>
> The bug here is very very special but is hit commonly so needs fixing. The
> special part is that accumulating Math.max values over a long loop almost
> *always* creates a series of predictable branches, which means cmov will
> lose on many CPUs places. (Exercise: Try to construct a long series of
> values for which each value is the largest so far, randomly, with 50%
> probability. This will not be a series found often in nature.)
>
>
> Here are some results (see benchmark below, and thanks to Aleksey for
> hints/tips):
>
> Benchmark          (bias)  (dg)            (size)  Mode  Cnt      Score      Error  Units
> A.forTest_MathMax     0.1  RANDOM               1  avgt   10      3.698 ±    0.146  ns/op
> A.forTest_MathMax     0.1  RANDOM              10  avgt   10      9.474 ±    0.234  ns/op
> A.forTest_MathMax     0.1  RANDOM             100  avgt   10     84.363 ±    2.734  ns/op
> A.forTest_MathMax     0.1  RANDOM            1000  avgt   10    840.102 ±   22.474  ns/op
> A.forTest_MathMax     0.1  RANDOM           10000  avgt   10   8514.794 ±  202.722  ns/op
> A.forTest_MathMax     0.1  RANDOM_RAMP_UP       1  avgt   10      3.764 ±    0.166  ns/op
> A.forTest_MathMax     0.1  RANDOM_RAMP_UP      10  avgt   10      9.838 ±    0.428  ns/op
> A.forTest_MathMax     0.1  RANDOM_RAMP_UP     100  avgt   10     84.650 ±    3.155  ns/op
> A.forTest_MathMax     0.1  RANDOM_RAMP_UP    1000  avgt   10    844.412 ±   21.983  ns/op
> A.forTest_MathMax     0.1  RANDOM_RAMP_UP   10000  avgt   10   8519.292 ±  295.786  ns/op
> A.forTest_MathMax     0.5  RANDOM               1  avgt   10      3.667 ±    0.116  ns/op
> A.forTest_MathMax     0.5  RANDOM              10  avgt   10      9.527 ±    0.235  ns/op
> A.forTest_MathMax     0.5  RANDOM             100  avgt   10     83.318 ±    2.954  ns/op
> A.forTest_MathMax     0.5  RANDOM            1000  avgt   10    843.540 ±   22.051  ns/op
> A.forTest_MathMax     0.5  RANDOM           10000  avgt   10   8559.293 ±  333.435  ns/op
> A.forTest_MathMax     0.5  RANDOM_RAMP_UP       1  avgt   10      3.712 ±    0.123  ns/op
> A.forTest_MathMax     0.5  RANDOM_RAMP_UP      10  avgt   10      9.536 ±    0.195  ns/op
> A.forTest_MathMax     0.5  RANDOM_RAMP_UP     100  avgt   10     82.943 ±    2.199  ns/op
> A.forTest_MathMax     0.5  RANDOM_RAMP_UP    1000  avgt   10    842.282 ±   19.100  ns/op
> A.forTest_MathMax     0.5  RANDOM_RAMP_UP   10000  avgt   10   8454.333 ±  293.222  ns/op
> A.forTest_if          0.1  RANDOM               1  avgt   10      3.453 ±    0.106  ns/op
> A.forTest_if          0.1  RANDOM              10  avgt   10      9.156 ±    0.555  ns/op
> A.forTest_if          0.1  RANDOM             100  avgt   10     39.006 ±    1.575  ns/op
> A.forTest_if          0.1  RANDOM            1000  avgt   10    372.999 ±   20.423  ns/op
> A.forTest_if          0.1  RANDOM           10000  avgt   10   3613.243 ±   72.343  ns/op
> A.forTest_if          0.1  RANDOM_RAMP_UP       1  avgt   10      3.410 ±    0.086  ns/op
> A.forTest_if          0.1  RANDOM_RAMP_UP      10  avgt   10      9.236 ±    0.412  ns/op
> A.forTest_if          0.1  RANDOM_RAMP_UP     100  avgt   10     49.200 ±    1.642  ns/op
> A.forTest_if          0.1  RANDOM_RAMP_UP    1000  avgt   10    476.677 ±   16.041  ns/op
> A.forTest_if          0.1  RANDOM_RAMP_UP   10000  avgt   10   3774.091 ±  131.946  ns/op
> A.forTest_if          0.5  RANDOM               1  avgt   10      3.398 ±    0.121  ns/op
> A.forTest_if          0.5  RANDOM              10  avgt   10      9.565 ±    0.614  ns/op
> A.forTest_if          0.5  RANDOM             100  avgt   10     49.666 ±    2.257  ns/op
> A.forTest_if          0.5  RANDOM            1000  avgt   10    383.734 ±   22.051  ns/op
> A.forTest_if          0.5  RANDOM           10000  avgt   10   3624.447 ±  204.303  ns/op
> A.forTest_if          0.5  RANDOM_RAMP_UP       1  avgt   10      3.446 ±    0.135  ns/op
> A.forTest_if          0.5  RANDOM_RAMP_UP      10  avgt   10      9.330 ±    0.399  ns/op
> A.forTest_if          0.5  RANDOM_RAMP_UP     100  avgt   10     84.596 ±    4.132  ns/op
> A.forTest_if          0.5  RANDOM_RAMP_UP    1000  avgt   10    914.982 ±   30.125  ns/op
> A.forTest_if          0.5  RANDOM_RAMP_UP   10000  avgt   10   8991.088 ±  315.307  ns/op
>
> At least for this set of tests the results indicate conditional moves
> offer no major advantage over branching. For the worst case branching
> scenario (the "50 cent" case) conditional moves appear marginally better,
> but as you say the data pattern is likely rare.
>
> Perhaps for conditional moves data dependency chains are more costly?
>

cmov carries a dependency on both inputs, making it more likely to stall when at least one isn't available whereas the branch still allows cpu to continue with speculative execution. In a tight loop with a memory access as one input to cmov, the memory op has to retire before cmov can proceed; using cmov when both inputs are already ready (e.g. values in registers) is pretty harmless though and avoids a branch entirely. cmov also has larger encoding than a branch.

As the original jira on this issue states, cmov should only be used when the branch is profiled to be unpredictable. I'm not sure why loops with a max/min accumulator need to be called out separately in this regard - wouldn't the branch profile dictate this anyway? This of course assumes that profile pollution is addressed in some manner.

>
> Paul.
> > package oracle.jmh; > > import org.openjdk.jmh.annotations.Benchmark; > import org.openjdk.jmh.annotations.BenchmarkMode; > import org.openjdk.jmh.annotations.Fork; > import org.openjdk.jmh.annotations.Measurement; > import org.openjdk.jmh.annotations.Mode; > import org.openjdk.jmh.annotations.OutputTimeUnit; > import org.openjdk.jmh.annotations.Param; > import org.openjdk.jmh.annotations.Scope; > import org.openjdk.jmh.annotations.Setup; > import org.openjdk.jmh.annotations.State; > import org.openjdk.jmh.annotations.Warmup; > > import java.util.Arrays; > import java.util.Random; > import java.util.concurrent.TimeUnit; > import java.util.function.BiConsumer; > > > @State(Scope.Benchmark) > @Fork(value = 1, warmups = 0) > @Warmup(iterations = 10, time = 100, timeUnit = TimeUnit.MILLISECONDS) > @Measurement(iterations = 10, time = 100, timeUnit = TimeUnit.MILLISECONDS) > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > public class A { > > @Param({"1", "10", "100", "1000", "10000"}) > int size; > > @Param({"0.0", "0.1", "0.2", "0.3", "0.4", "0.5"}) > private double bias; > > @Param({"RANDOM", "RANDOM_RAMP_UP"}) > DataGenerator dg; > > int ints[]; > > @Setup > public void setUp() { > ints = dg.generate(bias, size); > } > > public enum DataGenerator { > RANDOM((b, vs) -> { > Random random = new Random(); > for (int i = 0; i < vs.length; i++) > if (random.nextFloat() > b) > vs[i] = random.nextInt(); > }), > > RANDOM_RAMP_UP((b, vs) -> { > Random random = new Random(); > for (int i = 0; i < vs.length; i++) { > if (random.nextFloat() > b) > vs[i] = i; > } > }); > > final BiConsumer filler; > > DataGenerator(BiConsumer filler) { > this.filler = filler; > } > > int[] generate(double bias, int size) { > int[] vs = new int[size]; > filler.accept(bias, vs); > return vs; > } > } > > @Benchmark > public int forTest_if() { > int[] a = ints; > int e = ints.length; > int m = Integer.MIN_VALUE; > for (int i = 0; i < e; i++) > if (a[i] >= m) > m = a[i]; 
> return m; > } > > @Benchmark > public int forTest_MathMax() { > int[] a = ints; > int e = ints.length; > int m = Integer.MIN_VALUE; > for (int i = 0; i < e; i++) > m = Math.max(m, a[i]); > return m; > } > > @Benchmark > public int streamTest_lambda() { > return Arrays.stream(ints).reduce(Integer.MIN_VALUE, (a, b) -> a >= b ? a : b); > } > > @Benchmark > public int streamTest_MathMax() { > return Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max); > } > } > > > > We need to explicitly detect accumulations on cmov ops in long loops, and > convert them to branches. > > Also, we should continue to recommend using intrinsics instead of random > logic. > > Fun fact: Using your own branch logic makes the JVM manage a branch > profile just for you, which can mean performance. Intrinsics, if they have > internal branch logic, have polluted profiles. We need better call-site > profiles and/or split profiles to overcome this. > > ? John > > On Jan 5, 2016, at 4:47 AM, Paul Sandoz > wrote: > > > On 5 Jan 2016, at 13:00, Vitaly Davidovich > wrote: > > This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104 > > > Many thanks, i closed JDK-8146071 as a dup of JDK-8039104. > > Paul. > > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Wed Jan 6 12:43:36 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 07:43:36 -0500 Subject: Conditional moves vs. branching in unrolled loops In-Reply-To: <585079517.742162.1452045766579.JavaMail.zimbra@u-pem.fr> References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> <585079517.742162.1452045766579.JavaMail.zimbra@u-pem.fr> Message-ID: Ideally profile pollution could be solved/improved without requiring tiered; tiered has its own wrinkles, and many places simply use C2 alone. 
On Tuesday, January 5, 2016, Remi Forax wrote:

> ----- Original message -----
> > From: "John Rose"
> > To: "Paul Sandoz"
> > Cc: "hotspot compiler"
> > Sent: Wednesday, 6 January 2016 02:05:56
> > Subject: Re: Conditional moves vs. branching in unrolled loops
> >
> > Darn, this works against the "say what you mean" story I told for checkIndex.
> >
> > The bug here is very very special but is hit commonly so needs fixing. The
> > special part is that accumulating Math.max values over a long loop almost
> > *always* creates a series of predictable branches, which means cmov will
> > lose on many CPUs places. (Exercise: Try to construct a long series of
> > values for which each value is the largest so far, randomly, with 50%
> > probability. This will not be a series found often in nature.)
> >
> > We need to explicitly detect accumulations on cmov ops in long loops, and
> > convert them to branches.
> >
> > Also, we should continue to recommend using intrinsics instead of random
> > logic.
> >
> > Fun fact: Using your own branch logic makes the JVM manage a branch profile
> > just for you, which can mean performance. Intrinsics, if they have internal
> > branch logic, have polluted profiles. We need better call-site profiles
> > and/or split profiles to overcome this.
>
> we already have the first part of a kind of split profiles in tiered mode,
> if code is first inlined by c1, c2 could use these different profiles,
> but currently the profiles are shared because you have one profile for one
> bci.
>
> so in tiered mode, we should have one profile per bci + caller path inside
> the same inlining blob,
> the VM needs to keep the inlining tree created by c1 to send it to c2
> (there is maybe enough info in the stackwalk info to recreate the inlining
> tree).
>
> >
> > -- John
>
> Rémi
>
> >
> > > On Jan 5, 2016, at 4:47 AM, Paul Sandoz wrote:
> > >
> > >
> > >> On 5 Jan 2016, at 13:00, Vitaly Davidovich wrote:
> > >>
> > >> This is a known issue: https://bugs.openjdk.java.net/browse/JDK-8039104
> > >
> > > Many thanks, I closed JDK-8146071 as a dup of JDK-8039104.
> > >
> > > Paul.
> >
>

-- Sent from my phone

From tobias.hartmann at oracle.com Wed Jan 6 13:06:16 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 6 Jan 2016 14:06:16 +0100
Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings
In-Reply-To: <568D037E.7000105@redhat.com>
References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com>
Message-ID: <568D1148.1030901@oracle.com>

Hi Andrew,

On 06.01.2016 13:07, Andrew Haley wrote:
> On 06/01/16 12:01, Tobias Hartmann wrote:
>
>> An Apache Lucene test fails with Compact Strings enabled because the
>> result of String.getChars() is invalid. The problem is a missing
>> membar after the _inflateString intrinsic, allowing a subsequent
>> load from the destination array to flow above and return a wrong
>> result (see [1]: 210 LoadUS should read the result of 196
>> StrInflatedCopy).
>>
>> Tested with JPRT and failing Apache Lucene test.
>
> Is a MemBarCPUOrder sufficient for machines with relaxed memory
> ordering?

The problem here is that C2 reorders memory instructions and moves an array load before an array store. The MemBarCPUOrder is now used (compiler internally) to prevent this. We do the same for normal array copies in PhaseMacroExpand::expand_arraycopy_node(). No actual code is emitted. See also the comment in memnode.hpp:

// Ordering within the same CPU. Used to order unsafe memory references
// inside the compiler when we lack alias info. Not needed "outside" the
// compiler because the CPU does all the ordering for us.
"CPU does all the ordering for us" means that even with a relaxed memory ordering, loads are never moved before dependent stores. Or did I misunderstand your question? Thanks, Tobias From aph at redhat.com Wed Jan 6 13:34:28 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Jan 2016 13:34:28 +0000 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568D1148.1030901@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> Message-ID: <568D17E4.90301@redhat.com> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: > The problem here is that C2 reorders memory instructions and moves > an array load before an array store. The MemBarCPUOrder is now used > (compiler internally) to prevent this. We do the same for normal > array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual > code is emitted. See also the comment in memnode.hpp: > > // Ordering within the same CPU. Used to order unsafe memory references > // inside the compiler when we lack alias info. Not needed "outside" the > // compiler because the CPU does all the ordering for us. > > "CPU does all the ordering for us" means that even with a relaxed > memory ordering, loads are never moved before dependent stores. > > Or did I misunderstand your question? No, I don't think so. I was just checking: I am very aware that HotSpot has presented those of use with relaxed memory order machines with some interesting gotchas over the years, that's all. I'm a bit surprised that C2 needs this barrier, given that there is a read-after-write dependency, but never mind. Thanks, Andrew. From paul.sandoz at oracle.com Wed Jan 6 14:14:40 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 6 Jan 2016 15:14:40 +0100 Subject: Conditional moves vs. 
branching in unrolled loops
In-Reply-To:
References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
Message-ID: <9D66933E-DF61-45F6-94C9-4A314A9AC6F3@oracle.com>

> On 6 Jan 2016, at 11:12, Paul Sandoz wrote:
> At least for this set of tests the results indicate conditional moves offer no major advantage over branching. For the worst case branching scenario (the "50 cent" case) conditional moves appear marginally better, but as you say the data pattern is likely rare.

Scrap that, cmoves are kicking in for "A.forTest_if 0.5 RANDOM_RAMP_UP". Disabling them with -XX:ConditionalMoveLimit=0 (thanks Roland) muddies the waters a bit:

# VM options: -XX:-TieredCompilation

Benchmark          (bias)  (dg)            (size)  Mode  Cnt      Score      Error  Units
A.forTest_if          0.1  RANDOM_RAMP_UP       1  avgt    5      3.535 ±    0.083  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP      10  avgt    5      7.478 ±    0.232  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP     100  avgt    5     42.348 ±    0.922  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP    1000  avgt    5    460.924 ±   12.692  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP   10000  avgt    5   3708.576 ±  110.138  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP       1  avgt    5      3.557 ±    0.172  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP      10  avgt    5      9.860 ±    0.135  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP     100  avgt    5     82.380 ±    1.971  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP    1000  avgt    5    832.391 ±   23.629  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP   10000  avgt    5   8325.406 ±  206.872  ns/op

# VM options: -XX:ConditionalMoveLimit=0 -XX:-TieredCompilation

Benchmark          (bias)  (dg)            (size)  Mode  Cnt      Score      Error  Units
A.forTest_if          0.1  RANDOM_RAMP_UP       1  avgt    5      3.554 ±    0.049  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP      10  avgt    5      9.382 ±    0.062  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP     100  avgt    5     37.483 ±    0.696  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP    1000  avgt    5    369.375 ±    9.780  ns/op
A.forTest_if          0.1  RANDOM_RAMP_UP   10000  avgt    5   3712.492 ±  128.310  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP       1  avgt    5      3.546 ±    0.053  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP      10  avgt    5      7.488 ±    0.118  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP     100  avgt    5     52.889 ±    5.328  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP    1000  avgt    5    447.437 ±   14.273  ns/op
A.forTest_if          0.5  RANDOM_RAMP_UP   10000  avgt    5  10040.920 ±  993.644  ns/op

Paul.

From paul.sandoz at oracle.com Wed Jan 6 14:34:29 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 6 Jan 2016 15:34:29 +0100
Subject: Conditional moves vs. branching in unrolled loops
In-Reply-To:
References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com>
Message-ID:

> On 6 Jan 2016, at 13:38, Vitaly Davidovich wrote:
>
> Perhaps for conditional moves data dependency chains are more costly?
>
> cmov carries a dependency on both inputs, making it more likely to stall when at least one isn't available whereas the branch still allows cpu to continue with speculative execution. In a tight loop with a memory access as one input to cmov, the memory op has to retire before cmov can proceed; using cmov when both inputs are already ready (e.g. values in registers) is pretty harmless though and avoids a branch entirely. cmov also has larger encoding than a branch.
>

Ok. The generated code for an unrolled loop first loads array elements into registers before performing the cmovs.

> As the original jira on this issue states, cmov should only be used when the branch is profiled to be unpredictable. I'm not sure why loops with a max/min accumulator need to be called out separately in this regard - wouldn't the branch profile dictate this anyway?

Yes, that was me not understanding the underlying branch profiling mechanisms.

Paul.

From david.buck at oracle.com Mon Jan 4 08:35:51 2016
From: david.buck at oracle.com (david buck)
Date: Mon, 4 Jan 2016 17:35:51 +0900
Subject: [8u] Request for approval: Backport of 8144487 and 8145754
In-Reply-To: <568A2A20.7030601@oracle.com>
References: <568A2A20.7030601@oracle.com>
Message-ID: <568A2EE7.4030600@oracle.com>

Hi Tobias!

Would you please include links to the code review threads on mail.openjdk.java.net?

[ JDK 8 Updates: Push Approval Request Template ]
http://openjdk.java.net/projects/jdk8u/approval-template.html

Cheers,
-Buck

On 2016/01/04 17:15, Tobias Hartmann wrote:
> Hi,
>
> please approve and review the following backports to 8u.
>
> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true
> https://bugs.openjdk.java.net/browse/JDK-8144487
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407
>
> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI
> https://bugs.openjdk.java.net/browse/JDK-8145754
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522
>
> Nightly testing showed no problems and the changes apply cleanly to 8u-dev.
>
> Thanks,
> Tobias
>

From david.buck at oracle.com Mon Jan 4 09:56:39 2016
From: david.buck at oracle.com (david buck)
Date: Mon, 4 Jan 2016 18:56:39 +0900
Subject: [8u] Request for approval: Backport of 8144487 and 8145754
In-Reply-To: <568A3BB9.1010501@oracle.com>
References: <568A2A20.7030601@oracle.com> <568A2EE7.4030600@oracle.com> <568A3BB9.1010501@oracle.com>
Message-ID: <568A41D7.2030503@oracle.com>

approved for backport to 8u-dev

Thank you for adding the review links.
Cheers, -Buck On 2016/01/04 18:30, Tobias Hartmann wrote: > Hi David, > > sure, I included the links to the code review: > > 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true > https://bugs.openjdk.java.net/browse/JDK-8144487 > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020503.html > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 > > 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI > https://bugs.openjdk.java.net/browse/JDK-8145754 > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020502.html > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 > > Thanks, > Tobias > > On 04.01.2016 09:35, david buck wrote: >> Hi Tobias! >> >> Would you please include links to the code review threads on mail.openjdk.java.net? >> >> [ JDK 8 Updates: Push Approval Request Template ] >> http://openjdk.java.net/projects/jdk8u/approval-template.html >> >> Cheers, >> -Buck >> >> On 2016/01/04 17:15, Tobias Hartmann wrote: >>> Hi, >>> >>> please approve and review the following backports to 8u. >>> >>> 8144487: PhaseIdealLoop::build_and_optimize() must restore major_progress flag if skip_loop_opts is true >>> https://bugs.openjdk.java.net/browse/JDK-8144487 >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/21689239c407 >>> >>> 8145754: PhaseIdealLoop::is_scaled_iv_plus_offset() does not match AddI >>> https://bugs.openjdk.java.net/browse/JDK-8145754 >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/0e9d64117522 >>> >>> Nightly testing showed no problems and the changes apply cleanly to 8u-dev. >>> >>> Thanks, >>> Tobias >>> From vitalyd at gmail.com Wed Jan 6 14:45:35 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 09:45:35 -0500 Subject: Conditional moves vs. 
branching in unrolled loops In-Reply-To: References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> Message-ID: > > Ok. The generated code for an unrolled loop firsts load array elements > into registers before performing the cmovs. Yes, but the cmov cannot proceed until that load retires. If you had a normal branch, speculation can continue past the branch and put more instructions into the pipeline barring other hazards/dependencies. By "available in registers" I meant a cmov executed against 2 values in registers that are already available (i.e. the loads which put the values into registers have already completed, or the registers were set with immediates, etc). Basically, if the cost of branch misprediction is higher than waiting for both inputs to cmov to be available, then cmov is better. For very predictable branches, cmov is a loss (as we've already established in this thread) and I think always will be (i.e. cpu vendors seem to be putting more and more smarts into branch prediction instead). Yes, that was me not understanding the underlying branch profiling > mechanisms. Actually, that question of mine was more aimed at John who said we should do something special for loops with max/min accumulators :). On Wed, Jan 6, 2016 at 9:34 AM, Paul Sandoz wrote: > > > On 6 Jan 2016, at 13:38, Vitaly Davidovich wrote: > > > > Perhaps for conditional moves data dependency chains are more costly? > > > > cmov carries a dependency on both inputs, making it more likely to stall > when at least one isn't available whereas the branch still allows cpu to > continue with speculative execution. In a tight loop with a memory access > as one input to cmov, the memory op has to retire before cmov can proceed; > using cmov when both inputs are already ready (e.g. values in registers) is > pretty harmless though and avoids a branch entirely. cmov also has larger > encoding than a branch. > > > > Ok. 
The generated code for an unrolled loop first loads array elements > into registers before performing the cmovs. > > > > As the original jira on this issue states, cmov should only be used when > the branch is profiled to be unpredictable. I'm not sure why loops with a > max/min accumulator need to be called out separately in this regard - > wouldn't the branch profile dictate this anyway? > > Yes, that was me not understanding the underlying branch profiling > mechanisms. > > Paul. > From paul.sandoz at oracle.com Wed Jan 6 15:01:52 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 6 Jan 2016 16:01:52 +0100 Subject: Conditional moves vs. branching in unrolled loops In-Reply-To: References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> Message-ID: <60AACA78-1F15-4B35-84CB-6BCED8172324@oracle.com> > On 6 Jan 2016, at 15:45, Vitaly Davidovich wrote: > > Ok. The generated code for an unrolled loop first loads array elements into registers before performing the cmovs. > > Yes, but the cmov cannot proceed until that load retires. If you had a normal branch, speculation can continue past the branch and put more instructions into the pipeline barring other hazards/dependencies. By "available in registers" I meant a cmov executed against 2 values in registers that are already available (i.e. the loads which put the values into registers have already completed, or the registers were set with immediates, etc). > > Basically, if the cost of branch misprediction is higher than waiting for both inputs to cmov to be available, then cmov is better. For very predictable branches, cmov is a loss (as we've already established in this thread) and I think always will be (i.e. cpu vendors seem to be putting more and more smarts into branch prediction instead). > Thanks for the explanations. It's helpful.
> Yes, that was me not understanding the underlying branch profiling mechanisms. > > Actually, that question of mine was more aimed at John who said we should do something special for loops with max/min accumulators :). > Oh, ok :-) Paul. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From john.r.rose at oracle.com Wed Jan 6 17:22:53 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 6 Jan 2016 09:22:53 -0800 Subject: Conditional moves vs. branching in unrolled loops In-Reply-To: References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> Message-ID: On Jan 6, 2016, at 6:45 AM, Vitaly Davidovich wrote: > > Basically, if the cost of branch misprediction is higher than waiting for both inputs to cmov to be available, then cmov is better. For very predictable branches, cmov is a loss (as we've already established in this thread) and I think always will be (i.e. cpu vendors seem to be putting more and more smarts into branch prediction instead). > > Yes, that was me not understanding the underlying branch profiling mechanisms. > > Actually, that question of mine was more aimed at John who said we should do something special for loops with max/min accumulators :). Buried in the bug comments is the following insight: Branch profiling by the JVM is different from branch profiling by the CPU, and the difference is significant for the specific use case of an accumulated max (or min). The CPU's profiling has a much shorter time scale: It collects information (many times) over the course of a single loop. The JVM's profiling has a long time scale, usually the whole application execution. 
If a loop has bursty behavior (high short-span correlation) the CPU can predict branches very well, even though the JVM sees just noise. (Fun fact: The JVM could also profile auto-correlation and other statistics, but we have avoided doing this so far.) So, usually, the branch profiling done in software by the JVM (interpreter or profiled tier) gives enough information to predict what the CPU will experience. In this very special case (a = max(a, x) for loop-varying x), almost all inputs "settle down" to an almost 100% branch profile, in favor of 'a'. For random data, you expect to find your max half way through the loop. That means that the second half of the loop can be speculated as "a = a" instead of "a = max(a, x)". This, in turn, can be detected in the JIT by pattern-matching locally on the max node, to see if it is of the form phi = max(phi, x). Fair enough? -- John From john.r.rose at oracle.com Wed Jan 6 17:32:54 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 6 Jan 2016 09:32:54 -0800 Subject: RFR(XS): 8144852: Corrupted oop in nmethod In-Reply-To: <5671F5F6.9060605@oracle.com> References: <566A44AA.1040101@oracle.com> <566AB84C.1000603@oracle.com> <566B216B.1020204@oracle.com> <5671CC94.2080205@oracle.com> <5671F5F6.9060605@oracle.com> Message-ID: <457FC936-D24F-4486-8D99-E4D8B55528CC@oracle.com> On Dec 16, 2015, at 3:38 PM, Ioi Lam wrote: > > Adding non_oop_word to oopDesc::print_*_on would imply that it's OK to assign this value in a more general context, which is not true. So put in a comment. The print_on stuff is for us to use in debuggers and tracing code, not for end users who might be confused. > So I would suggest keeping knowledge of non_oop_word inside nmethod for now, and we can revisit this if other places start to use non_oop_word. Either way is OK with me. But I like my print functions to be as forgiving as possible; don't you? --
John From john.r.rose at oracle.com Wed Jan 6 17:42:44 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 6 Jan 2016 09:42:44 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> Message-ID: <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> On Jan 6, 2016, at 4:20 AM, Vitaly Davidovich wrote: > > I realize that there may always be a user-specified shape that the JIT doesn't understand, but straightforward cases should hopefully Just Work(tm) as those patterns can be picked up elsewhere in code and performance improves without changing a line of code. Where we differ is this: I am skeptical that there is a well-defined set of "straightforward cases", which all reasonable coders, who expect optimization, will use. It is better to point out one case for favorable treatment, and say "if you really expect best optimization, use this name". Followed by, "if you don't choose to use that name, we'll still try to optimize all the straightforward cases, but don't expect us to prioritize them as highly as the best practice we suggested". -- John -------------- next part -------------- An HTML attachment was scrubbed...
URL: From john.r.rose at oracle.com Wed Jan 6 17:51:06 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 6 Jan 2016 09:51:06 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568CEF5B.5060306@redhat.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> Message-ID: <8AF84C4E-7ECA-4A0F-8CF1-24C38B294C72@oracle.com> On Jan 6, 2016, at 2:41 AM, Andrew Haley wrote: > >> >> Range checks are interesting to block-level loop transformations >> (iteration range reorganization). Do you really want your loop >> optimizations to be gated on "sufficient smarts" in the JIT's >> expression pattern matcher? > > Please forgive me for pushing this: I'm not arguing for the sake of it, > I'm trying to understand your reasoning. > > As it stands we recognize a call to Objects.checkIndex and transform > it into a certain pattern. I'm assuming that it's not impossible to > recognize the logic inside Objects.checkIndex and transform it into > the same form that the intrinsic generates. And that would have a > payoff in all the places that the same logic is used in existing > programs, both inside and outside the JDK. Sure, and we do this as much as possible. But there are too many degrees of freedom in user-coded range check expressions. So we give the users a clearer target to aim at if they want best perf. on range checks. You could say (as Vitaly pointed out for Integer.compareTo), that we don't need an intrinsic as long as the bytecoded body of Objects.checkIndex has the Best Possible Formulation (tm) of a range check, which naturally will always be maximally optimized by the JIT. 
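As one illustration of the "degrees of freedom" point, here is a minimal sketch of two hand-written range checks that accept and reject exactly the same indices but present different shapes to a pattern matcher. The class and method names are made up for this example; only `Integer.compareUnsigned` (Java 8+) is a real JDK call, and nothing here claims to be the actual body of `Objects.checkIndex`.

```java
public class RangeCheckShapes {
    // Shape 1: two signed comparisons.
    static boolean inRangeTwoCompares(int index, int length) {
        return index >= 0 && index < length;
    }

    // Shape 2: one unsigned comparison; a negative index turns into a huge
    // unsigned value and fails the test. This matches shape 1 only when
    // length >= 0, which always holds for array lengths.
    static boolean inRangeUnsigned(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        int length = 8;
        // Print both answers side by side for indices around the boundaries.
        for (int index = -2; index <= 9; index++) {
            System.out.println(index + " " + inRangeTwoCompares(index, length)
                    + " " + inRangeUnsigned(index, length));
        }
    }
}
```

Both methods express the same predicate, yet a compiler has to recognize each shape separately before it can fold it with a JVM-inserted check.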
The specific problem with range checking is that (as I said before) the JVM inserts its own range checks into bytecode semantics (iaload etc.), and we need to make the user-written ones fold up with the JVM-inserted ones. That is a hard coupling between the JDK and JVM, much harder than just "yes, we are all using the same math". An intrinsic properly expresses and enforces this coupling. Using a similar expression does not. > I suppose one downside of this approach is that C2 might decide > not to inline Objects.checkIndex, so it would be called instead > and the optimization would not be done. Yes, come to think of it, one "super power" of an intrinsic is that the inlining heuristics apply to it more favorably. -- John From christian.thalinger at oracle.com Wed Jan 6 17:54:20 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 6 Jan 2016 07:54:20 -1000 Subject: RFR: 8146001: Remove support for command line options from JVMCI In-Reply-To: <6C07E8DD-50D4-4B2E-BD8E-B131579A9664@oracle.com> References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com> <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com> <1297DA97-3C65-403D-AB46-16E203A74F26@oracle.com> <6C07E8DD-50D4-4B2E-BD8E-B131579A9664@oracle.com> Message-ID: I just noticed this code in HotSpotResolvedJavaMethodImpl: private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); The only other direct usage of System.getProperty is: hotspot/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java 167: if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { I think both of them should be using the same mechanism as introduced by this change.
> On Jan 4, 2016, at 12:47 PM, Christian Thalinger wrote: > >> >> On Jan 4, 2016, at 12:31 PM, Doug Simon wrote: >> >>> >>> On 04 Jan 2016, at 18:41, Christian Thalinger wrote: >>> >>>> >>>> On Jan 4, 2016, at 7:19 AM, Christian Thalinger wrote: >>>> >>>>> >>>>> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote: >>>>> >>>>>> >>>>>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote: >>>>>> >>>>>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal: >>>>>> >>>>>> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support. >>>>>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line. >>>>>> >>>>>> This change removes the JVMCI command line option support. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8146001 >>>>>> http://cr.openjdk.java.net/~dnsimon/8146001/ >>>>> >>>>> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true); >>>>> >>>>> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); >>>>> >>>>> We should either use the jvmci. prefix or not. >>>> >>>> Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix. >>> >>> I think we should prefix the property name in getBooleanProperty: >>> >>> + public static boolean getBooleanProperty(String name, boolean def) { >>> + String value = VM.getSavedProperty("jvmci." + name); >> >> Ok, sounds reasonable.
>> >>> >>> and I put UseProfilingInformation back: >>> >>> diff -r 0fcfe4b07f7e src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Tue Dec 29 18:30:51 2015 +0100 >>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 04 07:40:46 2016 -1000 >>> @@ -24,7 +24,6 @@ package jdk.vm.ci.hotspot; >>> >>> import static jdk.vm.ci.hotspot.CompilerToVM.compilerToVM; >>> import static jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime; >>> -import static jdk.vm.ci.hotspot.HotSpotResolvedJavaMethod.Options.UseProfilingInformation; >>> import static jdk.vm.ci.hotspot.HotSpotVMConfig.config; >>> import static jdk.vm.ci.hotspot.UnsafeAccess.UNSAFE; >>> >>> @@ -65,6 +64,11 @@ import jdk.vm.ci.meta.TriState; >>> final class HotSpotResolvedJavaMethodImpl extends HotSpotMethod implements HotSpotResolvedJavaMethod, HotSpotProxified, MetaspaceWrapperObject { >>> >>> /** >>> + * Whether to use profiling information. >>> + */ >>> + private static final boolean UseProfilingInformation = HotSpotJVMCIRuntime.getBooleanProperty("UseProfilingInformation", true); >>> + >>> + /** >>> * Reference to metaspace Method object. >>> */ >>> private final long metaspaceMethod; >>> @@ -424,7 +428,7 @@ final class HotSpotResolvedJavaMethodImp >>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>> ProfilingInfo info; >>> >>> - if (UseProfilingInformation.getValue() && methodData == null) { >>> + if (UseProfilingInformation && methodData == null) { >>> long metaspaceMethodData = UNSAFE.getAddress(metaspaceMethod + config().methodDataOffset); >>> if (metaspaceMethodData != 0) { >>> methodData = new HotSpotMethodData(metaspaceMethodData, this); >> >> JVMCI should unconditionally return available profiling information. 
It's up to the compiler whether or not to use it. For example, this is now compilation local in Graal: >> >> http://hg.openjdk.java.net/graal/graal-compiler/rev/f35e653aa876#l16.16 > > Oh, I missed that. Yes, that works for us as well. Thanks for pointing that out. > >> >> -Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Wed Jan 6 17:56:29 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 12:56:29 -0500 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> Message-ID: > > Where we differ is this: I am skeptical that there is a well-defined set > of "straightforward cases", which all reasonable coders, who expect > optimization, will use. I'm not sure we differ here. A "straightforward case" of bytecode like Object::checkIndex is a good start. It is better to point out one case for favorable treatment, and say "if you > really expect best optimization, use this name". Followed by, "if you > don't choose to use that name, we'll still try to optimize all the > straightforward cases, but don't expect us to prioritize them as highly as > the best practice we suggested". For new code or code being modified, using a known method to guarantee optimization is great; unfortunately that doesn't work for existing code. And of course existing profile pollution problem makes using common entry points a bit unpleasant if there's risk the profile doesn't match your particular call. 
More generally, I'd expect you guys would also prefer to keep # of intrinsics down and rely on better canonicalization and pattern matching? This has, as mentioned, the added side benefit that it will match existing code shapes without requiring any changes. Over time, provided people report missed optimizations, hopefully the set of patterns that get matched increases and the # of "clever" cases that fail to optimize goes down. On Wed, Jan 6, 2016 at 12:42 PM, John Rose wrote: > On Jan 6, 2016, at 4:20 AM, Vitaly Davidovich wrote: > > > I realize that there may always be a user-specified shape that the JIT > doesn't understand, but straightforward cases should hopefully Just > Work(tm) as those patterns can be picked up elsewhere in code and > performance improves without changing a line of code. > > > Where we differ is this: I am skeptical that there is a well-defined set > of "straightforward cases", which all reasonable coders, who expect > optimization, will use. > > It is better to point out one case for favorable treatment, and say "if > you really expect best optimization, use this name". Followed by, "if you > don't choose to use that name, we'll still try to optimize all the > straightforward cases, but don't expect us to prioritize them as highly as > the best practice we suggested". > > -- John > From vitalyd at gmail.com Wed Jan 6 18:00:22 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 13:00:22 -0500 Subject: Conditional moves vs. branching in unrolled loops In-Reply-To: References: <775F44DC-A0A1-42D5-BB2E-AE861A855125@oracle.com> <79E3DB9F-5425-4A93-A8C8-5223337D9346@oracle.com> Message-ID: > > The CPU's profiling has a much shorter time scale: It collects > information (many times) over the course of a single loop. The JVM's > profiling has a long time scale, usually the whole application execution.
> If a loop has bursty behavior (high short-span correlation) the CPU can > predict branches very well, even though the JVM sees just noise. That's a good point. This almost implies that branches within loops shouldn't even use JVM collected profiles -- just emit a branch -- since software doesn't model the hardware as well (and even if it attempted, it would be a moving target with many different targets). On Wed, Jan 6, 2016 at 12:22 PM, John Rose wrote: > On Jan 6, 2016, at 6:45 AM, Vitaly Davidovich wrote: > > > Basically, if the cost of branch misprediction is higher than waiting for > both inputs to cmov to be available, then cmov is better. For very > predictable branches, cmov is a loss (as we've already established in this > thread) and I think always will be (i.e. cpu vendors seem to be putting > more and more smarts into branch prediction instead). > > Yes, that was me not understanding the underlying branch profiling >> mechanisms. > > > Actually, that question of mine was more aimed at John who said we should > do something special for loops with max/min accumulators :). > > > Buried in the bug comments is the following insight: Branch profiling by > the JVM is different from branch profiling by the CPU, and the difference > is significant for the specific use case of an accumulated max (or min). > > The CPU's profiling has a much shorter time scale: It collects > information (many times) over the course of a single loop. The JVM's > profiling has a long time scale, usually the whole application execution. > If a loop has bursty behavior (high short-span correlation) the CPU can > predict branches very well, even though the JVM sees just noise. > > (Fun fact: The JVM could also profile auto-correlation and other > statistics, but we have avoided doing this so far.) > > So, usually, the branch profiling done in software by the JVM (interpreter > or profiled tier) gives enough information to predict what the CPU will > experience. 
In this very special case (a = max(a, x) for loop-varying x), > almost all inputs "settle down" to an almost 100% branch profile, in favor > of 'a'. For random data, you expect to find your max half way through the > loop. That means that the second half of the loop can be speculated as "a > = a" instead of "a = max(a, x)". > > This, in turn, can be detected in the JIT by pattern-matching locally on > the max node, to see if it is of the form phi = max(phi, x). > > Fair enough? > > -- John > From doug.simon at oracle.com Wed Jan 6 18:04:19 2016 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 6 Jan 2016 19:04:19 +0100 Subject: RFR: 8146001: Remove support for command line options from JVMCI In-Reply-To: References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com> <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com> <1297DA97-3C65-403D-AB46-16E203A74F26@oracle.com> <6C07E8DD-50D4-4B2E-BD8E-B131579A9664@oracle.com> Message-ID: <0BB3D050-7E42-4777-BB7B-E4D7DC2A6605@oracle.com> > On 06 Jan 2016, at 18:54, Christian Thalinger wrote: > > I just noticed this code in HotSpotResolvedJavaMethodImpl: > > private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); > > The only other direct usage of System.getProperty is: > > hotspot/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java > 167: if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { > > I think both of them should be using the same mechanism as introduced by this change. I agree (assuming you mean the HotSpotJVMCIRuntime.getBooleanProperty mechanism).
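The prefixing helper discussed in this thread might look roughly like the sketch below. It is an assumption-laden stand-in, not the JVMCI code: it reads `System.getProperty` where the patch uses `sun.misc.VM.getSavedProperty`, the class name `JvmciProps` is made up, and how the string value is parsed is a guess, since the quoted patch only shows the lookup.

```java
public class JvmciProps {
    // Hypothetical stand-in for HotSpotJVMCIRuntime.getBooleanProperty.
    public static boolean getBooleanProperty(String name, boolean def) {
        // Assumption: System.getProperty replaces VM.getSavedProperty so the
        // sketch runs on any JDK. Unlike the saved-property lookup, it does
        // not protect against application code overriding the value.
        String value = System.getProperty("jvmci." + name);
        if (value == null) {
            return def; // property not set: fall back to the caller's default
        }
        // Assumption: plain Boolean parsing; the real parsing rules are not
        // shown in the thread.
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        System.setProperty("jvmci.printconfig", "true");
        // Callers pass the short name; the helper adds the "jvmci." prefix.
        System.out.println(getBooleanProperty("printconfig", false));
        System.out.println(getBooleanProperty("UseProfilingInformation", true));
    }
}
```

Putting the prefix inside the helper means call sites cannot disagree about whether a property name carries it.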
There's also: hotspot/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java 70: private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); But we will have to leave that as is given that HotSpotJVMCIRuntime is not visible from this code. We could also remove the (legacy) "jvmci.runtime.TimeInit" alias. -Doug > >> On Jan 4, 2016, at 12:47 PM, Christian Thalinger wrote: >> >>> >>> On Jan 4, 2016, at 12:31 PM, Doug Simon wrote: >>> >>>> >>>> On 04 Jan 2016, at 18:41, Christian Thalinger wrote: >>>> >>>>> >>>>> On Jan 4, 2016, at 7:19 AM, Christian Thalinger wrote: >>>>> >>>>>> >>>>>> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote: >>>>>> >>>>>>> >>>>>>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote: >>>>>>> >>>>>>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal: >>>>>>> >>>>>>> 1. It's almost entirely implemented on top of system properties and so can be made to work without VM support. >>>>>>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can't override JVMCI properties set on the command line. >>>>>>> >>>>>>> This change removes the JVMCI command line option support. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8146001 >>>>>>> http://cr.openjdk.java.net/~dnsimon/8146001/ >>>>>> >>>>>> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true); >>>>>> >>>>>> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); >>>>>> >>>>>> We should either use the jvmci. prefix or not.
>>>>> >>>>> Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix. >>>> >>>> I think we should prefix the property name in getBooleanProperty: >>>> >>>> + public static boolean getBooleanProperty(String name, boolean def) { >>>> + String value = VM.getSavedProperty("jvmci." + name); >>> >>> Ok, sounds reasonable. >>> >>>> >>>> and I put UseProfilingInformation back: >>>> >>>> diff -r 0fcfe4b07f7e src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Tue Dec 29 18:30:51 2015 +0100 >>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 04 07:40:46 2016 -1000 >>>> @@ -24,7 +24,6 @@ package jdk.vm.ci.hotspot; >>>> >>>> import static jdk.vm.ci.hotspot.CompilerToVM.compilerToVM; >>>> import static jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime; >>>> -import static jdk.vm.ci.hotspot.HotSpotResolvedJavaMethod.Options.UseProfilingInformation; >>>> import static jdk.vm.ci.hotspot.HotSpotVMConfig.config; >>>> import static jdk.vm.ci.hotspot.UnsafeAccess.UNSAFE; >>>> >>>> @@ -65,6 +64,11 @@ import jdk.vm.ci.meta.TriState; >>>> final class HotSpotResolvedJavaMethodImpl extends HotSpotMethod implements HotSpotResolvedJavaMethod, HotSpotProxified, MetaspaceWrapperObject { >>>> >>>> /** >>>> + * Whether to use profiling information. >>>> + */ >>>> + private static final boolean UseProfilingInformation = HotSpotJVMCIRuntime.getBooleanProperty("UseProfilingInformation", true); >>>> + >>>> + /** >>>> * Reference to metaspace Method object. 
>>>> */ >>>> private final long metaspaceMethod; >>>> @@ -424,7 +428,7 @@ final class HotSpotResolvedJavaMethodImp >>>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>>> ProfilingInfo info; >>>> >>>> - if (UseProfilingInformation.getValue() && methodData == null) { >>>> + if (UseProfilingInformation && methodData == null) { >>>> long metaspaceMethodData = UNSAFE.getAddress(metaspaceMethod + config().methodDataOffset); >>>> if (metaspaceMethodData != 0) { >>>> methodData = new HotSpotMethodData(metaspaceMethodData, this); >>> >>> JVMCI should unconditionally return available profiling information. It's up to the compiler whether or not to use it. For example, this is now compilation local in Graal: >>> >>> http://hg.openjdk.java.net/graal/graal-compiler/rev/f35e653aa876#l16.16 >> >> Oh, I missed that. Yes, that works for us as well. Thanks for pointing that out. >> >>> >>> -Doug > From sergey.kuksenko at oracle.com Wed Jan 6 18:59:33 2016 From: sergey.kuksenko at oracle.com (Sergey Kuksenko) Date: Wed, 6 Jan 2016 10:59:33 -0800 Subject: Conditional moves vs. branching in unrolled loops In-Reply-To: References: Message-ID: <568D6415.2070306@oracle.com> Hi, A move under a branch is always faster than cmov (due to cmov's additional data dependencies) when the branch is predicted. So the key point here is how the hardware deals with unpredicted branches. Here (on slides 40-41) http://www.slideshare.net/SergeyKuksenko/quantum-performance-effects-44390719 you can find some measurements for predicted/unpredicted cases on different hardware. On Intel x86 the cost of an unpredicted branch is quite low starting from the Sandy Bridge micro-architecture, but only when the loop is small enough to fit into the uop cache. On AMD x86 the cost of an unpredicted branch is higher and cmov was the winner, but I didn't check it on modern AMD CPUs.
On 01/05/2016 03:51 AM, Paul Sandoz wrote: > Hi, > > Recent investigation comparing for loops with streams exposed what appears to be an issue with Math.max and generated code in unrolled loops. > > Namely this: > > @Benchmark > public int forTest_if() { > int[] a = ints; > int e = ints.length; > int m = Integer.MIN_VALUE; > for (int i = 0; i < e; i++) > if (a[i] >= m) > m = a[i]; > return m; > } > > is faster than this: > > @Benchmark > public int forTest_MathMax() { > int[] a = ints; > int e = ints.length; > int m = Integer.MIN_VALUE; > for (int i = 0; i < e; i++) > m = Math.max(m, a[i]); > return m; > } > > Or this: > > Arrays.stream(ints).reduce(Integer.MIN_VALUE, (a, b) -> a >= b ? a : b); > > is faster than this: > > Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max); > > at least on an x86 i5 processor. > > See the following links for more details: > > https://bugs.openjdk.java.net/browse/JDK-8146071 > https://bugs.openjdk.java.net/browse/JDK-8146071?focusedCommentId=13883495&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13883495 > > For generated code in the for loop cases above see: > > https://bugs.openjdk.java.net/secure/attachment/56221/mathMax.perfasm.txt > > I am not familiar enough with the x86 architecture to fully explain why, but i presume branch prediction is trumping the conditional moves, which suggests that on certain processors the generated code for the Math.max intrinsic (and others) in unrolled loops should not use conditional moves. > > Thanks, > Paul. 
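The claim made earlier in the thread, that the update branch of a running maximum quickly "settles down" to almost never taken on random input, can be checked with a small sketch. This is not the JMH benchmark from the message above and it measures no timings; the class name, seed, and array size are arbitrary choices for illustration.

```java
import java.util.Random;

public class RunningMaxBranchCount {
    // Counts how many iterations actually take the "m = a[i]" branch of the
    // same if-based loop shape used in the quoted forTest_if benchmark.
    static int countUpdates(int[] a) {
        int m = Integer.MIN_VALUE;
        int updates = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] >= m) {
                m = a[i];
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        // One million pseudo-random ints from a fixed seed (arbitrary choice).
        int[] ints = new Random(42).ints(1_000_000).toArray();
        int updates = countUpdates(ints);
        System.out.println("updates: " + updates + " of " + ints.length);
    }
}
```

For independent random values the i-th element is a new maximum with probability of about 1/i, so the expected number of taken updates grows like ln n. That is why a hardware branch predictor handles this loop well on random data, while sorted ascending input takes the branch on every iteration.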
From christian.thalinger at oracle.com Wed Jan 6 19:19:35 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 6 Jan 2016 09:19:35 -1000 Subject: RFR (S): 8146246: JVMCICompiler::abort_on_pending_exception: assert(!thread->owns_locks()) failed: must release all locks when leaving VM Message-ID: https://bugs.openjdk.java.net/browse/JDK-8146246 The problem is that https://bugs.openjdk.java.net/browse/JDK-8145435 introduced ttyLocker to synchronize the exception output but java_lang_Throwable::print_stack_trace can call out to Java to get the cause. There are two solutions: 1) Remove ttyLocker and deal with some possible scrambling in the rare case of an exception: diff -r df8d635f2296 -r e87e187552fb src/share/vm/jvmci/jvmciCompiler.cpp --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 11:24:01 2015 -0800 +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Thu Dec 31 09:20:16 2015 -0800 @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const Handle exception(THREAD, PENDING_EXCEPTION); CLEAR_PENDING_EXCEPTION; - { - ttyLocker ttyl; - java_lang_Throwable::print_stack_trace(exception, tty); - } + java_lang_Throwable::print_stack_trace(exception, tty); // Something went wrong so disable compilation at this level method->set_not_compilable(CompLevel_full_optimization); @@ -181,11 +178,8 @@ void JVMCICompiler::abort_on_pending_exc Thread* THREAD = Thread::current(); CLEAR_PENDING_EXCEPTION; - { - ttyLocker ttyl; - tty->print_raw_cr(message); - java_lang_Throwable::print_stack_trace(exception, tty); - } + tty->print_raw_cr(message); + java_lang_Throwable::print_stack_trace(exception, tty); // Give other aborting threads to also print their stack traces. 
// This can be very useful when debugging class initialization diff -r df8d635f2296 -r e87e187552fb src/share/vm/runtime/java.cpp --- a/src/share/vm/runtime/java.cpp Tue Dec 29 11:24:01 2015 -0800 +++ b/src/share/vm/runtime/java.cpp Thu Dec 31 09:20:16 2015 -0800 @@ -432,7 +432,6 @@ void before_exit(JavaThread* thread) { if (HAS_PENDING_EXCEPTION) { Handle exception(THREAD, PENDING_EXCEPTION); CLEAR_PENDING_EXCEPTION; - ttyLocker ttyl; java_lang_Throwable::print_stack_trace(exception, tty); } #endif or 2) Call out to Java and let the Java code do the printing: diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.cpp --- a/src/share/vm/classfile/javaClasses.cpp Tue Dec 29 18:30:51 2015 +0100 +++ b/src/share/vm/classfile/javaClasses.cpp Wed Jan 06 09:12:00 2016 -1000 @@ -1784,6 +1784,20 @@ void java_lang_Throwable::print_stack_tr } } +/** + * Print the throwable stack trace by calling the Java method java.lang.Throwable.printStackTrace(). + */ +void java_lang_Throwable::java_printStackTrace(Handle throwable, TRAPS) { + assert(throwable->is_a(SystemDictionary::Throwable_klass()), "Throwable instance expected"); + JavaValue result(T_VOID); + JavaCalls::call_virtual(&result, + throwable, + KlassHandle(THREAD, SystemDictionary::Throwable_klass()), + vmSymbols::printStackTrace_name(), + vmSymbols::void_method_signature(), + THREAD); +} + void java_lang_Throwable::fill_in_stack_trace(Handle throwable, const methodHandle& method, TRAPS) { if (!StackTraceInThrowable) return; ResourceMark rm(THREAD); diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.hpp --- a/src/share/vm/classfile/javaClasses.hpp Tue Dec 29 18:30:51 2015 +0100 +++ b/src/share/vm/classfile/javaClasses.hpp Wed Jan 06 09:12:00 2016 -1000 @@ -554,6 +554,7 @@ class java_lang_Throwable: AllStatic { // Printing static void print(Handle throwable, outputStream* st); static void print_stack_trace(Handle throwable, outputStream* st); + static void java_printStackTrace(Handle throwable, TRAPS); // Debugging 
friend class JavaClasses; }; diff -r 0fcfe4b07f7e src/share/vm/jvmci/jvmciCompiler.cpp --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 18:30:51 2015 +0100 +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Wed Jan 06 09:12:00 2016 -1000 @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const Handle exception(THREAD, PENDING_EXCEPTION); CLEAR_PENDING_EXCEPTION; - { - ttyLocker ttyl; - java_lang_Throwable::print_stack_trace(exception, tty); - } + java_lang_Throwable::java_printStackTrace(exception, THREAD); // Something went wrong so disable compilation at this level method->set_not_compilable(CompLevel_full_optimization); @@ -181,11 +178,7 @@ void JVMCICompiler::abort_on_pending_exc Thread* THREAD = Thread::current(); CLEAR_PENDING_EXCEPTION; - { - ttyLocker ttyl; - tty->print_raw_cr(message); - java_lang_Throwable::print_stack_trace(exception, tty); - } + java_lang_Throwable::java_printStackTrace(exception, THREAD); // Give other aborting threads to also print their stack traces. // This can be very useful when debugging class initialization diff -r 0fcfe4b07f7e src/share/vm/runtime/java.cpp --- a/src/share/vm/runtime/java.cpp Tue Dec 29 18:30:51 2015 +0100 +++ b/src/share/vm/runtime/java.cpp Wed Jan 06 09:12:00 2016 -1000 @@ -433,7 +433,7 @@ void before_exit(JavaThread* thread) { Handle exception(THREAD, PENDING_EXCEPTION); CLEAR_PENDING_EXCEPTION; ttyLocker ttyl; - java_lang_Throwable::print_stack_trace(exception, tty); + java_lang_Throwable::java_printStackTrace(exception, THREAD); } #endif From vladimir.kozlov at oracle.com Wed Jan 6 19:34:59 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Jan 2016 11:34:59 -0800 Subject: RFR (S): 8146246: JVMCICompiler::abort_on_pending_exception: assert(!thread->owns_locks()) failed: must release all locks when leaving VM In-Reply-To: References: Message-ID: <568D6C63.5000403@oracle.com> I would go with "Java code do the printing". 
You left ttyLocker in case 2) in src/share/vm/runtime/java.cpp Thanks, Vladimir On 1/6/16 11:19 AM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8146246 > > The problem is that https://bugs.openjdk.java.net/browse/JDK-8145435 introduced ttyLocker to synchronize the exception output but java_lang_Throwable::print_stack_trace can call out to Java to get the cause. > > There are two solutions: > > 1) Remove ttyLocker and deal with some possible scrambling in the rare case of an exception: > > diff -r df8d635f2296 -r e87e187552fb src/share/vm/jvmci/jvmciCompiler.cpp > --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 11:24:01 2015 -0800 > +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Thu Dec 31 09:20:16 2015 -0800 > @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const > Handle exception(THREAD, PENDING_EXCEPTION); > CLEAR_PENDING_EXCEPTION; > > - { > - ttyLocker ttyl; > - java_lang_Throwable::print_stack_trace(exception, tty); > - } > + java_lang_Throwable::print_stack_trace(exception, tty); > > // Something went wrong so disable compilation at this level > method->set_not_compilable(CompLevel_full_optimization); > @@ -181,11 +178,8 @@ void JVMCICompiler::abort_on_pending_exc > Thread* THREAD = Thread::current(); > CLEAR_PENDING_EXCEPTION; > > - { > - ttyLocker ttyl; > - tty->print_raw_cr(message); > - java_lang_Throwable::print_stack_trace(exception, tty); > - } > + tty->print_raw_cr(message); > + java_lang_Throwable::print_stack_trace(exception, tty); > > // Give other aborting threads to also print their stack traces. 
> // This can be very useful when debugging class initialization > diff -r df8d635f2296 -r e87e187552fb src/share/vm/runtime/java.cpp > --- a/src/share/vm/runtime/java.cpp Tue Dec 29 11:24:01 2015 -0800 > +++ b/src/share/vm/runtime/java.cpp Thu Dec 31 09:20:16 2015 -0800 > @@ -432,7 +432,6 @@ void before_exit(JavaThread* thread) { > if (HAS_PENDING_EXCEPTION) { > Handle exception(THREAD, PENDING_EXCEPTION); > CLEAR_PENDING_EXCEPTION; > - ttyLocker ttyl; > java_lang_Throwable::print_stack_trace(exception, tty); > } > #endif > > or > > 2) Call out to Java and let the Java code do the printing: > > diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.cpp > --- a/src/share/vm/classfile/javaClasses.cpp Tue Dec 29 18:30:51 2015 +0100 > +++ b/src/share/vm/classfile/javaClasses.cpp Wed Jan 06 09:12:00 2016 -1000 > @@ -1784,6 +1784,20 @@ void java_lang_Throwable::print_stack_tr > } > } > > +/** > + * Print the throwable stack trace by calling the Java method java.lang.Throwable.printStackTrace(). 
> + */ > +void java_lang_Throwable::java_printStackTrace(Handle throwable, TRAPS) { > + assert(throwable->is_a(SystemDictionary::Throwable_klass()), "Throwable instance expected"); > + JavaValue result(T_VOID); > + JavaCalls::call_virtual(&result, > + throwable, > + KlassHandle(THREAD, SystemDictionary::Throwable_klass()), > + vmSymbols::printStackTrace_name(), > + vmSymbols::void_method_signature(), > + THREAD); > +} > + > void java_lang_Throwable::fill_in_stack_trace(Handle throwable, const methodHandle& method, TRAPS) { > if (!StackTraceInThrowable) return; > ResourceMark rm(THREAD); > diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.hpp > --- a/src/share/vm/classfile/javaClasses.hpp Tue Dec 29 18:30:51 2015 +0100 > +++ b/src/share/vm/classfile/javaClasses.hpp Wed Jan 06 09:12:00 2016 -1000 > @@ -554,6 +554,7 @@ class java_lang_Throwable: AllStatic { > // Printing > static void print(Handle throwable, outputStream* st); > static void print_stack_trace(Handle throwable, outputStream* st); > + static void java_printStackTrace(Handle throwable, TRAPS); > // Debugging > friend class JavaClasses; > }; > diff -r 0fcfe4b07f7e src/share/vm/jvmci/jvmciCompiler.cpp > --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 18:30:51 2015 +0100 > +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Wed Jan 06 09:12:00 2016 -1000 > @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const > Handle exception(THREAD, PENDING_EXCEPTION); > CLEAR_PENDING_EXCEPTION; > > - { > - ttyLocker ttyl; > - java_lang_Throwable::print_stack_trace(exception, tty); > - } > + java_lang_Throwable::java_printStackTrace(exception, THREAD); > > // Something went wrong so disable compilation at this level > method->set_not_compilable(CompLevel_full_optimization); > @@ -181,11 +178,7 @@ void JVMCICompiler::abort_on_pending_exc > Thread* THREAD = Thread::current(); > CLEAR_PENDING_EXCEPTION; > > - { > - ttyLocker ttyl; > - tty->print_raw_cr(message); > - 
java_lang_Throwable::print_stack_trace(exception, tty); > - } > + java_lang_Throwable::java_printStackTrace(exception, THREAD); > > // Give other aborting threads to also print their stack traces. > // This can be very useful when debugging class initialization > diff -r 0fcfe4b07f7e src/share/vm/runtime/java.cpp > --- a/src/share/vm/runtime/java.cpp Tue Dec 29 18:30:51 2015 +0100 > +++ b/src/share/vm/runtime/java.cpp Wed Jan 06 09:12:00 2016 -1000 > @@ -433,7 +433,7 @@ void before_exit(JavaThread* thread) { > Handle exception(THREAD, PENDING_EXCEPTION); > CLEAR_PENDING_EXCEPTION; > ttyLocker ttyl; > - java_lang_Throwable::print_stack_trace(exception, tty); > + java_lang_Throwable::java_printStackTrace(exception, THREAD); > } > #endif > From john.r.rose at oracle.com Wed Jan 6 19:50:22 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 6 Jan 2016 11:50:22 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> Message-ID: <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich wrote: > > better canonicalization That's our first and most important tactic. (Actually inlining is.) But the various idioms for checkIndex do not canonicalize easily. In this case the correct trade-off is not to invest more time and research and code into stronger canonicalization. We do have canonicalization of if-expressions. It's just that in this case strengthening it to cover range checks reliably is harder than the reasonable alternative. -- John PS. I am tempted to write out a list of 20 different ways to code a range check but will leave that as an exercise.
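To make the point about range-check idioms concrete, here is a minimal Java sketch. It is not from the thread: the class and method names are made up for illustration, and it shows only three of the many possible spellings of the test 0 <= i < size. The unsigned form assumes size is non-negative; it works because a negative i, viewed as unsigned, is larger than any non-negative int.

```java
// Illustrative only: RangeCheckIdioms and its method names are hypothetical,
// not taken from the JDK or from the patch under review.
class RangeCheckIdioms {
    // Two signed comparisons, the most common hand-written form.
    static boolean signedPair(int i, int size) {
        return i >= 0 && i < size;
    }

    // The same test written as a negated out-of-bounds check.
    static boolean negatedOr(int i, int size) {
        return !(i < 0 || i >= size);
    }

    // Single unsigned comparison: a negative i becomes a value above
    // Integer.MAX_VALUE when viewed as unsigned, so it fails the test.
    // Assumes size >= 0.
    static boolean unsignedCompare(int i, int size) {
        return Integer.compareUnsigned(i, size) < 0;
    }

    public static void main(String[] args) {
        int size = 4;
        int[] probes = { Integer.MIN_VALUE, -1, 0, 3, 4, Integer.MAX_VALUE };
        for (int i : probes) {
            boolean a = signedPair(i, size);
            boolean b = negatedOr(i, size);
            boolean c = unsignedCompare(i, size);
            // All three idioms should agree for every probe value.
            if (a != b || a != c) {
                throw new AssertionError("idioms disagree for i=" + i);
            }
            System.out.println(i + " in [0, " + size + "): " + a);
        }
    }
}
```

All three methods compute the same predicate but compile to different bytecode shapes, which is the canonicalization difficulty being discussed.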
From vitalyd at gmail.com Wed Jan 6 20:39:47 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 15:39:47 -0500 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> Message-ID: I don't think there's a need to write out 20 different ways to do a range check -- I think nobody would expect all 20 to be covered by the optimizer. Some of those variations may not map cleanly to Object::checkIndex either, nor is there any guarantee that people will update all their existing range checks (or even know about) to use Object::checkIndex -- some code will be left unoptimized no matter what. But my point is the same as Andrew's, I think; instead of making checkIndex an intrinsic, simply add a pattern match against that exact bytecode shape (perhaps with basic canonicalization) and then still encourage people to use Object::checkIndex. This is better than intrinsic (modulo profile pollution) since any other code that happens to use same pattern will match as well, and not require an update to use checkIndex. Then, if someone comes to this list with an unoptimized example with a different bytecode shape and has a convincing argument that the code shape is "common", you guys can consider pattern matching that as well. On Wed, Jan 6, 2016 at 2:50 PM, John Rose wrote: > > > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich wrote: > > > > better canonicalization > > That's our first and most important tactic. (Actually inlining is.) > > But the various idioms for checkIndex do not canonicalize easily. 
In this > case the correct trade-off is not to invest more time and research and code > into stronger canonicalization. > > We do have canonicalization of if-expressions. It's just that in this case > strengthening it to cover range checks reliably is harder than the > reasonable alternative. > > -- John > > PS. I am tempted to write out a list of 20 different ways to code a range > check but will leave that as an exercise. > > From vladimir.kozlov at oracle.com Wed Jan 6 20:57:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Jan 2016 12:57:05 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> Message-ID: <568D7FA1.4040707@oracle.com> Note, we already have range check pattern matching code in C2 (thanks to Roland): https://bugs.openjdk.java.net/browse/JDK-8137168 Vladimir On 1/6/16 12:39 PM, Vitaly Davidovich wrote: > I don't think there's a need to write out 20 different ways to do a > range check -- I think nobody would expect all 20 to be covered by the > optimizer. Some of those variations may not map cleanly to > Object::checkIndex either, nor is there any guarantee that people will > update all their existing range checks (or even know about) to use > Object::checkIndex -- some code will be left unoptimized no matter what.
> > But my point is the same as Andrew's, I think; instead of making > checkIndex an intrinsic, simply add a pattern match against that exact > bytecode shape (perhaps with basic canonicalization) and then still > encourage people to use Object::checkIndex. This is better than > intrinsic (modulo profile pollution) since any other code that happens > to use same pattern will match as well, and not require an update to use > checkIndex. Then, if someone comes to this list with an unoptimized > example with a different bytecode shape and has a convincing argument > that the code shape is "common", you guys can consider pattern matching > that as well. > > On Wed, Jan 6, 2016 at 2:50 PM, John Rose > wrote: > > > > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich > wrote: > > > > better canonicalization > > That's our first and most important tactic. (Actually inlining is.) > > But the various idioms for checkIndex do not canonicalize easily. In > this case the correct trade-off is not to invest more time and > research and code into stronger canonicalization. > > We do have canonicalization of if-expressions. It's just that in > this case strengthening it to cover range checks reliably is harder > than the reasonable alternative. > > ? John > > PS. I am tempted to write out a list of 20 different ways to code a > range check but will leave that as a exercise. > > From vladimir.kozlov at oracle.com Wed Jan 6 22:25:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Jan 2016 14:25:40 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <565E4A28.5010008@oracle.com> <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568BFA90.4020807@oracle.com> Message-ID: <568D9464.2090008@oracle.com> Hi Kishor, Please send this as separate RFR for 8135250. 
RFR should be sent to jdk9-dev at openjdk.java.net since it is JDK changes. And CC to paul.sandoz at oracle.com who is assigned to the bug. Thanks, Vladimir On 1/5/16 1:39 PM, Kharbas, Kishor wrote: > Thank you guys for the in detail discussion and review. > > I have patched the JDK, performing bound checking using Objects.checkFromIndexSize() in CounterMode.crypt() and AESCrypt.encryptBlock(), AESCrypt.decryptBlock() > Here is the link - http://cr.openjdk.java.net/~vdeshpande/8135250/webrev.00/ > > Let me know if it looks correct. > > -Kishor > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Tuesday, January 05, 2016 9:17 AM > To: Andrew Haley; John Rose > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES > > > On 31 Dec 2015, at 22:33, John Rose wrote: > > > > When performing explicit range checks in pre-intrinsic code, > let's try to use the new intrinsic functions in java.util.Objects, > called checkIndex, checkFromToIndex, and checkFromIndexSize. > > Please, don't forget that checks in pre-intrinsic code should match checks generated by javac (bytecode) for intrinsified methods. Otherwise those checks will not be removed (by dominated checks in pre-intrinsic code) when intrinsics are not support on a platform. That is why we currently have such duplicated pre-intrinsic code. > > On other hand when intrinsics are supported they don't have checks so if they present we can intrinsify pre-intrinsic code as you suggested. > > Thanks, > Vladimir > > On 1/5/16 1:48 AM, Andrew Haley wrote: >> On 04/01/16 20:12, John Rose wrote: >>> Corrected, thanks. They don't need to be intrinsics if they optimize well. >>> The point is that the library functions have code shapes which work >>> well with the JIT. 
For example, the multi-index checks might (as in >>> Kishor's code) be implemented on top of the single-index check, >>> without themselves being intrinsics. >> >> We seem to be missing the opportunity to convert >> >> i >= 0 && i < size >> >> into >> >> (unsigned)i < (unsigned)size >> >> and this is, as far as I can see, the only real code-quality advantage >> of the checkIndex intrinsic. Could we not do this optimization and >> then drop the C2 checkIndex intrinsic? >> >> Andrew. >> From christian.thalinger at oracle.com Wed Jan 6 22:57:39 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 6 Jan 2016 12:57:39 -1000 Subject: RFR (S): 8146246: JVMCICompiler::abort_on_pending_exception: assert(!thread->owns_locks()) failed: must release all locks when leaving VM In-Reply-To: <568D6C63.5000403@oracle.com> References: <568D6C63.5000403@oracle.com> Message-ID: <0C62FED5-F3F8-44CE-B1DB-095F9170370B@oracle.com> > On Jan 6, 2016, at 9:34 AM, Vladimir Kozlov wrote: > > I would go with "Java code do the printing". Yeah, it might be better. > You left ttyLocker in case 2) in src/share/vm/runtime/java.cpp Right. Thanks for pointing that out. > > Thanks, > Vladimir > > On 1/6/16 11:19 AM, Christian Thalinger wrote: >> https://bugs.openjdk.java.net/browse/JDK-8146246 >> >> The problem is that https://bugs.openjdk.java.net/browse/JDK-8145435 introduced ttyLocker to synchronize the exception output but java_lang_Throwable::print_stack_trace can call out to Java to get the cause.
>> >> There are two solutions: >> >> 1) Remove ttyLocker and deal with some possible scrambling in the rare case of an exception: >> >> diff -r df8d635f2296 -r e87e187552fb src/share/vm/jvmci/jvmciCompiler.cpp >> --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 11:24:01 2015 -0800 >> +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Thu Dec 31 09:20:16 2015 -0800 >> @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const >> Handle exception(THREAD, PENDING_EXCEPTION); >> CLEAR_PENDING_EXCEPTION; >> >> - { >> - ttyLocker ttyl; >> - java_lang_Throwable::print_stack_trace(exception, tty); >> - } >> + java_lang_Throwable::print_stack_trace(exception, tty); >> >> // Something went wrong so disable compilation at this level >> method->set_not_compilable(CompLevel_full_optimization); >> @@ -181,11 +178,8 @@ void JVMCICompiler::abort_on_pending_exc >> Thread* THREAD = Thread::current(); >> CLEAR_PENDING_EXCEPTION; >> >> - { >> - ttyLocker ttyl; >> - tty->print_raw_cr(message); >> - java_lang_Throwable::print_stack_trace(exception, tty); >> - } >> + tty->print_raw_cr(message); >> + java_lang_Throwable::print_stack_trace(exception, tty); >> >> // Give other aborting threads to also print their stack traces. 
>> // This can be very useful when debugging class initialization >> diff -r df8d635f2296 -r e87e187552fb src/share/vm/runtime/java.cpp >> --- a/src/share/vm/runtime/java.cpp Tue Dec 29 11:24:01 2015 -0800 >> +++ b/src/share/vm/runtime/java.cpp Thu Dec 31 09:20:16 2015 -0800 >> @@ -432,7 +432,6 @@ void before_exit(JavaThread* thread) { >> if (HAS_PENDING_EXCEPTION) { >> Handle exception(THREAD, PENDING_EXCEPTION); >> CLEAR_PENDING_EXCEPTION; >> - ttyLocker ttyl; >> java_lang_Throwable::print_stack_trace(exception, tty); >> } >> #endif >> >> or >> >> 2) Call out to Java and let the Java code do the printing: >> >> diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.cpp >> --- a/src/share/vm/classfile/javaClasses.cpp Tue Dec 29 18:30:51 2015 +0100 >> +++ b/src/share/vm/classfile/javaClasses.cpp Wed Jan 06 09:12:00 2016 -1000 >> @@ -1784,6 +1784,20 @@ void java_lang_Throwable::print_stack_tr >> } >> } >> >> +/** >> + * Print the throwable stack trace by calling the Java method java.lang.Throwable.printStackTrace(). 
>> + */ >> +void java_lang_Throwable::java_printStackTrace(Handle throwable, TRAPS) { >> + assert(throwable->is_a(SystemDictionary::Throwable_klass()), "Throwable instance expected"); >> + JavaValue result(T_VOID); >> + JavaCalls::call_virtual(&result, >> + throwable, >> + KlassHandle(THREAD, SystemDictionary::Throwable_klass()), >> + vmSymbols::printStackTrace_name(), >> + vmSymbols::void_method_signature(), >> + THREAD); >> +} >> + >> void java_lang_Throwable::fill_in_stack_trace(Handle throwable, const methodHandle& method, TRAPS) { >> if (!StackTraceInThrowable) return; >> ResourceMark rm(THREAD); >> diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.hpp >> --- a/src/share/vm/classfile/javaClasses.hpp Tue Dec 29 18:30:51 2015 +0100 >> +++ b/src/share/vm/classfile/javaClasses.hpp Wed Jan 06 09:12:00 2016 -1000 >> @@ -554,6 +554,7 @@ class java_lang_Throwable: AllStatic { >> // Printing >> static void print(Handle throwable, outputStream* st); >> static void print_stack_trace(Handle throwable, outputStream* st); >> + static void java_printStackTrace(Handle throwable, TRAPS); >> // Debugging >> friend class JavaClasses; >> }; >> diff -r 0fcfe4b07f7e src/share/vm/jvmci/jvmciCompiler.cpp >> --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 18:30:51 2015 +0100 >> +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Wed Jan 06 09:12:00 2016 -1000 >> @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const >> Handle exception(THREAD, PENDING_EXCEPTION); >> CLEAR_PENDING_EXCEPTION; >> >> - { >> - ttyLocker ttyl; >> - java_lang_Throwable::print_stack_trace(exception, tty); >> - } >> + java_lang_Throwable::java_printStackTrace(exception, THREAD); >> >> // Something went wrong so disable compilation at this level >> method->set_not_compilable(CompLevel_full_optimization); >> @@ -181,11 +178,7 @@ void JVMCICompiler::abort_on_pending_exc >> Thread* THREAD = Thread::current(); >> CLEAR_PENDING_EXCEPTION; >> >> - { >> - ttyLocker ttyl; >> - tty->print_raw_cr(message); >> - 
java_lang_Throwable::print_stack_trace(exception, tty); >> - } >> + java_lang_Throwable::java_printStackTrace(exception, THREAD); >> >> // Give other aborting threads to also print their stack traces. >> // This can be very useful when debugging class initialization >> diff -r 0fcfe4b07f7e src/share/vm/runtime/java.cpp >> --- a/src/share/vm/runtime/java.cpp Tue Dec 29 18:30:51 2015 +0100 >> +++ b/src/share/vm/runtime/java.cpp Wed Jan 06 09:12:00 2016 -1000 >> @@ -433,7 +433,7 @@ void before_exit(JavaThread* thread) { >> Handle exception(THREAD, PENDING_EXCEPTION); >> CLEAR_PENDING_EXCEPTION; >> ttyLocker ttyl; >> - java_lang_Throwable::print_stack_trace(exception, tty); >> + java_lang_Throwable::java_printStackTrace(exception, THREAD); >> } >> #endif >> From sangheon.kim at oracle.com Wed Jan 6 23:50:35 2016 From: sangheon.kim at oracle.com (sangheon) Date: Wed, 6 Jan 2016 15:50:35 -0800 Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode In-Reply-To: <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com> References: <568C6049.5020400@oracle.com> <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com> Message-ID: <568DA84B.9050309@oracle.com> Hi Igor, Thank you for reviewing this. On 01/05/2016 08:29 PM, Igor Veresov wrote: > I'm not sure we care a lot about tiny bits of performance in this instance. But in case you wanted to keep the original code for the simm13 case, you could check the range of the constant and still emit the code that was there before. It also seems suboptimal to do set64 in MacroAssembler::tlab_refill() on all paths - the result of the original add in the delay slot doesn't seem to be used if we jump to discard_tlab, right? You are right. If the branch is taken, the original add in the delay slot is not used. The reason for always calling 'set64' was to keep the original behavior, i.e. the same order of doing something before the branch within the delay slot. But as you said, the code is less tight.
> So maybe you could do something like: > > brx(Assembler::lessEqual, false, Assembler::pt, discard_tlab); > if (is_simm13(ThreadLocalAllocBuffer::refill_waste_limit_increment())) { > delayed()->add(t2, ThreadLocalAllocBuffer::refill_waste_limit_increment(), t2); > } else { > delayed()->nop(); > set64(ThreadLocalAllocBuffer::refill_waste_limit_increment(), t3, G0); > add(t2, t3, t2); > } Okay, checking its value first seems like a good idea. > > Similarly, tighter code can be emitted for the interpreter in templateTable_sparc.cpp. Okay, done. Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.01 Thanks, Sangheon > > igor > > >> On Jan 5, 2016, at 4:31 PM, sangheon wrote: >> >> Hi all, >> >> Could I have reviews for the below change to remove the size limitation (<4096) of TLABWasteIncrement on SPARC? >> >> The current implementation uses 'add(Register, int, Register)', which has a 13-bit limitation for the 'int' parameter. >> I changed it to use 'set64' to load the value into a register and then call 'add'. 'set64' will take the cheap path, as the range of TLABWasteIncrement is (0, max_juint). >> >> This assert only fires in non-G1 mode, as G1 is the only GC that returns false from Universe::heap()->supports_inline_contig_alloc() with the default options. And this decides whether we fall into that routine. >> >> I didn't add a test as the current TestOptionsWithRanges.java is enough to test this case with nightly option rotation.
>> >> CR: https://bugs.openjdk.java.net/browse/JDK-8144573 >> Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.00/ >> Testing: JPRT, manual test on SPARC[1] >> >> [1]: java -XX:TLABWasteIncrement=4096(and some larger values as well) -XX:+UseConcMarkSweepGC(UseParallelGC and UseSerialGC) -version >> >> Thanks, >> Sangheon From vladimir.kozlov at oracle.com Wed Jan 6 23:58:34 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Jan 2016 15:58:34 -0800 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568D17E4.90301@redhat.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> Message-ID: <568DAA2A.9070704@oracle.com> Andrew is right. GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads from moving up. StrInflatedCopyNode is not a memory node. Thanks, Vladimir On 1/6/16 5:34 AM, Andrew Haley wrote: > On 01/06/2016 01:06 PM, Tobias Hartmann wrote: > >> The problem here is that C2 reorders memory instructions and moves >> an array load before an array store. The MemBarCPUOrder is now used >> (compiler internally) to prevent this. We do the same for normal >> array copies in PhaseMacroExpand::expand_arraycopy_node(). No actual >> code is emitted. See also the comment in memnode.hpp: >> >> // Ordering within the same CPU. Used to order unsafe memory references >> // inside the compiler when we lack alias info. Not needed "outside" the >> // compiler because the CPU does all the ordering for us. >> >> "CPU does all the ordering for us" means that even with a relaxed >> memory ordering, loads are never moved before dependent stores. >> >> Or did I misunderstand your question? > > No, I don't think so. I was just checking: I am very aware that > HotSpot has presented those of us with relaxed memory order machines > with some interesting gotchas over the years, that's all.
I'm a bit > surprised that C2 needs this barrier, given that there is a > read-after-write dependency, but never mind. > > Thanks, > > Andrew. > From igor.veresov at oracle.com Thu Jan 7 00:01:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 6 Jan 2016 16:01:48 -0800 Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode In-Reply-To: <568DA84B.9050309@oracle.com> References: <568C6049.5020400@oracle.com> <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com> <568DA84B.9050309@oracle.com> Message-ID: That looks good to me. igor > On Jan 6, 2016, at 3:50 PM, sangheon wrote: > > Hi Igor, > > Thank you for reviewing this. > > On 01/05/2016 08:29 PM, Igor Veresov wrote: >> I'm not sure we care a lot about tiny bits of performance in this instance. But in case you wanted to keep the original code for the simm13 case, you could check the range of the constant and still emit the code that was there before. It also seems suboptimal to do set64 in MacroAssembler::tlab_refill() on all paths - the result of the original add in the delay slot doesn't seem to be used if we jump to discard_tlab, right? > You are right. > If the branch is taken, the original add in the delay slot is not used. > > The reason for always calling 'set64' was to keep the original behavior, i.e. the same order of doing something before the branch within the delay slot. But as you said, the code is less tight. > >> So maybe you could do something like: >> >> brx(Assembler::lessEqual, false, Assembler::pt, discard_tlab); >> if (is_simm13(ThreadLocalAllocBuffer::refill_waste_limit_increment())) { >> delayed()->add(t2, ThreadLocalAllocBuffer::refill_waste_limit_increment(), t2); >> } else { >> delayed()->nop(); >> set64(ThreadLocalAllocBuffer::refill_waste_limit_increment(), t3, G0); >> add(t2, t3, t2); >> } > Okay, checking its value first seems like a good idea. > >> >> Similarly, tighter code can be emitted for the interpreter in templateTable_sparc.cpp. > Okay, done.
> > Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.01 > > Thanks, > Sangheon > > >> >> igor >> >> >>> On Jan 5, 2016, at 4:31 PM, sangheon wrote: >>> >>> Hi all, >>> >>> Could I have reviews for the below change to remove the size limitation (<4096) of TLABWasteIncrement on SPARC? >>> >>> The current implementation uses 'add(Register, int, Register)', which has a 13-bit limitation for the 'int' parameter. >>> I changed it to use 'set64' to load the value into a register and then call 'add'. 'set64' will take the cheap path, as the range of TLABWasteIncrement is (0, max_juint). >>> >>> This assert only fires in non-G1 mode, as G1 is the only GC that returns false from Universe::heap()->supports_inline_contig_alloc() with the default options. And this decides whether we fall into that routine. >>> >>> I didn't add a test as the current TestOptionsWithRanges.java is enough to test this case with nightly option rotation. >>> >>> CR: https://bugs.openjdk.java.net/browse/JDK-8144573 >>> Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.00/ >>> Testing: JPRT, manual test on SPARC[1] >>> >>> [1]: java -XX:TLABWasteIncrement=4096(and some larger values as well) -XX:+UseConcMarkSweepGC(UseParallelGC and UseSerialGC) -version >>> >>> Thanks, >>> Sangheon > From vitalyd at gmail.com Thu Jan 7 00:11:39 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 6 Jan 2016 19:11:39 -0500 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <568D7FA1.4040707@oracle.com> References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> <568D7FA1.4040707@oracle.com> Message-ID: Does checkIndex match on it? If so, is there a reason to proceed with intrinsifying checkIndex?
On Wednesday, January 6, 2016, Vladimir Kozlov wrote: > Note, we already have range check pattern matching code in C2 (thanks to > Roland): > > https://bugs.openjdk.java.net/browse/JDK-8137168 > > Vladimir > > On 1/6/16 12:39 PM, Vitaly Davidovich wrote: > >> I don't think there's a need to write out 20 different ways to do a >> range check -- I think nobody would expect all 20 to be covered by the >> optimizer. Some of those variations may not map cleanly to >> Object::checkIndex either, nor is there any guarantee that people will >> update all their existing range checks (or even know about) to use >> Object::checkIndex -- some code will be left unoptimized no matter what. >> >> But my point is the same as Andrew's, I think; instead of making >> checkIndex an intrinsic, simply add a pattern match against that exact >> bytecode shape (perhaps with basic canonicalization) and then still >> encourage people to use Object::checkIndex. This is better than >> intrinsic (modulo profile pollution) since any other code that happens >> to use same pattern will match as well, and not require an update to use >> checkIndex. Then, if someone comes to this list with an unoptimized >> example with a different bytecode shape and has a convincing argument >> that the code shape is "common", you guys can consider pattern matching >> that as well. >> >> On Wed, Jan 6, 2016 at 2:50 PM, John Rose > > wrote: >> >> >> > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich > > wrote: >> > >> > better canonicalization >> >> That's our first and most important tactic. (Actually inlining is.) >> >> But the various idioms for checkIndex do not canonicalize easily. In >> this case the correct trade-off is not to invest more time and >> research and code into stronger canonicalization. >> >> We do have canonicalization of if-expressions. It's just that in >> this case strengthening it to cover range checks reliably is harder >> than the reasonable alternative. >> >> ? John >> >> PS. 
I am tempted to write out a list of 20 different ways to code a >> range check but will leave that as an exercise. >> >> >> -- Sent from my phone From sangheon.kim at oracle.com Thu Jan 7 00:12:12 2016 From: sangheon.kim at oracle.com (sangheon) Date: Wed, 6 Jan 2016 16:12:12 -0800 Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode In-Reply-To: References: <568C6049.5020400@oracle.com> <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com> <568DA84B.9050309@oracle.com> Message-ID: <568DAD5C.9080409@oracle.com> Thanks for the review. Sangheon On 01/06/2016 04:01 PM, Igor Veresov wrote: > That looks good to me. > > igor > >> On Jan 6, 2016, at 3:50 PM, sangheon wrote: >> >> Hi Igor, >> >> Thank you for reviewing this. >> >> On 01/05/2016 08:29 PM, Igor Veresov wrote: >>> I'm not sure we care a lot about tiny bits of performance in this instance? But, in case you wanted to keep the original code for the simm13 case you could check the range of the constant and still emit the code that was there before. It also seems suboptimal to do set64 in MacroAssembler::tlab_refill() on all paths - the result of the original add in the delay slot doesn't seem to be used if we jump to discard_tlab, right? >> You are right. >> If the branch is taken, original add in the delay slot is not used. >> >> The reason for always calling 'set64' was to keep its behavior, i.e. the same order of doing something before the branch within the delay slot. But as you said, it is less tight code.
>> >>> So, may be you could do something like: >>> >>> brx(Assembler::lessEqual, false, Assembler::pt, discard_tlab); >>> if (is_simm13(ThreadLocalAllocBuffer::refill_waste_limit_increment())) { >>> delayed()->add(t2, ThreadLocalAllocBuffer::refill_waste_limit_increment(), t2); >>> } else { >>> delayed()->nop(); >>> set64(ThreadLocalAllocBuffer::refill_waste_limit_increment(), t3, G0); >>> add(t2, t3, t2); >>> } >> Okay, checking its value first seems good idea. >> >>> Similarly, tighter code can be emitted for the interpreter in templateTable_sparc.cpp. >> Okay, done. >> >> Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.01 >> >> Thanks, >> Sangheon >> >> >>> igor >>> >>> >>>> On Jan 5, 2016, at 4:31 PM, sangheon wrote: >>>> >>>> Hi all, >>>> >>>> Could I have reviews for the below change to remove size limitation(<4096) of TLABWasteIncrement on SPARC? >>>> >>>> Current implementation uses 'add(Register, int, Register)' which has 13bit limitation for 'int' parameter. >>>> I changed to use 'set64' to load the value to register and then call 'add'. 'set64' will run cheap path as the range of TLABWasteIncrememt is (0, max_juint). >>>> >>>> This assert is only fired on non-G1 mode as G1 is the only GC that returns false from Universe::heap()->supports_inline_contig_alloc() by default option. And this decides to fall that routine. >>>> >>>> I didn't add a test as current TestOptionsWithRanges.java is enough to test this case with nightly option rotation. 
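As a side note on the 13-bit limit mentioned above: a signed 13-bit immediate covers -4096 through 4095, which is why values of 4096 and above cannot be encoded directly in the add. The following is a minimal sketch of that range test and the resulting choice between the two code shapes. It is written in Java purely for illustration; the class, the method names, and the returned strings are hypothetical stand-ins and are not the HotSpot assembler API.

```java
// Hypothetical illustration of the simm13 range test discussed in the thread.
final class Simm13Sketch {
    // A signed 13-bit immediate covers -4096 .. 4095 inclusive.
    static boolean isSimm13(long value) {
        return value >= -(1L << 12) && value < (1L << 12);
    }

    // Describes which instruction shape would be chosen for a given increment.
    // The strings are descriptive only; they are not real assembler output.
    static String chooseAddForm(long increment) {
        return isSimm13(increment)
                ? "add with immediate operand"
                : "load constant into temp register, then add register";
    }

    public static void main(String[] args) {
        long[] samples = {4, 4095, 4096, Integer.MAX_VALUE};
        for (long v : samples) {
            System.out.println(v + " -> " + chooseAddForm(v));
        }
    }
}
```

With this split, small increments keep the single-instruction form and only out-of-range values pay for the extra constant load.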
>>>> >>>> CR: https://bugs.openjdk.java.net/browse/JDK-8144573 >>>> Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.00/ >>>> Testing: JPRT, manual test on SPARC[1] >>>> >>>> [1]: java -XX:TLABWasteIncrement=4096(and some larger values as well) -XX:+UseConcMarkSweepGC(UseParallelGC and UseSerialGC) -version >>>> >>>> Thanks, >>>> Sangheon From vivek.r.deshpande at intel.com Thu Jan 7 00:31:46 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 7 Jan 2016 00:31:46 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <5684A5B8.7070407@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569E1902@ORSMSX106.amr.corp.intel.com> <5684A5B8.7070407@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569F23FC@ORSMSX106.amr.corp.intel.com> HI Vladimir, Yes, the macroAssembler_x86_libm.cpp file is getting large, I could look into splitting it into two files macroAssembler_libm_x86_64.cpp and macroAssembler_libm_x86_32.cpp. Please let me know if that sounds good to you. The 64 bit code takes advantage of additional general purpose registers and 64 bit integer arithmetic and so we have two different versions for 32 bit and 64 bit. 
Regarding the FPU usage in cos/sin, we talked with the LIBM algorithm experts and they came back with the following: "It would not be easy to remove FPU x87 instructions from libm_sincos_huge and libm_reduced_pi04l, they are designed with using extended precision from FPU in mind. The performance for 32bit implementation for these that do not use x87 instructions may not be optimal. These two are only used for very large input arguments." Thank you. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, December 30, 2015 7:49 PM To: Deshpande, Vivek R; Joseph D. Darcy Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib Hi Vivek, Why 32-bit code is so different from 64-bit code? You only use it if sse2 is available so XMM registers are present. Why to use FPU if you have SSE? 32-bit: 582 movsd(Address(rsp, 8), xmm0); 583 fld_d(Address(rsp, 8)); 584 movsd(Address(rsp, 16), xmm6); 585 fld_d(Address(rsp, 16)); 586 fmula(1); 64-bit: 295 mulsd(xmm0, xmm2); It is concerned to all LIBM 32-bit intrinsics. The main concern is that macroAssembler_x86_libm.cpp file become too large and it would be nice if 32-bit and 64-bit reuse the same code. Thanks, Vladimir On 12/24/15 6:10 PM, Deshpande, Vivek R wrote: > HI Vladimir > > I have updated the libm sin cos intrinsics for x86 for hotspot. > The updated webrev for the same is at this location for your review. > http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/web > rev.00/ > Could you please review it. > > Regards, > Vivek > > > -----Original Message----- > From: Deshpande, Vivek R > Sent: Tuesday, December 22, 2015 5:42 PM > To: 'Joseph D. 
Darcy'; Vladimir Kozlov > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > HI All > > I have uploaded the patch for sin and cos tests with input and allowed outputs at this location for your review. > http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev. > 00/ Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 > Thank you. > > Regards, > Vivek > > -----Original Message----- > From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] > Sent: Friday, December 04, 2015 4:50 PM > To: Deshpande, Vivek R; Vladimir Kozlov > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > Hi Vivek, > > On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >> Hi >> >> Sure I will add the tests. Shall I use StrictMath result as a reference for exact result. >> Let me know your thoughts. > > As a rough test of another sin/cos implementation, StrictMath.{sin, > cos} can be used a reference with the following caveat: there isn't an > indication of which why the error is in a StrictMath result. Let me > given an example, if > > StrictMath.sin(x) => y > > then one of the following should be true > > Math.sin(x) => y > Math.sin(x) => Math.nextUp(y) > Math.sin(x) => Math.nextDown(y) > > That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR > equal to one of the floating-point numbers adjacent to that result. Of > these three options, only two area allowed by the accuracy > requirements of the StrictMath.sin specification. However, since > StrictMath.sin doesn't give an indication of which way its error went > (if it rounded up or down), there is no indication without additional > work which of > nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy). 
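The rough check described above can be sketched as follows. This is only an illustration of the "equal or adjacent" comparison against StrictMath; the class and method names are hypothetical, and, as the message notes, passing this check does not by itself show which of the two neighbours is actually allowed.

```java
// Hypothetical illustration of the rough StrictMath-based comparison.
final class NeighborCheck {
    // True if candidate equals reference or one of the two doubles adjacent to it.
    static boolean equalOrAdjacent(double candidate, double reference) {
        return candidate == reference
                || candidate == Math.nextUp(reference)
                || candidate == Math.nextDown(reference);
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, 1.0, 3.0, 100.0, 1.0e6};
        for (double x : inputs) {
            double y = StrictMath.sin(x);
            // Reports whether Math.sin(x) is y, nextUp(y), or nextDown(y).
            System.out.println(x + " -> " + equalOrAdjacent(Math.sin(x), y));
        }
    }
}
```

A stricter test would need a higher-precision reference to decide which neighbour of the StrictMath result is acceptable.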
> > HTH, > > -Joe > > >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: joe darcy [mailto:joe.darcy at oracle.com] >> Sent: Thursday, December 03, 2015 1:29 PM >> To: Vladimir Kozlov; Deshpande, Vivek R >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hello, >> >> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>> Vivek, >>> >>> I think Joe is asking you to write these tests as hotspot regression >>> test in hotspot/test/compiler. >> Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. >> >> Thanks, >> >> -Joe >> >>> Vladimir >>> >>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>> Hi Joe >>>> >>>> It would be great if you would please share the additional tests >>>> with us. >>>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>> Sent: Thursday, December 03, 2015 1:17 PM >>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> I think it is unwise for this large of an implementation change to >>>> be pushed with no tests targeting the specifics of the new implementation. >>>> >>>> The worst-case tests in the jdk repo are the mathematical worst >>>> cases for floating-point approximations, in other words the cases >>>> were the exact mathematical answer is closes to half-way between >>>> two representation floating-point numbers. Passing such tests is >>>> necessary but not sufficient condition for a new implementation. >>>> >>>> Chers, >>>> >>>> -Joe >>>> >>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>> Okay, looks reasonable to me. 
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> This is the link for the updated webrev with latest hotspot >>>>>> source as base for your review. >>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>> Thank you. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: Deshpande, Vivek R >>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Hi Vladimir >>>>>> >>>>>> This is the link for the updated webrev for your review. >>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>> Thank you. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>> To: Deshpande, Vivek R; joe darcy >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Please send link to new webrev on cr server. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> Please find the webrev with your suggested updates attached with >>>>>>> the mail. >>>>>>> We will update it in the jbs entry soon. >>>>>>> Please let me know if it needs further changes. 
>>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Deshpande, Vivek R >>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> HI Vladimir, Joe >>>>>>> >>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>> have mentioned. It passed those tests. >>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>> The performance gain is 3.2x over base jdk, that is over current >>>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>>> >>>>>>> Could I get those tests around the boundary values. Would >>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>> If yes, then it has passed those boundary cases. >>>>>>> >>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>> for libm and send out the webrev soon. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Just getting added to the thread.. >>>>>>> >>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>> Thank you, for explanation, Vivek. >>>>>>>> >>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>>> Hotspot tests. >>>>>>>> >>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi Vladimir >>>>>>>>> >>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>> StrictMath result and not exact result. 
So I added the flag to >>>>>>>>> switch between FDLIBM and LIBM. >>>>>>>>> >>>>>>>>> Quick explanation: >>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>> should be = 0.19457293629570216 >>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>> 0.1945729362957022 >>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>> library result is between the above two values and Exact >>>>>>>>> result would be pretty close to it. >>>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>> StrictMath >>>>>>>>> - 1ulp, according to our test. >>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both >>>>>>>> direction, I >>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>> generated by JIT compilers: >>>>>>>> >>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#si >>>>>>>> n >>>>>>>> % >>>>>>>> 28 >>>>>>>> do >>>>>>>> u >>>>>>>> ble%29 >>>>>>>> >>>>>>> That interpretation of the spec is not quite right. For the Math >>>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>>> closest to the exact result must be returned. For the methods >>>>>>> with a >>>>>>> 1 ulp error bound, either of the floating-point result >>>>>>> bracketing the true result can be returned, subject to the >>>>>>> monotonicity constraints of the specification of the particular method. >>>>>>> >>>>>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>>> would go through LIBM and C1 and c2 through FDLIBM. 
>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>> I was thinking about using existing >>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add >>>>>>>> additional versions of functions which accept intrinsic ID >>>>>>>> instead of methodHandle. >>>>>>>> >>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>> >>>>>>>>> Also the performance gain ~4x is with >>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>>> LIBM code and compilers use FDLIB? >>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>> >>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>> boost to the StrictMath methods that have been ported. >>>>>>> >>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>> testing. >>>>>>> For example, part of patch says >>>>>>> >>>>>>> # For sin >>>>>>> >>>>>>> +// This means that the main path is actually only taken for >>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>> >>>>>>> # For cos >>>>>>> >>>>>>> +// This means that the main path is actually only taken for >>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>> >>>>>>> If nothing else, there are no tests at around those boundary >>>>>>> values, which is unacceptable. There should also be some tests >>>>>>> of values of interest to the algorithm in question. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> -Joe >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>> questions and give more data if needed. 
>>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>> the math lib >>>>>>>>> >>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>> log() changes did not have flags. >>>>>>>>>> >>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>> >>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>> Hi Vivek, >>>>>>>>> >>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>> file bugs and fixed them after FC. >>>>>>>>> >>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>> the only thing holding it from push. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>> Hi all >>>>>>>>>>> >>>>>>>>>>> I would like to contribute a patch which optimizes >>>>>>>>>>> Math.sin() and >>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>>> implementation. >>>>>>>>>>> >>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>>>> >>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>> >>>>>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>>>>> >>>>>>>>>>> Bug-id: >>>>>>>>>>> >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>> webrev: >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>> >>>>>>>>>>> Thanks and regards, >>>>>>>>>>> >>>>>>>>>>> Vivek >>>>>>>>>>> > From vladimir.kozlov at oracle.com Thu Jan 7 00:35:43 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Jan 2016 16:35:43 -0800 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568CF8F5.5090202@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> Message-ID: <568DB2DF.4010305@oracle.com> Nope. Too much unrelated changes. If you want to go this road - file separate RFE to change phase argument type of Identity() and Value(). And why use PhaseValue and not PhaseGVN as in Ideal()? So I agree to do your change in IfNode::Identity(). But as separate fix after general change. Thanks, Vladimir On 1/6/16 3:22 AM, Tobias Hartmann wrote: > Hi, > > I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. > > Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: > Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. 
In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). > > http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ > > What do you think? > > Thanks, > Tobias > > > On 18.09.2015 11:57, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8136469 >> http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ >> >> Problem: >> When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. >> >> Solution: >> I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). 
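For illustration, the kind of source shape described in the problem statement might look like the sketch below. This is an assumption about the general pattern only and is not the attached TestPresizedStringBuilder.java; the class, field, and method names are hypothetical.

```java
// Hypothetical illustration of a pre-sized StringBuilder concatenation chain.
final class PresizedBuilderShape {
    // Initialized through a method call, so javac does not fold it as a
    // compile-time constant; its value is only known once the class is initialized.
    static final boolean LARGE = Boolean.parseBoolean("true");

    static String concat(String a, String b) {
        // The initial capacity depends on the static final flag above.
        return new StringBuilder(LARGE ? 32 : 16).append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("foo", "bar"));
    }
}
```

The point of the shape is that the capacity argument introduces a test on the flag into the control flow between the StringBuilder allocation and the append calls.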
>> >> Testing: >> - New test (TestPresizedStringBuilder) >> - JPRT >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java >> [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png >> From vladimir.kozlov at oracle.com Thu Jan 7 00:54:07 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Jan 2016 16:54:07 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569F23FC@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569E1902@ORSMSX106.amr.corp.intel.com> <5684A5B8.7070407@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569F23FC@ORSMSX106.amr.corp.intel.com> Message-ID: <568DB72F.6010408@oracle.com> On 1/6/16 4:31 PM, Deshpande, Vivek R wrote: > HI Vladimir, > > Yes, the macroAssembler_x86_libm.cpp file is getting large, I could look into splitting it into two files macroAssembler_libm_x86_64.cpp and macroAssembler_libm_x86_32.cpp. Please let me know if that sounds good to you. Yes, if we keep separate code we should split the file (and adjust make files). 
> > The 64 bit code takes advantage of additional general purpose registers and 64 bit integer arithmetic and so we have two different versions for 32 bit and 64 bit. Okay, this is a valid argument. Even so, we could use push/pop on 32-bit to preserve registers. > > Regarding the FPU usage in cos/sin, we talked with the LIBM algorithm experts and they came back with the following: > "It would not be easy to remove FPU x87 instructions from libm_sincos_huge and libm_reduced_pi04l, they are designed with using extended precision from FPU in mind. The performance for 32bit implementation for these that do not use x87 instructions may not be optimal. These two are only used for very large input arguments." I don't buy this argument. Do they mean that the 64-bit code, which does not use the FPU, produces a less precise result for very large input arguments? Very large input arguments are a very rare case, I think. Should we care about its performance? Note, 32-bit performance becomes less and less important. Okay, for now let's split the file. Later we can try to simplify/combine/factor out the code. Thanks, Vladimir > > Thank you. > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, December 30, 2015 7:49 PM > To: Deshpande, Vivek R; Joseph D. Darcy > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Hi Vivek, > > Why 32-bit code is so different from 64-bit code? You only use it if sse2 is available so XMM registers are present. Why to use FPU if you have SSE? > > 32-bit: > > 582 movsd(Address(rsp, 8), xmm0); > 583 fld_d(Address(rsp, 8)); > 584 movsd(Address(rsp, 16), xmm6); > 585 fld_d(Address(rsp, 16)); > 586 fmula(1); > > 64-bit: > > 295 mulsd(xmm0, xmm2); > > It is concerned to all LIBM 32-bit intrinsics.
> > The main concern is that macroAssembler_x86_libm.cpp file become too large and it would be nice if 32-bit and 64-bit reuse the same code. > > Thanks, > Vladimir > > On 12/24/15 6:10 PM, Deshpande, Vivek R wrote: >> HI Vladimir >> >> I have updated the libm sin cos intrinsics for x86 for hotspot. >> The updated webrev for the same is at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/web >> rev.00/ >> Could you please review it. >> >> Regards, >> Vivek >> >> >> -----Original Message----- >> From: Deshpande, Vivek R >> Sent: Tuesday, December 22, 2015 5:42 PM >> To: 'Joseph D. Darcy'; Vladimir Kozlov >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> HI All >> >> I have uploaded the patch for sin and cos tests with input and allowed outputs at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev. >> 00/ Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 >> Thank you. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] >> Sent: Friday, December 04, 2015 4:50 PM >> To: Deshpande, Vivek R; Vladimir Kozlov >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hi Vivek, >> >> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >>> Hi >>> >>> Sure I will add the tests. Shall I use StrictMath result as a reference for exact result. >>> Let me know your thoughts. >> >> As a rough test of another sin/cos implementation, StrictMath.{sin, >> cos} can be used a reference with the following caveat: there isn't an >> indication of which why the error is in a StrictMath result. 
Let me >> given an example, if >> >> StrictMath.sin(x) => y >> >> then one of the following should be true >> >> Math.sin(x) => y >> Math.sin(x) => Math.nextUp(y) >> Math.sin(x) => Math.nextDown(y) >> >> That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR >> equal to one of the floating-point numbers adjacent to that result. Of >> these three options, only two area allowed by the accuracy >> requirements of the StrictMath.sin specification. However, since >> StrictMath.sin doesn't give an indication of which way its error went >> (if it rounded up or down), there is no indication without additional >> work which of >> nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy). >> >> HTH, >> >> -Joe >> >> >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Thursday, December 03, 2015 1:29 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> Hello, >>> >>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>>> Vivek, >>>> >>>> I think Joe is asking you to write these tests as hotspot regression >>>> test in hotspot/test/compiler. >>> Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. >>> >>> Thanks, >>> >>> -Joe >>> >>>> Vladimir >>>> >>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>>> Hi Joe >>>>> >>>>> It would be great if you would please share the additional tests >>>>> with us. 
>>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>> Sent: Thursday, December 03, 2015 1:17 PM >>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> I think it is unwise for this large of an implementation change to >>>>> be pushed with no tests targeting the specifics of the new implementation. >>>>> >>>>> The worst-case tests in the jdk repo are the mathematical worst >>>>> cases for floating-point approximations, in other words the cases >>>>> were the exact mathematical answer is closes to half-way between >>>>> two representation floating-point numbers. Passing such tests is >>>>> necessary but not sufficient condition for a new implementation. >>>>> >>>>> Chers, >>>>> >>>>> -Joe >>>>> >>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>>> Okay, looks reasonable to me. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> This is the link for the updated webrev with latest hotspot >>>>>>> source as base for your review. >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>>> Thank you. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Deshpande, Vivek R >>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Hi Vladimir >>>>>>> >>>>>>> This is the link for the updated webrev for your review. >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>>> Thank you. 
>>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>>> To: Deshpande, Vivek R; joe darcy >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Please send link to new webrev on cr server. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> Please find the webrev with your suggested updates attached with >>>>>>>> the mail. >>>>>>>> We will update it in the jbs entry soon. >>>>>>>> Please let me know if it needs further changes. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Deshpande, Vivek R >>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> HI Vladimir, Joe >>>>>>>> >>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>>> have mentioned. It passed those tests. >>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>>> The performance gain is 3.2x over base jdk, that is over current >>>>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>>>> >>>>>>>> Could I get those tests around the boundary values. Would >>>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>>> If yes, then it has passed those boundary cases. >>>>>>>> >>>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>>> for libm and send out the webrev soon. 
>>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> Just getting added to the thread.. >>>>>>>> >>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>>> Thank you, for explanation, Vivek. >>>>>>>>> >>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>>>> Hotspot tests. >>>>>>>>> >>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>>> Hi Vladimir >>>>>>>>>> >>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>>> StrictMath result and not exact result. So I added the flag to >>>>>>>>>> switch between FDLIBM and LIBM. >>>>>>>>>> >>>>>>>>>> Quick explanation: >>>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>>> should be = 0.19457293629570216 >>>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>>> 0.1945729362957022 >>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>>> library result is between the above two values and Exact >>>>>>>>>> result would be pretty close to it. >>>>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>>> StrictMath >>>>>>>>>> - 1ulp, according to our test. 
>>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both >>>>>>>>> directions, I >>>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>>> generated by JIT compilers: >>>>>>>>> >>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>>>> >>>>>>>> That interpretation of the spec is not quite right. For the Math >>>>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>>>> closest to the exact result must be returned. For the methods >>>>>>>> with a >>>>>>>> 1 ulp error bound, either of the floating-point results >>>>>>>> bracketing the true result can be returned, subject to the >>>>>>>> monotonicity constraints of the specification of the particular method. >>>>>>>> >>>>>>>>>> I have done the experiments with -XX:+UnlockDiagnosticVMOptions >>>>>>>>>> -XX:DisableIntrinsic=_dsin and -XX:+UnlockDiagnosticVMOptions >>>>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>>>> would go through LIBM and C1 and C2 through FDLIBM. >>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>> I was thinking about using existing >>>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add >>>>>>>>> additional versions of functions which accept intrinsic ID >>>>>>>>> instead of methodHandle. >>>>>>>>> >>>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>>> >>>>>>>>>> Also the performance gain ~4x is with >>>>>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>>> You confused me here. So you get 4x when only Interpreter uses >>>>>>>>> LIBM code and compilers use FDLIBM? 
>>>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>>> >>>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>>> boost to the StrictMath methods that have been ported. >>>>>>>> >>>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>>> testing. >>>>>>>> For example, part of patch says >>>>>>>> >>>>>>>> # For sin >>>>>>>> >>>>>>>> +// This means that the main path is actually only taken for >>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>> >>>>>>>> # For cos >>>>>>>> >>>>>>>> +// This means that the main path is actually only taken for >>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>> >>>>>>>> If nothing else, there are no tests at around those boundary >>>>>>>> values, which is unacceptable. There should also be some tests >>>>>>>> of values of interest to the algorithm in question. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> -Joe >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>>> questions and give more data if needed. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Vivek >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>>> the math lib >>>>>>>>>> >>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>>> log() changes did not have flags. 
>>>>>>>>>>> >>>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>>> >>>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>>> Hi Vivek, >>>>>>>>>> >>>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>>> file bugs and fixed them after FC. >>>>>>>>>> >>>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>>> the only thing holding it from push. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>>> Hi all >>>>>>>>>>>> >>>>>>>>>>>> I would like to contribute a patch which optimizes >>>>>>>>>>>> Math.sin() and >>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>>>> implementation. >>>>>>>>>>>> >>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>>>>> >>>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>>> >>>>>>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>>>>>> >>>>>>>>>>>> Bug-id: >>>>>>>>>>>> >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>>> webrev: >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>>> >>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>> >>>>>>>>>>>> Vivek >>>>>>>>>>>> >> From thomas.schatzl at oracle.com Thu Jan 7 09:11:05 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 07 Jan 2016 10:11:05 +0100 Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode In-Reply-To: <568DA84B.9050309@oracle.com> References: <568C6049.5020400@oracle.com> <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com> <568DA84B.9050309@oracle.com> Message-ID: <1452157865.2611.2.camel@oracle.com> Hi Sangheon, On Wed, 2016-01-06 at 15:50 -0800, sangheon wrote: > Hi Igor, > > Thank you for reviewing this. > > On 01/05/2016 08:29 PM, Igor Veresov wrote: > > I'm not sure we care a lot about tiny bits of performance in > > this instance? But, in case you wanted to keep the original code > > for the simm13 case you could check the range of the constant and > > still emit the code that was there before. It also seems suboptimal > > to do set64 in MacroAssembler::tlab_refill() on all paths - the > > result of the original add in the delay slot doesn't seem to be > > used if we jump to discard_tlab, right? > You are right. > If the branch is taken, original add in the delay slot is not used. > > The reason of always calling 'set64' was to keep its behavior. i.e. > same > order of doing something before branch within delay slot. But as you > said, it is less tight code. 
> > > So, may be you could do something like: > > > > brx(Assembler::lessEqual, false, Assembler::pt, discard_tlab); > > if > > (is_simm13(ThreadLocalAllocBuffer::refill_waste_limit_increment())) > > { > > delayed()->add(t2, > > ThreadLocalAllocBuffer::refill_waste_limit_increment(), t2); > > } else { > > delayed()->nop(); > > set64(ThreadLocalAllocBuffer::refill_waste_limit_increment(), > > t3, G0); > > add(t2, t3, t2); > > } > Okay, checking its value first seems good idea. > > > > > Similarly, tighter code can be emitted for the interpreter in > > templateTable_sparc.cpp. > Okay, done. > > Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.01 looks good. Could you move the "// increment waste limit to prevent getting stuck on this slow path" above the if-clause in both cases and remove the other mentions of that to make the comments in both macroAssembler and templateTable uniform? I do not need another review for the comment change. Thanks, Thomas From roland.westrelin at oracle.com Thu Jan 7 09:29:07 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 7 Jan 2016 10:29:07 +0100 Subject: Request for Reviews (S): JDK-8003585 strength reduce or eliminate range checks for power-of-two sized arrays In-Reply-To: References: <440F2280-4B25-4AE6-A4F6-DDD4EB529636@oracle.com> <52FC129D.7040409@oracle.com> <52FE6A08.20400@oracle.com> <52FE7313.3060404@oracle.com> <530209A8.1020501@oracle.com> <38EE6922-0B9C-49A6-B54D-E78BA0EFECB1@oracle.com> <8232A81B-6B78-4F61-A8EC-1A3DF3938648@oracle.com> Message-ID: <70FBA4CF-CF05-4232-AFEC-202E93BFA930@oracle.com> Can I get a review for this? Roland. > On Oct 5, 2015, at 12:51 PM, Roland Westrelin wrote: > > Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8003585/webrev.01/ > > Roland. > >> On Oct 2, 2015, at 3:30 PM, Roland Westrelin wrote: >> >> Hi Chris, >> >>> Thanks for picking it up! It mostly looks good to me. (Not a Reviewer) >> >> Thanks for looking at this again. 
>> >>> What I really needed with my earlier webrev was some instructions as to what test to write -- since the Java corelibs can come across this optimization a lot (e.g. HashMap), I didn't have a good idea of what kind of test really needs to be written. >>> >>> A couple of issues with this webrev: >>> >>> 1. In subnode.cpp, line 1346: >>> >>> 1344 } else if (_test._test == BoolTest::lt && >>> 1345 cmp2->Opcode() == Op_AddI && >>> 1346 cmp2->in(2)->find_int_con(1)) { >>> 1347 bound = cmp2->in(1); >>> 1348 } >>> >>> I think it should be >>> cmp2->in(2)->find_int_con(0) == 1 >>> instead, because the value passed into this function is actually for a "fallback when no int constant is found". Passing the expected value (1) to it defeats the purpose. >> >> You're right. Thanks for spotting that. >> >>> jint find_int_con(jint value_if_unknown) const { >>> const TypeInt* t = find_int_type(); >>> return (t != NULL && t->is_con()) ? t->get_con() : value_if_unknown; >>> } >>> >>> 2. Formatting nitpick: could you please trim the spaces before the new's on lines 1368, 1369 and 1387 >> >> Sure. >> >> I'll send an updated webrev. >> >> Roland. >> >>> >>> Thanks, >>> Kris (OpenJDK username: krismo) >>> >>> On Wed, Sep 30, 2015 at 1:34 AM, Roland Westrelin wrote: >>> I'm picking that one up. Here is a new webrev: >>> >>> http://cr.openjdk.java.net/~roland/8003585/webrev.00/ >>> >>> The only change to C2 compared to the previous webrev is that ((x & m) u< m+1) is optimized the same way ((x & m) u<= m) is. Actually, I don't think that C2 currently produces the ((x & m) u<= m) shape. The IfNode::fold_compares() logic produces the ((x & m) u< m+1) variant. I also added a test case to check the validity of the transformations and ran usual testing on the change. >>> >>> Roland. 
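[Editorial illustration, not part of the original thread or of the webrev.] The two shapes discussed above, ((x & m) u<= m) and ((x & m) u< m+1), can be checked at the Java level. The sketch below assumes m is non-negative, as it is when m is the length of a power-of-two sized array minus one; the class and method names are hypothetical and only Integer.compareUnsigned comes from the JDK. It does not reproduce what C2 does internally, it only shows why such a comparison is always true and can therefore be folded away.

```java
public class MaskedRangeCheck {
    // Unsigned "index u< length", the comparison a Java array range check performs.
    static boolean unsignedLess(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    // True when both shapes from the thread hold for this x and m.
    static boolean maskedCheckHolds(int x, int m) {
        int masked = x & m;
        boolean lessOrEqual = Integer.compareUnsigned(masked, m) <= 0; // (x & m) u<= m
        boolean lessThanNext = unsignedLess(masked, m + 1);            // (x & m) u< m+1
        return lessOrEqual && lessThanNext;
    }

    public static void main(String[] args) {
        int[] masks = {0, 1, 15, 1023};
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 16, 1024, Integer.MAX_VALUE};
        for (int m : masks) {
            for (int x : samples) {
                if (!maskedCheckHolds(x, m)) {
                    throw new AssertionError(x + " & " + m);
                }
            }
        }
        System.out.println("all masked indices in range");
    }
}
```

The non-negativity assumption matters for the second shape: with m = -1 the value m+1 wraps to 0 and nothing is unsigned-less-than 0, so maskedCheckHolds(-1, -1) is false in this sketch.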
> From martin.doerr at sap.com Thu Jan 7 13:45:19 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 7 Jan 2016 13:45:19 +0000 Subject: RFR(M): 8146612: C2: Precedence edges specification violated Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228AAB8@DEWDFEMB19C.global.corp.sap> Hi, some time ago, we found out, that C2 doesn't treat precedence edges as specified. The description of precedence edges in node.hpp says: "They are unordered and not duplicated; they have no embedded NULLs." Some functions in the current implementation violate this specification. I have fixed this in the following webrev: http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.00/ Please review. I will need a sponsor, please. Best regards, Martin From martin.doerr at sap.com Thu Jan 7 13:55:10 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 7 Jan 2016 13:55:10 +0000 Subject: RFR(M): 8146613: PPC64: C2 does no longer respect int to long conversion for stub calls Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228AAD8@DEWDFEMB19C.global.corp.sap> Hi, I have created a webrev which introduces int to long conversion in PPC64 functions which are called by C2 runtime calls. I also added assertions to arraycopy stubs which are already called correctly. Background: 8086069 removed too much code. Only the native wrapper performed the conversion after this change. However, it is required to convert ints to longs for all C calls and some runtime calls. 8144466 reintroduced the platform variable CCallingConventionRequiresIntsAsLongs and the conversion for the runtime calls for which C2 calls shared C functions on PPC64. Some PPC64 runtime functions which rely on proper 64 bit arguments are still called without conversion. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8146613_ppc64_int2long/webrev.00/ It only touches PPC64 files. Please review and sponsor. 
Best regards, Martin From tobias.hartmann at oracle.com Thu Jan 7 14:52:27 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Jan 2016 15:52:27 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568DAA2A.9070704@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> Message-ID: <568E7BAB.5070908@oracle.com> Hi Vladimir, On 07.01.2016 00:58, Vladimir Kozlov wrote: > Andrew is right. Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. I fixed this for the inflate and compress intrinsics. > GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. > StrInflatedCopyNode is not memory node. Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: // This class defines a projection of the memory state of a store conditional node. // These nodes return a value, but also update memory. But inflate does not return any value. Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ Related question: In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: char[int:>=0]:exact+any * which is equal to the type of the char load. 
I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: char[int:1]:NotNull:exact * Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? Thanks, Tobias > Thanks, > Vladimir > > On 1/6/16 5:34 AM, Andrew Haley wrote: >> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >> >>> The problem here is that C2 reorders memory instructions and moves >>> an array load before an array store. The MemBarCPUOrder is now used >>> (compiler internally) to prevent this. We do the same for normal >>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>> code is emitted. See also the comment in memnode.hpp: >>> >>> // Ordering within the same CPU. Used to order unsafe memory references >>> // inside the compiler when we lack alias info. Not needed "outside" the >>> // compiler because the CPU does all the ordering for us. >>> >>> "CPU does all the ordering for us" means that even with a relaxed >>> memory ordering, loads are never moved before dependent stores. >>> >>> Or did I misunderstand your question? >> >> No, I don't think so. I was just checking: I am very aware that >> HotSpot has presented those of use with relaxed memory order machines >> with some interesting gotchas over the years, that's all. I'm a bit >> surprised that C2 needs this barrier, given that there is a >> read-after-write dependency, but never mind. >> >> Thanks, >> >> Andrew. 
>> From sangheon.kim at oracle.com Thu Jan 7 15:36:47 2016 From: sangheon.kim at oracle.com (sangheon) Date: Thu, 7 Jan 2016 07:36:47 -0800 Subject: RFR(s): 8144573: TLABWasteIncrement=max_jint fires an assert on SPARC for non-G1 GC mode In-Reply-To: <1452157865.2611.2.camel@oracle.com> References: <568C6049.5020400@oracle.com> <6D69BB31-A1F4-44A8-8CED-CF166CB2EB46@oracle.com> <568DA84B.9050309@oracle.com> <1452157865.2611.2.camel@oracle.com> Message-ID: <568E860F.7010409@oracle.com> Hi Thomas, Thanks for looking at this. On 01/07/2016 01:11 AM, Thomas Schatzl wrote: > Hi Sangheon, > > On Wed, 2016-01-06 at 15:50 -0800, sangheon wrote: >> Hi Igor, >> >> Thank you for reviewing this. >> >> On 01/05/2016 08:29 PM, Igor Veresov wrote: >>> I'm not sure we care a lot about tiny bits of performance in >>> this instance? But, in case you wanted to keep the original code >>> for the simm13 case you could check the range of the constant and >>> still emit the code that was there before. It also seems suboptimal >>> to do set64 in MacroAssembler::tlab_refill() on all paths - the >>> result of the original add in the delay slot doesn't seem to be >>> used if we jump to discard_tlab, right? >> You are right. >> If the branch is taken, original add in the delay slot is not used. >> >> The reason of always calling 'set64' was to keep its behavior. i.e. >> same >> order of doing something before branch within delay slot. But as you >> said, it is less tight code. >> >>> So, may be you could do something like: >>> >>> brx(Assembler::lessEqual, false, Assembler::pt, discard_tlab); >>> if >>> (is_simm13(ThreadLocalAllocBuffer::refill_waste_limit_increment())) >>> { >>> delayed()->add(t2, >>> ThreadLocalAllocBuffer::refill_waste_limit_increment(), t2); >>> } else { >>> delayed()->nop(); >>> set64(ThreadLocalAllocBuffer::refill_waste_limit_increment(), >>> t3, G0); >>> add(t2, t3, t2); >>> } >> Okay, checking its value first seems good idea. 
>> >>> Similarly, tighter code can be emitted for the interpreter in >>> templateTable_sparc.cpp. >> Okay, done. >> >> Webrev: http://cr.openjdk.java.net/~sangheki/8144573/webrev.01 > looks good. > > Could you move the "// increment waste limit to prevent getting stuck > on this slow path" above the if-clause in both cases and remove the > other mentions of that to make the comments in both macroAssembler and > templateTable uniform? Okay, I will fix them before pushing. Thanks, Sangheon > > I do not need another review for the comment change. > > Thanks, > Thomas > From thomas.schatzl at oracle.com Thu Jan 7 15:47:35 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 07 Jan 2016 16:47:35 +0100 Subject: RFR(s): 8144949: TestOptionsWithRanges -XX:NUMAInterleaveGranularity=2147483648 crashes VM In-Reply-To: <56709DD0.80808@oracle.com> References: <56709DD0.80808@oracle.com> Message-ID: <1452181655.2611.39.camel@oracle.com> Hi, On Tue, 2015-12-15 at 15:10 -0800, sangheon wrote: > I think the constraint function can be removed with maximum range of > 2G/8192G. These are the maximum available memory on Windows and > smaller > values can be used but I wanted to avoid adding artificial limit. > With > this limitation, current constraint function for overflow check is > not > needed. > And we need to check allocation failure. looks good. Thomas From tobias.hartmann at oracle.com Thu Jan 7 18:29:08 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Jan 2016 19:29:08 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568DB2DF.4010305@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> Message-ID: <568EAE74.6020507@oracle.com> Hi Vladimir, On 07.01.2016 01:35, Vladimir Kozlov wrote: > Nope. Too much unrelated changes. 
If you want to go this road - file separate RFE to change phase argument type of Identity() and Value(). Okay, I agree. I filed JDK-8146629 [1]. > And why use PhaseValue and not PhaseGVN as in Ideal()? Right, we can use PhaseGVN. > So I agree to do your change in IfNode::Identity(). But as separate fix after general change. Here is the updated webrev based on JDK-8146629: http://cr.openjdk.java.net/~thartmann/8136469/webrev.06/ Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8146629 > Thanks, > Vladimir > > On 1/6/16 3:22 AM, Tobias Hartmann wrote: >> Hi, >> >> I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. >> >> Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: >> Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). >> >> http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ >> >> What do you think? >> >> Thanks, >> Tobias >> >> >> On 18.09.2015 11:57, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. 
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8136469 >>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ >>> >>> Problem: >>> When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. >>> >>> Solution: >>> I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). >>> >>> Testing: >>> - New test (TestPresizedStringBuilder) >>> - JPRT >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java >>> [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png >>> From tobias.hartmann at oracle.com Thu Jan 7 18:51:12 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 7 Jan 2016 19:51:12 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value Message-ID: <568EB3A0.3040909@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8146629 http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR for JDK-8136469 [1]). 
I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods. Thanks, Tobias [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html From vladimir.kozlov at oracle.com Thu Jan 7 19:08:54 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Jan 2016 11:08:54 -0800 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <568EB3A0.3040909@oracle.com> References: <568EB3A0.3040909@oracle.com> Message-ID: <568EB7C6.5030701@oracle.com> Perfect. Thanks, Vladimir On 1/7/16 10:51 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8146629 > http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ > > Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR for JDK-8136469 [1]). I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods. > > Thanks, > Tobias > > [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html > From vladimir.kozlov at oracle.com Thu Jan 7 19:24:54 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Jan 2016 11:24:54 -0800 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568EAE74.6020507@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> <568EAE74.6020507@oracle.com> Message-ID: <568EBB86.1060108@oracle.com> On 1/7/16 10:29 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 07.01.2016 01:35, Vladimir Kozlov wrote: >> Nope. 
Too much unrelated changes. If you want to go this road - file separate RFE to change phase argument type of Identity() and Value(). > > Okay, I agree. I filed JDK-8146629 [1]. > >> And why use PhaseValue and not PhaseGVN as in Ideal()? > > Right, we can use PhaseGVN. > >> So I agree to do your change in IfNode::Identity(). But as separate fix after general change. > > Here is the updated webrev based on JDK-8146629: > http://cr.openjdk.java.net/~thartmann/8136469/webrev.06/ So for IGVN we wait until dead branch is removed and only one IfProj node left before we do this Identity optimization. And for GVN (Parse phase) we don't wait because during this phase we don't remove nodes. The comment should say something about GVN/Parse phase to understand !phase->is_IterGVN() condition. Thanks, Vladimir > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8146629 > > >> Thanks, >> Vladimir >> >> On 1/6/16 3:22 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. >>> >>> Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: >>> Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). 
>>> >>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ >>> >>> What do you think? >>> >>> Thanks, >>> Tobias >>> >>> >>> On 18.09.2015 11:57, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8136469 >>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ >>>> >>>> Problem: >>>> When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. >>>> >>>> Solution: >>>> I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). 
>>>> >>>> Testing: >>>> - New test (TestPresizedStringBuilder) >>>> - JPRT >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java >>>> [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png >>>> From vladimir.kozlov at oracle.com Thu Jan 7 20:49:32 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Jan 2016 12:49:32 -0800 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568E7BAB.5070908@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> Message-ID: <568ECF5C.6090407@oracle.com> On 1/7/16 6:52 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 07.01.2016 00:58, Vladimir Kozlov wrote: >> Andrew is right. > > Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. Right. It was the root of this bug, see below. > > I fixed this for the inflate and compress intrinsics. > >> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >> StrInflatedCopyNode is not memory node. > > Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: I did not get the question. Is it before your webrev.01 change? Or even with the change? > > // This class defines a projection of the memory state of a store conditional node. > // These nodes return a value, but also update memory. > > But inflate does not return any value. 
Hmm, according to its bottom type, inflate produces memory:

  StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; }

So it really does not need SCMemProjNode. Sorry about that.
So the load was a LoadUS, which is a char load, and originally the memory slice of inflate was incorrect (BYTES).

Instead of SCMemProjNode we should only have to change the idx to that of your dst_type:

  set_memory(str, dst_type);

And you should roll back part of the changes in escape.cpp and macro.cpp.

>
> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly:
> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/

In general, when the src & dst arrays have different types, we may need to use TypeOopPtr::BOTTOM to prevent related stores & loads from bypassing these copy nodes.

>
> Related question:
> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns:
> char[int:>=0]:exact+any *
>
> which is equal to the type of the char load.

Please explain this. I thought a string's array would always be byte[] when compressed strings are enabled. Is it used for getChars(), which returns a char array?

Should we also be more careful in inflate_string_slow()? Is it used?

>
> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type:
> char[int:1]:NotNull:exact *
>
> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type?

It is indeed strange. What is the memory type of the LoadUS? It could be a bug.
Thanks, Vladimir > > Thanks, > Tobias > > >> Thanks, >> Vladimir >> >> On 1/6/16 5:34 AM, Andrew Haley wrote: >>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>> >>>> The problem here is that C2 reorders memory instructions and moves >>>> an array load before an array store. The MemBarCPUOrder is now used >>>> (compiler internally) to prevent this. We do the same for normal >>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>> code is emitted. See also the comment in memnode.hpp: >>>> >>>> // Ordering within the same CPU. Used to order unsafe memory references >>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>> // compiler because the CPU does all the ordering for us. >>>> >>>> "CPU does all the ordering for us" means that even with a relaxed >>>> memory ordering, loads are never moved before dependent stores. >>>> >>>> Or did I misunderstand your question? >>> >>> No, I don't think so. I was just checking: I am very aware that >>> HotSpot has presented those of use with relaxed memory order machines >>> with some interesting gotchas over the years, that's all. I'm a bit >>> surprised that C2 needs this barrier, given that there is a >>> read-after-write dependency, but never mind. >>> >>> Thanks, >>> >>> Andrew. >>> From vladimir.kozlov at oracle.com Thu Jan 7 21:21:36 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 7 Jan 2016 13:21:36 -0800 Subject: RFR(M): 8146613: PPC64: C2 does no longer respect int to long conversion for stub calls In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228AAD8@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB418116567228AAD8@DEWDFEMB19C.global.corp.sap> Message-ID: <568ED6E0.7030907@oracle.com> Looks fine to me. Thanks, Vladimir On 1/7/16 5:55 AM, Doerr, Martin wrote: > Hi, > > I have created a webrev which introduces int to long conversion in PPC64 functions which are called by C2 runtime calls. 
> > I also added assertions to arraycopy stubs which are already called correctly. > > Background: > 8086069 removed too much code. Only the native wrapper performed the conversion after this change. However, it is > required to convert ints to longs for all C calls and some runtime calls. > 8144466 reintroduced the platform variable CCallingConventionRequiresIntsAsLongs and the conversion for the runtime > calls for which C2 calls shared C functions on PPC64. > Some PPC64 runtime functions which rely on proper 64 bit arguments are still called without conversion. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8146613_ppc64_int2long/webrev.00/ > > It only touches PPC64 files. > > Please review and sponsor. > > Best regards, > > Martin > From kishor.kharbas at intel.com Thu Jan 7 22:05:49 2016 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Thu, 7 Jan 2016 22:05:49 +0000 Subject: RFR (M): 8146581: Minor corrections to the patch submitted for earlier bug id - 8143925 Message-ID: Hi Vladimir, For the CounterMode.crypt() patch (https://bugs.openjdk.java.net/browse/JDK-8143925) which was committed earlier, I found a minor correction in the checks performed for AES support in vm_version_x86.cpp. Basically, a condition check was missing in a else if() block, and some old code was left by accident. I also took this opportunity to add some more comments to make the stub code more readable/maintainable. Bug - https://bugs.openjdk.java.net/browse/JDK-8146581 Patch - http://cr.openjdk.java.net/~vdeshpande/8146581/webrev.00/ Regards Kishor Kharbas -------------- next part -------------- An HTML attachment was scrubbed... 
From vladimir.kozlov at oracle.com Thu Jan 7 22:07:34 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Jan 2016 14:07:34 -0800
Subject: RFR(M): 8146612: C2: Precedence edges specification violated
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228AAB8@DEWDFEMB19C.global.corp.sap>
References: <7C9B87B351A4BA4AA9EC95BB418116567228AAB8@DEWDFEMB19C.global.corp.sap>
Message-ID: <568EE1A6.3050202@oracle.com>

  // Avoid spec violation: multiple prec edge.

I think this should be:

  // Avoid spec violation: duplicated prec edge.

Should we add an assert to rm_prec()?:

  assert(j >= _cnt, "not a precedence edge");

Also we may need to check that the input index is < _max in set_prec() and rm_prec().
The next access will be outside the _in array if j == _max-1 (in rm_prec()):

  _in[i] = NULL; // NULL out last element

unless we guarantee that there is always a NULL at the end. I don't see such a guarantee, because set_prec() may set the last prec edge to non-NULL.

Please factor out the similar code (search for the last non-NULL prec edge) in del_req(), del_req_ordered() and rm_prec() into a separate method.

Thanks,
Vladimir

On 1/7/16 5:45 AM, Doerr, Martin wrote:
> Hi,
>
> some time ago, we found out, that C2 doesn't treat precedence edges as specified.
>
> The description of precedence edges in node.hpp says:
>
> "They are unordered and not duplicated; they have no embedded NULLs."
>
> Some functions in the current implementation violate this specification.
>
> I have fixed this in the following webrev:
>
> http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.00/
>
> Please review. I will need a sponsor, please.
>
> Best regards,
>
> Martin
>

From vladimir.kozlov at oracle.com Thu Jan 7 22:10:08 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Jan 2016 14:10:08 -0800
Subject: RFR (M): 8146581: Minor corrections to the patch submitted for earlier bug id - 8143925
In-Reply-To:
References:
Message-ID: <568EE240.7000607@oracle.com>

Looks good.
Thanks, Vladimir On 1/7/16 2:05 PM, Kharbas, Kishor wrote: > Hi Vladimir, > > For the CounterMode.crypt() patch (https://bugs.openjdk.java.net/browse/JDK-8143925) which was committed earlier, I > found a minor correction in the checks performed for AES support in vm_version_x86.cpp. > > Basically, a condition check was missing in a else if() block, and some old code was left by accident. > > I also took this opportunity to add some more comments to make the stub code more readable/maintainable. > > Bug - https://bugs.openjdk.java.net/browse/JDK-8146581 > > Patch - http://cr.openjdk.java.net/~vdeshpande/8146581/webrev.00/ > > Regards > > Kishor Kharbas > From kishor.kharbas at intel.com Thu Jan 7 22:12:52 2016 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Thu, 7 Jan 2016 22:12:52 +0000 Subject: RFR (M): 8146581: Minor corrections to the patch submitted for earlier bug id - 8143925 In-Reply-To: <568EE240.7000607@oracle.com> References: <568EE240.7000607@oracle.com> Message-ID: Wow! That was quick.. thanks :) -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, January 07, 2016 2:10 PM To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M): 8146581: Minor corrections to the patch submitted for earlier bug id - 8143925 Looks good. Thanks, Vladimir On 1/7/16 2:05 PM, Kharbas, Kishor wrote: > Hi Vladimir, > > For the CounterMode.crypt() patch > (https://bugs.openjdk.java.net/browse/JDK-8143925) which was committed earlier, I found a minor correction in the checks performed for AES support in vm_version_x86.cpp. > > Basically, a condition check was missing in a else if() block, and some old code was left by accident. > > I also took this opportunity to add some more comments to make the stub code more readable/maintainable. 
> > Bug - https://bugs.openjdk.java.net/browse/JDK-8146581 > > Patch - http://cr.openjdk.java.net/~vdeshpande/8146581/webrev.00/ > > Regards > > Kishor Kharbas > From rednaxelafx at gmail.com Thu Jan 7 22:40:48 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Thu, 7 Jan 2016 14:40:48 -0800 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <568EB7C6.5030701@oracle.com> References: <568EB3A0.3040909@oracle.com> <568EB7C6.5030701@oracle.com> Message-ID: That's interesting. Out of curiosity, would adding a "bool can_reshape" argument to Identity() and Value() do the job, just like the way Ideal() does it? If so, what was the trade off that led to this change as opposed to adding an argument? Thanks, Kris On Thursday, January 7, 2016, Vladimir Kozlov wrote: > Perfect. > > Thanks, > Vladimir > > On 1/7/16 10:51 AM, Tobias Hartmann wrote: > >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8146629 >> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ >> >> Currently, there is no way to determine in Node::Identity() and >> Node::Value() if we were called from GVN or IGVN but sometimes we would >> like to do optimizations based on this information (for example, see >> discussion in RFR for JDK-8136469 [1]). I changed the arguments of >> Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like >> this, we can simply call PhaseValues::is_IterGVN() from both methods. >> >> Thanks, >> Tobias >> >> [1] >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
From vladimir.kozlov at oracle.com Thu Jan 7 22:51:44 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 7 Jan 2016 14:51:44 -0800
Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value
In-Reply-To:
References: <568EB3A0.3040909@oracle.com> <568EB7C6.5030701@oracle.com>
Message-ID: <568EEC00.2070909@oracle.com>

PhaseValues has additional data which we may access.
As Tobias said we may consider using additional field in PhaseGVN to check for IGVN instead of using virtual method is_IterGVN() and can_reshape parameter. It would be next step.

Thanks,
Vladimir

On 1/7/16 2:40 PM, Krystal Mok wrote:
> That's interesting. Out of curiosity, would adding a "bool can_reshape" argument to Identity() and Value() do the job,
> just like the way Ideal() does it?
> If so, what was the trade off that led to this change as opposed to adding an argument?
>
> Thanks,
> Kris
>
> On Thursday, January 7, 2016, Vladimir Kozlov wrote:
>
> Perfect.
>
> Thanks,
> Vladimir
>
> On 1/7/16 10:51 AM, Tobias Hartmann wrote:
>
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8146629
> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/
>
> Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN
> but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR
> for JDK-8136469 [1]). I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to
> PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods.
> > Thanks, > Tobias > > [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html > From rednaxelafx at gmail.com Thu Jan 7 23:20:09 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Thu, 7 Jan 2016 15:20:09 -0800 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <568EEC00.2070909@oracle.com> References: <568EB3A0.3040909@oracle.com> <568EB7C6.5030701@oracle.com> <568EEC00.2070909@oracle.com> Message-ID: I see. Thanks a lot for the explanation, Vladimir! Best regards, Kris On Thu, Jan 7, 2016 at 2:51 PM, Vladimir Kozlov wrote: > PhaseValues has additional data which we may access. > As Tobias said we may consider using additional field in PhaseGVN to check > for IGVN instead of using virtual method is_IterGVN() and can_reshape > parameter. It would be next step. > > Thanks, > Vladimir > > On 1/7/16 2:40 PM, Krystal Mok wrote: > >> That's interesting. Out of curiosity, would adding a "bool can_reshape" >> argument to Identity() and Value() do the job, >> just like the way Ideal() does it? >> If so, what was the trade off that led to this change as opposed to >> adding an argument? >> >> Thanks, >> Kris >> >> On Thursday, January 7, 2016, Vladimir Kozlov > > wrote: >> >> Perfect. >> >> Thanks, >> Vladimir >> >> On 1/7/16 10:51 AM, Tobias Hartmann wrote: >> >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8146629 >> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ >> >> Currently, there is no way to determine in Node::Identity() and >> Node::Value() if we were called from GVN or IGVN >> but sometimes we would like to do optimizations based on this >> information (for example, see discussion in RFR >> for JDK-8136469 [1]). I changed the arguments of Node::Identity() >> and Node::Value() from PhaseTransform* to >> PhaseGVN*. Like this, we can simply call >> PhaseValues::is_IterGVN() from both methods. 
>> >> Thanks, >> Tobias >> >> [1] >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From goetz.lindenmaier at sap.com Fri Jan 8 07:35:54 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 8 Jan 2016 07:35:54 +0000 Subject: RFR(M): 8146613: PPC64: C2 does no longer respect int to long conversion for stub calls In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228AAD8@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB418116567228AAD8@DEWDFEMB19C.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41F0F5AC@DEWDFEMB12A.global.corp.sap> Hi Martin, thanks for doing these fixes, they look good. I especially like the trick of doing an int cast that won't be optimized by the C compiler in the montgomery intrinsics. You should be able to push this yourselves as you are now Committer and it only touches ppc files. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Donnerstag, 7. Januar 2016 14:55 > To: hotspot-compiler-dev at openjdk.java.net > Subject: RFR(M): 8146613: PPC64: C2 does no longer respect int to long > conversion for stub calls > > Hi, > > > > I have created a webrev which introduces int to long conversion in PPC64 > functions which are called by C2 runtime calls. > > I also added assertions to arraycopy stubs which are already called correctly. > > > > Background: > 8086069 removed too much code. Only the native wrapper performed the > conversion after this change. However, it is required to convert ints to longs > for all C calls and some runtime calls. > 8144466 reintroduced the platform variable > CCallingConventionRequiresIntsAsLongs and the conversion for the runtime > calls for which C2 calls shared C functions on PPC64. 
> Some PPC64 runtime functions which rely on proper 64 bit arguments are still > called without conversion. > > Webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8146613_ppc64_int2long/webrev.00/ > > > > It only touches PPC64 files. > > > > Please review and sponsor. > > > > Best regards, > > Martin > > From tobias.hartmann at oracle.com Fri Jan 8 08:03:47 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Jan 2016 09:03:47 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <568EB7C6.5030701@oracle.com> References: <568EB3A0.3040909@oracle.com> <568EB7C6.5030701@oracle.com> Message-ID: <568F6D63.9050603@oracle.com> Thanks for the review, Vladimir. Best, Tobias On 07.01.2016 20:08, Vladimir Kozlov wrote: > Perfect. > > Thanks, > Vladimir > > On 1/7/16 10:51 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8146629 >> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ >> >> Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR for JDK-8136469 [1]). I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods. 
>> >> Thanks, >> Tobias >> >> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html >> From tobias.hartmann at oracle.com Fri Jan 8 09:05:24 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Jan 2016 10:05:24 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568EBB86.1060108@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> <568EAE74.6020507@oracle.com> <568EBB86.1060108@oracle.com> Message-ID: <568F7BD4.1070000@oracle.com> Hi Vladimir, On 07.01.2016 20:24, Vladimir Kozlov wrote: > On 1/7/16 10:29 AM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 07.01.2016 01:35, Vladimir Kozlov wrote: >>> Nope. Too much unrelated changes. If you want to go this road - file separate RFE to change phase argument type of Identity() and Value(). >> >> Okay, I agree. I filed JDK-8146629 [1]. >> >>> And why use PhaseValue and not PhaseGVN as in Ideal()? >> >> Right, we can use PhaseGVN. >> >>> So I agree to do your change in IfNode::Identity(). But as separate fix after general change. >> >> Here is the updated webrev based on JDK-8146629: >> http://cr.openjdk.java.net/~thartmann/8136469/webrev.06/ > > So for IGVN we wait until dead branch is removed and only one IfProj node left before we do this Identity optimization. > And for GVN (Parse phase) we don't wait because during this phase we don't remove nodes. > The comment should say something about GVN/Parse phase to understand !phase->is_IterGVN() condition. Right, I updated the comment. Does this look good to you? 
http://cr.openjdk.java.net/~thartmann/8136469/webrev.07 Thanks, Tobias > > Thanks, > Vladimir > >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8146629 >> >> >>> Thanks, >>> Vladimir >>> >>> On 1/6/16 3:22 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. >>>> >>>> Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: >>>> Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). >>>> >>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ >>>> >>>> What do you think? >>>> >>>> Thanks, >>>> Tobias >>>> >>>> >>>> On 18.09.2015 11:57, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following patch. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8136469 >>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ >>>>> >>>>> Problem: >>>>> When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. 
Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. >>>>> >>>>> Solution: >>>>> I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). >>>>> >>>>> Testing: >>>>> - New test (TestPresizedStringBuilder) >>>>> - JPRT >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java >>>>> [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png >>>>> From roland.westrelin at oracle.com Fri Jan 8 09:33:32 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 8 Jan 2016 10:33:32 +0100 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> <568D7FA1.4040707@oracle.com> Message-ID: <1BC8C0B0-E8EF-4D6B-B9EE-D374E2FC3E04@oracle.com> > Does checkIndex match on it? If so, is there a reason to proceed with intrinsifying checkIndex? I expect it would in some cases but not all. The pattern matching needs profiling to tell the branches that would trigger an exception are never taken, then only can the tests be folded and made to look like a range check for the next optimization passes. Profiling can be polluted or not mature enough. 
The intrinsic assumes the exception path are never taken and doesn't rely on profiling (then if the check does fail we recompile and don't use the intrinsic). We take the use of the checkIndex API as a hint that the checks are not expected to fail. Also, for the pattern matching to work, in i <0 || i >= length the compiler needs to know enough on the range of values taken by length to be able to fold. Again we see checkIndex as an indication that length is positive and if we can't prove it we compile a predicate to verify that it is so we can safely use an unsigned compare. Again we take the use of checkIndex as a hint that the length argument is positive. Roland. > > On Wednesday, January 6, 2016, Vladimir Kozlov wrote: > Note, we already have range check pattern matching code in C2 (thanks to Roland): > > https://bugs.openjdk.java.net/browse/JDK-8137168 > > Vladimir > > On 1/6/16 12:39 PM, Vitaly Davidovich wrote: > I don't think there's a need to write out 20 different ways to do a > range check -- I think nobody would expect all 20 to be covered by the > optimizer. Some of those variations may not map cleanly to > Object::checkIndex either, nor is there any guarantee that people will > update all their existing range checks (or even know about) to use > Object::checkIndex -- some code will be left unoptimized no matter what. > > But my point is the same as Andrew's, I think; instead of making > checkIndex an intrinsic, simply add a pattern match against that exact > bytecode shape (perhaps with basic canonicalization) and then still > encourage people to use Object::checkIndex. This is better than > intrinsic (modulo profile pollution) since any other code that happens > to use same pattern will match as well, and not require an update to use > checkIndex. 
> Then, if someone comes to this list with an unoptimized
> example with a different bytecode shape and has a convincing argument
> that the code shape is "common", you guys can consider pattern matching
> that as well.
>
> On Wed, Jan 6, 2016 at 2:50 PM, John Rose wrote:
>
> > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich wrote:
> >
> > better canonicalization
>
> That's our first and most important tactic. (Actually inlining is.)
>
> But the various idioms for checkIndex do not canonicalize easily. In
> this case the correct trade-off is not to invest more time and
> research and code into stronger canonicalization.
>
> We do have canonicalization of if-expressions. It's just that in
> this case strengthening it to cover range checks reliably is harder
> than the reasonable alternative.
>
> - John
>
> PS. I am tempted to write out a list of 20 different ways to code a
> range check but will leave that as an exercise.
>
> --
> Sent from my phone

From roland.westrelin at oracle.com Fri Jan 8 10:32:00 2016
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Fri, 8 Jan 2016 11:32:00 +0100
Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value
In-Reply-To: <568EB3A0.3040909@oracle.com>
References: <568EB3A0.3040909@oracle.com>
Message-ID: <4A5ECDA0-6F08-4AA1-AEBC-202042F6707E@oracle.com>

> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/

That looks good to me.

Roland.
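
For readers following the 8146629 thread, the idea under review can be sketched in a few lines of standalone C++. This is only an illustration and not HotSpot code: the names PhaseGVN, PhaseIterGVN and is_IterGVN() are taken from the emails, while the class bodies, the ToyIfNode type, its two fields and can_fold() are assumptions invented for the sketch. It shows a node-level check that receives the phase pointer, asks it whether it is the iterative variant, and then follows the rule Vladimir described for 8136469 (during IGVN wait until one projection is left, during parse-time GVN do not wait).

```cpp
#include <cassert>

// Toy stand-ins for the C2 phase classes named in the thread. Only the names
// PhaseGVN, PhaseIterGVN and is_IterGVN() come from the emails; the class
// bodies are simplified assumptions, not HotSpot code.
class PhaseIterGVN;

class PhaseGVN {
public:
  virtual ~PhaseGVN() {}
  // Non-null only when this phase is the iterative (IGVN) variant.
  virtual PhaseIterGVN* is_IterGVN() { return nullptr; }
};

class PhaseIterGVN : public PhaseGVN {
public:
  PhaseIterGVN* is_IterGVN() override { return this; }
};

// Hypothetical node type, invented for this sketch.
struct ToyIfNode {
  bool condition_is_constant;  // assumed: the test input is a constant
  int  live_projections;       // assumed: number of IfProj users still attached

  // Decides whether an Identity-style fold may happen in the given phase.
  bool can_fold(PhaseGVN* phase) const {
    if (!condition_is_constant) return false;
    if (!phase->is_IterGVN()) {
      // GVN during parsing: nodes are not removed in this phase, so there is
      // nothing to wait for.
      return true;
    }
    // IGVN: wait until the dead branch is gone and one projection is left.
    return live_projections == 1;
  }
};
```

With these definitions, a ToyIfNode{true, 2} folds when can_fold() is handed a plain PhaseGVN, but is held back under a PhaseIterGVN until live_projections drops to 1. A "bool can_reshape" argument, as Kris suggested, would carry the same one bit; passing the phase pointer is what lets the callee reach the other phase data Vladimir mentions.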
From roland.westrelin at oracle.com Fri Jan 8 10:33:56 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 8 Jan 2016 11:33:56 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568F7BD4.1070000@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> <568EAE74.6020507@oracle.com> <568EBB86.1060108@oracle.com> <568F7BD4.1070000@oracle.com> Message-ID: <63EB93BD-3E8A-4681-AF8F-0A005E61BE1C@oracle.com> > http://cr.openjdk.java.net/~thartmann/8136469/webrev.07 That looks good to me. Roland. From tobias.hartmann at oracle.com Fri Jan 8 10:35:56 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Jan 2016 11:35:56 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <4A5ECDA0-6F08-4AA1-AEBC-202042F6707E@oracle.com> References: <568EB3A0.3040909@oracle.com> <4A5ECDA0-6F08-4AA1-AEBC-202042F6707E@oracle.com> Message-ID: <568F910C.6030701@oracle.com> Thanks, Roland. Best, Tobias On 08.01.2016 11:32, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ > > That looks good to me. > > Roland. > From tobias.hartmann at oracle.com Fri Jan 8 10:36:13 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Jan 2016 11:36:13 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <63EB93BD-3E8A-4681-AF8F-0A005E61BE1C@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> <568EAE74.6020507@oracle.com> <568EBB86.1060108@oracle.com> <568F7BD4.1070000@oracle.com> <63EB93BD-3E8A-4681-AF8F-0A005E61BE1C@oracle.com> Message-ID: <568F911D.1050203@oracle.com> Thanks, Roland. 
Best, Tobias On 08.01.2016 11:33, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~thartmann/8136469/webrev.07 > > That looks good to me. > > Roland. > From tobias.hartmann at oracle.com Fri Jan 8 10:37:55 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 8 Jan 2016 11:37:55 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568ECF5C.6090407@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> Message-ID: <568F9183.9070909@oracle.com> On 07.01.2016 21:49, Vladimir Kozlov wrote: > On 1/7/16 6:52 AM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>> Andrew is right. >> >> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. > > Right. It was the root of this bug, see below. > >> >> I fixed this for the inflate and compress intrinsics. >> >>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>> StrInflatedCopyNode is not memory node. >> >> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: > > I did not get the question. Is it before your webrev.01 change? Or even with the change? I meant with webrev.01 but you answered my question below. >> // This class defines a projection of the memory state of a store conditional node. >> // These nodes return a value, but also update memory. >> >> But inflate does not return any value. 
> > Hmm, according to bottom type inflate produce memory: > > StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } > > So it really does not need SCMemProjNode. Sorry about that. > So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. Exactly. > Instead of SCMemProjNode we should have to change the idx of your dst_type: > > set_memory(str, dst_type); Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. > And you should rollback part of changes in escape.cpp and macro.cpp. Okay, I'll do that. >> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ > > In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. Okay, should we then use BOTTOM for both the input and output type? >> Related question: >> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >> char[int:>=0]:exact+any * >> >> which is equal to the type of the char load. > > Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays.
See comment in library_call.cpp: // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. > Should we also be more careful in inflate_string_slow()? Is it used? No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >> char[int:1]:NotNull:exact * >> >> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? > > It is indeed strange. What memory type of LoadUS? It could be bug. LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. I will look into this again and try to understand what happens. Thanks, Tobias >>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>> >>>>> The problem here is that C2 reorders memory instructions and moves >>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>> (compiler internally) to prevent this. We do the same for normal >>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>> code is emitted. 
See also the comment in memnode.hpp: >>>>> >>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>> // compiler because the CPU does all the ordering for us. >>>>> >>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>> memory ordering, loads are never moved before dependent stores. >>>>> >>>>> Or did I misunderstand your question? >>>> >>>> No, I don't think so. I was just checking: I am very aware that >>>> HotSpot has presented those of us with relaxed memory order machines >>>> with some interesting gotchas over the years, that's all. I'm a bit >>>> surprised that C2 needs this barrier, given that there is a >>>> read-after-write dependency, but never mind. >>>> >>>> Thanks, >>>> >>>> Andrew. >>>> From martin.doerr at sap.com Fri Jan 8 11:06:42 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 8 Jan 2016 11:06:42 +0000 Subject: RFR(M): 8146612: C2: Precedence edges specification violated In-Reply-To: <568EE1A6.3050202@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB418116567228AAB8@DEWDFEMB19C.global.corp.sap> <568EE1A6.3050202@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228ACE8@DEWDFEMB19C.global.corp.sap> Hi Vladimir, thanks for the review. I have changed the comments, added assertions and factored out the common functionality of del_req(), del_req_ordered() and rm_prec() into a new private function close_prec_gap_at(). That makes sense. About your concern about accessing outside of the _in array in rm_prec(): Please note that i is decremented before it gets used: if "j == _max-1", "i" will be set to "_max", but it is decremented in "_in[--i]" before the access. Anyway, I have replaced this code by close_prec_gap_at(), so it doesn't matter anymore.
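An illustrative aside on the precedence-edge invariant discussed in this thread (edges are unordered, not duplicated, and have no embedded NULLs): one way to keep the invariant when removing an edge is to move the last non-NULL precedence edge into the gap. The sketch below is a simplified model in Java, not the HotSpot C++ code from the webrev; the class PrecEdges and its methods are hypothetical names chosen for illustration, and only the idea of closing the gap is taken from the discussion.

```java
import java.util.Arrays;

/** Hypothetical model of a node's input array: required inputs first, then precedence edges. */
final class PrecEdges {
    private final Object[] in; // slots [0, cnt) are required inputs, [cnt, in.length) are precedence edges
    private final int cnt;     // number of required input slots

    PrecEdges(int cnt, int max) {
        this.in = new Object[max];
        this.cnt = cnt;
    }

    /** Adds a precedence edge unless it is already present; returns false if there is no free slot. */
    boolean addPrec(Object n) {
        for (int i = cnt; i < in.length; i++) {
            if (in[i] == n) {
                return true;  // already present: do not create a duplicated edge
            }
            if (in[i] == null) {
                in[i] = n;    // first free slot; by the invariant all later slots are null too
                return true;
            }
        }
        return false;
    }

    /** Removes the precedence edge in slot j and closes the gap with the last non-null edge. */
    void removePrec(int j) {
        if (j < cnt || j >= in.length) {
            throw new IndexOutOfBoundsException("not a precedence edge: " + j);
        }
        if (in[j] == null) {
            return;           // nothing stored in this slot
        }
        int last = j;
        while (last + 1 < in.length && in[last + 1] != null) {
            last++;           // find the last non-null precedence edge without leaving the array
        }
        in[j] = in[last];     // order does not matter, so the last edge can fill the gap
        in[last] = null;      // no embedded nulls remain
    }

    /** Returns a copy of the precedence-edge slots. */
    Object[] precs() {
        return Arrays.copyOfRange(in, cnt, in.length);
    }
}
```

Because the scan for the last edge checks the bound before reading the next slot, removing the edge in the final slot (the j == max-1 case raised in the review) stays inside the array in this model.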
The new webrev is here: http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.01/ Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, 7 January 2016 23:08 To: Doerr, Martin; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8146612: C2: Precedence edges specification violated // Avoid spec violation: multiple prec edge. I think it should be: // Avoid spec violation: duplicated prec edge. Should we add an assert to rm_prec()?: assert(j >= _cnt, "not a precedence edge"); Also we may need to check that the input index is < _max in set_prec() and rm_prec(). The next access will be outside the _in array if j == _max-1 (in rm_prec()): _in[i] = NULL; // NULL out last element unless we guarantee that there is always a NULL at the end. Which I don't see because set_prec() may set the last prec edge to not NULL. Please factor out similar code (search for last non-NULL prec edge) in del_req(), del_req_ordered() and rm_prec() into a separate method. Thanks, Vladimir On 1/7/16 5:45 AM, Doerr, Martin wrote: > Hi, > > some time ago, we found out, that C2 doesn't treat precedence edges as specified. > > The description of precedence edges in node.hpp says: > > "They are unordered and not duplicated; they have no embedded NULLs." > > Some functions in the current implementation violate this specification. > > I have fixed this in the following webrev: > > http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.00/ > > Please review. I will need a sponsor, please. > > Best regards, > > Martin > From zoltan.majo at oracle.com Fri Jan 8 11:06:58 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Fri, 8 Jan 2016 12:06:58 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB Message-ID: <568F9852.4090806@oracle.com> Hi, please review the patch for 8086053.
https://bugs.openjdk.java.net/browse/JDK-8086053 Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These inconsistencies lead to newly allocated regions not being filled with zeros. Solution: Address the following: - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. Webrev: http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ Testing: - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; - JPRT; - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. 
Thank you and best regards, Zoltan From edward.nevill at gmail.com Fri Jan 8 11:46:57 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 08 Jan 2016 11:46:57 +0000 Subject: RFR: 8146678: aarch64: assertion failure: call instruction in an infinite loop Message-ID: <1452253617.19405.10.camel@mint> Hi, Please review the following webrev http://cr.openjdk.java.net/~enevill/8146678/webrev/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8146678 This fixes an assertion in Relocation::pd_set_call_destination assert(addr() != x, "call instruction in an infinite loop"); which triggers following 8146286: aarch64: guarantee failures with large code cache sizes on jtreg test java/lang/invoke/LFCaching/LFMultiThreadCachingTest.java The reason is that this change deliberately generates BL to self to avoid BL going out of range. The fix is to remove the assertion as it is no longer valid. Thanks, Ed. From vitalyd at gmail.com Fri Jan 8 12:38:18 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 8 Jan 2016 07:38:18 -0500 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: <1BC8C0B0-E8EF-4D6B-B9EE-D374E2FC3E04@oracle.com> References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> <568D7FA1.4040707@oracle.com> <1BC8C0B0-E8EF-4D6B-B9EE-D374E2FC3E04@oracle.com> Message-ID: Roland, thanks for elaborating; a few comments inline ... On Friday, January 8, 2016, Roland Westrelin wrote: > > Does checkIndex match on it? If so, is there a reason to proceed with > intrinsifying checkIndex? > > I expect it would in some cases but not all. 
> > The pattern matching needs profiling to tell the branches that would > trigger an exception are never taken, then only can the tests be folded and > made to look like a range check for the next optimization passes. Profiling > can be polluted or not mature enough. The intrinsic assumes the exception > path are never taken and doesn't rely on profiling (then if the check does > fail we recompile and don't use the intrinsic). We take the use of the > checkIndex API as a hint that the checks are not expected to fail. As a general comment, would it make sense to assume exceptional paths are not taken in most Java code? That is, for code optimization purposes it's probably a reasonable assumption. It seems like having an exceptional path is already a hint that it's not expected to fail; most Java devs know not to use exceptions for expected control flow. > > Also, for the pattern matching to work, in i <0 || i >= length the > compiler needs to know enough on the range of values taken by length to be > able to fold. Again we see checkIndex as an indication that length is > positive and if we can't prove it we compile a predicate to verify that it > is so we can safely use an unsigned compare. Again we take the use of > checkIndex as a hint that the length argument is positive. Could bytecode shape just like checkIndex be treated as same hint? Are there cases where something looks like checkIndex but really isn't? > > Roland. > > > > > On Wednesday, January 6, 2016, Vladimir Kozlov < > vladimir.kozlov at oracle.com > wrote: > > Note, we already have range check pattern matching code in C2 (thanks to > Roland): > > > > https://bugs.openjdk.java.net/browse/JDK-8137168 > > > > Vladimir > > > > On 1/6/16 12:39 PM, Vitaly Davidovich wrote: > > I don't think there's a need to write out 20 different ways to do a > > range check -- I think nobody would expect all 20 to be covered by the > > optimizer. 
Some of those variations may not map cleanly to > > Object::checkIndex either, nor is there any guarantee that people will > > update all their existing range checks (or even know about it) to use > > Object::checkIndex -- some code will be left unoptimized no matter what. > > > > But my point is the same as Andrew's, I think; instead of making > > checkIndex an intrinsic, simply add a pattern match against that exact > > bytecode shape (perhaps with basic canonicalization) and then still > > encourage people to use Object::checkIndex. This is better than > > an intrinsic (modulo profile pollution) since any other code that happens > > to use the same pattern will match as well, and not require an update to use > > checkIndex. Then, if someone comes to this list with an unoptimized > > example with a different bytecode shape and has a convincing argument > > that the code shape is "common", you guys can consider pattern matching > > that as well. > > > > On Wed, Jan 6, 2016 at 2:50 PM, John Rose > > >> wrote: > > > > > > > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich > > >> wrote: > > > > > > better canonicalization > > > > That's our first and most important tactic. (Actually inlining is.) > > > > But the various idioms for checkIndex do not canonicalize easily. In > > this case the correct trade-off is not to invest more time and > > research and code into stronger canonicalization. > > > > We do have canonicalization of if-expressions. It's just that in > > this case strengthening it to cover range checks reliably is harder > > than the reasonable alternative. > > > > - John > > > > PS. I am tempted to write out a list of 20 different ways to code a > > range check but will leave that as an exercise. > > > > > > > > > > -- > > Sent from my phone > > -- Sent from my phone
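An illustrative aside on the range-check shape discussed in this thread: the two-comparison test `i < 0 || i >= length` can be replaced by a single unsigned comparison, but only when length is known to be non-negative, which is why the compiler either has to prove that or guard it with a predicate. The sketch below shows the equivalence in plain Java. The class and method names are hypothetical, and this is not the checkIndex intrinsic or C2's pattern-matching code.

```java
/** Hypothetical helper contrasting the signed and unsigned forms of a range check. */
final class RangeCheckSketch {
    /** Two-comparison form: the index is valid unless i < 0 || i >= length. */
    static boolean inRangeSigned(int i, int length) {
        return i >= 0 && i < length;
    }

    /**
     * Single unsigned comparison. A negative i looks like a huge unsigned value,
     * so it fails the test without a separate i >= 0 check.
     * This agrees with inRangeSigned only when length >= 0.
     */
    static boolean inRangeUnsigned(int i, int length) {
        return Integer.compareUnsigned(i, length) < 0;
    }
}
```

With a negative length the two forms disagree (a negative length is also a huge unsigned value), which is the case the predicate on length has to exclude before the unsigned compare can be used safely.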
From rahul.v.raghavan at oracle.com Fri Jan 8 17:13:40 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 8 Jan 2016 09:13:40 -0800 (PST) Subject: FW: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: Hello, Please review the following revised patch for JDK-6378256 - http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ This revised webrev has the following changes - 1) A minor optimization: return 0 at the initial stage (instead of continuing to the 'slowCase' path) for the special/rare case of a null reference input. (As per the documentation, and as test results confirmed, it is safe to 'return 0' for a null reference input to System.identityHashCode.) 2) Added the same Object.hashCode / System.identityHashCode optimization support in sharedRuntime_x86_64.cpp. Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. Thanks, Rahul > -----Original Message----- > From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan > Cc: hotspot-compiler-dev at openjdk.java.net > > > webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . > > Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would be > nice. > Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? > > Roland. > -----Original Message----- > From: Rahul Raghavan > Sent: Wednesday, December 09, 2015 2:43 PM > To: hotspot-compiler-dev at openjdk.java.net > > Hello, > > Please review the following patch for JDK-6378256. > > webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . > > Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . > Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times > slower).
> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options). > > sample unit test: > public class Jdk6378256Test > { > public static void main(String[] args) > { > Object obj = new Object(); > long time = System.nanoTime(); > for(int i = 0 ; i < 1000000 ; i++) > System.identityHashCode(obj); //compare to obj.hashCode(); > System.out.println ("Result = " + (System.nanoTime() - time)); > } > } > > Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also. > (looks in the header for the hashCode before calling into the VM). > Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver. > So also added required additional null check for System.identityHashCode case. > > Testing: > - successful JPRT run (-testset hotspot). > - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System). > (with -client / -XX:TieredStopAtLevel=1 etc. options). > - Added 'noreg-perf' label for this performance bug. > Manual testing done and confirmed expected performance values for unit tests with fix. > > Thanks, > Rahul From vladimir.kozlov at oracle.com Fri Jan 8 19:34:09 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Jan 2016 11:34:09 -0800 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <568F7BD4.1070000@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> <568EAE74.6020507@oracle.com> <568EBB86.1060108@oracle.com> <568F7BD4.1070000@oracle.com> Message-ID: <56900F31.4060409@oracle.com> Very good. Thanks, Vladimir On 1/8/16 1:05 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 07.01.2016 20:24, Vladimir Kozlov wrote: >> On 1/7/16 10:29 AM, Tobias Hartmann wrote: >>> Hi Vladimir, >>> >>> On 07.01.2016 01:35, Vladimir Kozlov wrote: >>>> Nope. Too much unrelated changes. 
If you want to go this road - file separate RFE to change phase argument type of Identity() and Value(). >>> >>> Okay, I agree. I filed JDK-8146629 [1]. >>> >>>> And why use PhaseValue and not PhaseGVN as in Ideal()? >>> >>> Right, we can use PhaseGVN. >>> >>>> So I agree to do your change in IfNode::Identity(). But as separate fix after general change. >>> >>> Here is the updated webrev based on JDK-8146629: >>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.06/ >> >> So for IGVN we wait until dead branch is removed and only one IfProj node left before we do this Identity optimization. >> And for GVN (Parse phase) we don't wait because during this phase we don't remove nodes. >> The comment should say something about GVN/Parse phase to understand !phase->is_IterGVN() condition. > > Right, I updated the comment. Does this look good to you? > http://cr.openjdk.java.net/~thartmann/8136469/webrev.07 > > Thanks, > Tobias > >> >> Thanks, >> Vladimir >> >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8146629 >>> >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/6/16 3:22 AM, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. >>>>> >>>>> Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: >>>>> Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. 
Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). >>>>> >>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ >>>>> >>>>> What do you think? >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>> >>>>> On 18.09.2015 11:57, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> please review the following patch. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8136469 >>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ >>>>>> >>>>>> Problem: >>>>>> When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. >>>>>> >>>>>> Solution: >>>>>> I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). 
>>>>>> >>>>>> Testing: >>>>>> - New test (TestPresizedStringBuilder) >>>>>> - JPRT >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java >>>>>> [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png >>>>>> From vladimir.kozlov at oracle.com Fri Jan 8 19:41:53 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Jan 2016 11:41:53 -0800 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <568F9183.9070909@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> Message-ID: <56901101.6050503@oracle.com> On 1/8/16 2:37 AM, Tobias Hartmann wrote: > > On 07.01.2016 21:49, Vladimir Kozlov wrote: >> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>> Hi Vladimir, >>> >>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>> Andrew is right. >>> >>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >> >> Right. It was the root of this bug, see below. >> >>> >>> I fixed this for the inflate and compress intrinsics. >>> >>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>> StrInflatedCopyNode is not memory node. >>> >>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >> >> I did not get the question. Is it before your webrev.01 change? Or even with the change? > > I meant with webrev.01 but you answered my question below. 
> >>> // This class defines a projection of the memory state of a store conditional node. >>> // These nodes return a value, but also update memory. >>> >>> But inflate does not return any value. >> >> Hmm, according to bottom type inflate produce memory: >> >> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >> >> So it really does not need SCMemProjNode. Sorry about that. >> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. > > Exactly. > >> Instead of SCMemProjNode we should have to change the idx of your dst_type: >> >> set_memory(str, dst_type); > > Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. > >> And you should rollback part of changes in escape.cpp and macro.cpp. > > Okay, I'll to that. > >>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >> >> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. > > Okay, should we then use BOTTOM for both the input and output type? Only input. Output type corresponds to dst array type which you set correctly now. > >>> Related question: >>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>> char[int:>=0]:exact+any * >>> >>> which is equal to the type of the char load. >> >> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? > > Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. 
See comment in library_call.cpp: > > // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) > // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) > // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) > // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) > // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) > // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) > > I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. > >> Should we also be more careful in inflate_string_slow()? Is it used? > > No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. > >>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>> char[int:1]:NotNull:exact * >>> >>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >> >> It is indeed strange. What memory type of LoadUS? It could be bug. > > LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. > > I will look into this again and try to understand what happens. It could be that aryptr is a pointer to the array and the load type is a pointer to the array's element. Thanks, Vladimir > > Thanks, > Tobias > >>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>> >>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>> an array load before an array store.
The MemBarCPUOrder is now used >>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>> array copies in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>> >>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>> // compiler because the CPU does all the ordering for us. >>>>>> >>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>> memory ordering, loads are never moved before dependent stores. >>>>>> >>>>>> Or did I misunderstand your question? >>>>> >>>>> No, I don't think so. I was just checking: I am very aware that >>>>> HotSpot has presented those of us with relaxed memory order machines >>>>> with some interesting gotchas over the years, that's all. I'm a bit >>>>> surprised that C2 needs this barrier, given that there is a >>>>> read-after-write dependency, but never mind. >>>>> >>>>> Thanks, >>>>> >>>>> Andrew. >>>>> From vladimir.kozlov at oracle.com Fri Jan 8 19:46:33 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Jan 2016 11:46:33 -0800 Subject: RFR(M): 8146612: C2: Precedence edges specification violated In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228ACE8@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB418116567228AAB8@DEWDFEMB19C.global.corp.sap> <568EE1A6.3050202@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116567228ACE8@DEWDFEMB19C.global.corp.sap> Message-ID: <56901219.8090805@oracle.com> Very good. I will sponsor it. Thanks, Vladimir On 1/8/16 3:06 AM, Doerr, Martin wrote: > Hi Vladimir, > > thanks for the review. > > I have changed the comments, added assertions and factored out the common functionality of del_req(), del_req_ordered() and rm_prec() into a new private function close_prec_gap_at(). That makes sense.
> > About your concern about accessing outside of _in array in rm_prec(): > Please note that i is decremented before it gets used: > "j == _max-1", "i" will be set to "_max", but decremented in "_in[--i]" > > Anyway, I have replaced this code by close_prec_gap_at(), so it doesn't matter anymore. > > The new webrev is here: > http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.01/ > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 7. Januar 2016 23:08 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8146612: C2: Precedence edges specification violated > > // Avoid spec violation: multiple prec edge. > > I think should be: > > // Avoid spec violation: duplicated prec edge. > > Should we add assert to rm_prec()?: > assert(j >= _cnt, "not a precedence edge"); > > Also we may need to check that input index is < _max in set_prec() and rm_prec(). > > Next access will be outside _in array if j == _max-1 (in rm_prec()): > > _in[i] = NULL; // NULL out last element > > unless we guarantee that there is always NULL at the end. Which I don't see because set_prec() may set the last prec > edge to not NULL. > > Please factor out similar code (search for last non-NULL prec edge) in del_req(), del_req_ordered() and rm_prec() into > separate method. > > Thanks, > Vladimir > > > On 1/7/16 5:45 AM, Doerr, Martin wrote: >> Hi, >> >> some time ago, we found out, that C2 doesn't treat precedence edges as specified. >> >> The description of precedence edges in node.hpp says: >> >> "They are unordered and not duplicated; they have no embedded NULLs." >> >> Some functions in the current implementation violate this specification. >> >> I have fixed this in the following webrev: >> >> http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.00/ >> >> Please review. I will need a sponsor, please. 
>> >> Best regards, >> >> Martin >> From vladimir.kozlov at oracle.com Fri Jan 8 20:46:59 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Jan 2016 12:46:59 -0800 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <568F9852.4090806@oracle.com> References: <568F9852.4090806@oracle.com> Message-ID: <56902043.1040409@oracle.com> Looks good to me. Thanks, Vladimir On 1/8/16 3:06 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8086053. > > https://bugs.openjdk.java.net/browse/JDK-8086053 > > Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB disabled, > the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. > Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These > inconsistencies lead to newly allocated regions not being filled with zeros. > > Solution: Address the following: > - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly allocated > TLAB is not initialized with zero. Add TLAB initialization code to C1. > - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB > allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ > > Testing: > - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; > - JPRT; > - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB.
> > Thank you and best regards, > > > Zoltan > From vivek.r.deshpande at intel.com Sat Jan 9 02:16:01 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Sat, 9 Jan 2016 02:16:01 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <568DB72F.6010408@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569E1902@ORSMSX106.amr.corp.intel.com> <5684A5B8.7070407@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569F23FC@ORSMSX106.amr.corp.intel.com> <568DB72F.6010408@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569F50C1@ORSMSX106.amr.corp.intel.com> Hi Vladimir, I have updated the patch with the latest base source and split the macroAssembler_x86_libm.cpp file into two files for your review. The patch is at this location: http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/webrev.01/ The 64-bit code does not produce less precise results or perform worse for not using FPU instructions. Thank you. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, January 06, 2016 4:54 PM To: Deshpande, Vivek R; Joseph D.
Darcy Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib On 1/6/16 4:31 PM, Deshpande, Vivek R wrote: > HI Vladimir, > > Yes, the macroAssembler_x86_libm.cpp file is getting large, I could look into splitting it into two files macroAssembler_libm_x86_64.cpp and macroAssembler_libm_x86_32.cpp. Please let me know if that sounds good to you. Yes, if we keep separate code we should split the file (and adjust the makefiles). > > The 64 bit code takes advantage of additional general purpose registers and 64 bit integer arithmetic and so we have two different versions for 32 bit and 64 bit. Okay, this is a valid argument. Even so, we may use push/pop on 32-bit to preserve registers. > > Regarding the FPU usage in cos/sin, we talked with the LIBM algorithm experts and they came back with the following: > "It would not be easy to remove FPU x87 instructions from libm_sincos_huge and libm_reduced_pi04l, they are designed with using extended precision from FPU in mind. The performance for 32bit implementation for these that do not use x87 instructions may not be optimal. These two are only used for very large input arguments." I don't buy this argument. Do they mean that the 64-bit code, which does not use the FPU, produces less precise results for very large input arguments? Very large input arguments are a very rare case, I think. Should we care about their performance? Note, 32-bit performance is becoming less and less important. Okay, for now let's split the file. Later we can try to simplify/combine/factor out the code. Thanks, Vladimir > > Thank you. > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, December 30, 2015 7:49 PM > To: Deshpande, Vivek R; Joseph D.
Darcy > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math > lib > > Hi Vivek, > > Why 32-bit code is so different from 64-bit code? You only use it if sse2 is available so XMM registers are present. Why to use FPU if you have SSE? > > 32-bit: > > 582 movsd(Address(rsp, 8), xmm0); > 583 fld_d(Address(rsp, 8)); > 584 movsd(Address(rsp, 16), xmm6); > 585 fld_d(Address(rsp, 16)); > 586 fmula(1); > > 64-bit: > > 295 mulsd(xmm0, xmm2); > > It is concerned to all LIBM 32-bit intrinsics. > > The main concern is that macroAssembler_x86_libm.cpp file become too large and it would be nice if 32-bit and 64-bit reuse the same code. > > Thanks, > Vladimir > > On 12/24/15 6:10 PM, Deshpande, Vivek R wrote: >> HI Vladimir >> >> I have updated the libm sin cos intrinsics for x86 for hotspot. >> The updated webrev for the same is at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/webrev.00/ >> Could you please review it. >> >> Regards, >> Vivek >> >> >> -----Original Message----- >> From: Deshpande, Vivek R >> Sent: Tuesday, December 22, 2015 5:42 PM >> To: 'Joseph D. Darcy'; Vladimir Kozlov >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> HI All >> >> I have uploaded the patch for sin and cos tests with input and allowed outputs at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.00/ >> Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 >> Thank you. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Joseph D.
Darcy [mailto:joe.darcy at oracle.com] >> Sent: Friday, December 04, 2015 4:50 PM >> To: Deshpande, Vivek R; Vladimir Kozlov >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hi Vivek, >> >> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >>> Hi >>> >>> Sure I will add the tests. Shall I use StrictMath result as a reference for exact result. >>> Let me know your thoughts. >> >> As a rough test of another sin/cos implementation, StrictMath.{sin, >> cos} can be used a reference with the following caveat: there isn't >> an indication of which why the error is in a StrictMath result. Let >> me given an example, if >> >> StrictMath.sin(x) => y >> >> then one of the following should be true >> >> Math.sin(x) => y >> Math.sin(x) => Math.nextUp(y) >> Math.sin(x) => Math.nextDown(y) >> >> That is, Math.sin(x) should either be the same as StrictMath.sin(x) >> OR equal to one of the floating-point numbers adjacent to that >> result. Of these three options, only two area allowed by the accuracy >> requirements of the StrictMath.sin specification. However, since >> StrictMath.sin doesn't give an indication of which way its error went >> (if it rounded up or down), there is no indication without additional >> work which of >> nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy). >> >> HTH, >> >> -Joe >> >> >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Thursday, December 03, 2015 1:29 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> Hello, >>> >>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>>> Vivek, >>>> >>>> I think Joe is asking you to write these tests as hotspot >>>> regression test in hotspot/test/compiler. 
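The three-way comparison Joe Darcy describes in the message above (Math.sin(x) equal to StrictMath.sin(x), to Math.nextUp of it, or to Math.nextDown of it) could be sketched in Java roughly as follows. This is only an illustration: the class name AdjacentResultCheck, the helper isSameOrAdjacent, and the sample inputs are assumptions and are not taken from the webrev under review. As the message itself points out, this is a rough test, because StrictMath gives no indication of which of the two neighbors is actually allowed.

```java
public final class AdjacentResultCheck {
    private AdjacentResultCheck() {}

    /**
     * Returns true if actual is the reference value itself or one of the two
     * floating-point numbers adjacent to it. NaN never matches, because every
     * comparison with NaN is false.
     */
    public static boolean isSameOrAdjacent(double reference, double actual) {
        return actual == reference
            || actual == Math.nextUp(reference)
            || actual == Math.nextDown(reference);
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs; 90112.0 is the main-path boundary
        // mentioned elsewhere in this thread.
        double[] inputs = {0.0, 0.5, 1.0, Math.PI / 2, 100.0, 90112.0, 1.0e10};
        for (double x : inputs) {
            double reference = StrictMath.sin(x); // rough reference only
            double actual = Math.sin(x);          // may be intrinsified by the JIT
            System.out.println("sin(" + x + ") same or adjacent: "
                + isSameOrAdjacent(reference, actual));
        }
    }
}
```

A result of false from this check flags an input worth examining with a higher-precision reference; it does not by itself prove that Math.sin violates its specification.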
>>> Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. >>> >>> Thanks, >>> >>> -Joe >>> >>>> Vladimir >>>> >>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>>> Hi Joe >>>>> >>>>> It would be great if you would please share the additional tests >>>>> with us. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>> Sent: Thursday, December 03, 2015 1:17 PM >>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> I think it is unwise for this large of an implementation change to >>>>> be pushed with no tests targeting the specifics of the new implementation. >>>>> >>>>> The worst-case tests in the jdk repo are the mathematical worst >>>>> cases for floating-point approximations, in other words the cases >>>>> were the exact mathematical answer is closes to half-way between >>>>> two representation floating-point numbers. Passing such tests is >>>>> necessary but not sufficient condition for a new implementation. >>>>> >>>>> Chers, >>>>> >>>>> -Joe >>>>> >>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>>> Okay, looks reasonable to me. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> This is the link for the updated webrev with latest hotspot >>>>>>> source as base for your review. >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>>> Thank you. 
>>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Deshpande, Vivek R >>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Hi Vladimir >>>>>>> >>>>>>> This is the link for the updated webrev for your review. >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>>> Thank you. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>>> To: Deshpande, Vivek R; joe darcy >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Please send link to new webrev on cr server. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> Please find the webrev with your suggested updates attached >>>>>>>> with the mail. >>>>>>>> We will update it in the jbs entry soon. >>>>>>>> Please let me know if it needs further changes. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Deshpande, Vivek R >>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>> the math lib >>>>>>>> >>>>>>>> HI Vladimir, Joe >>>>>>>> >>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>>> have mentioned. It passed those tests. 
>>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>>> The performance gain is 3.2x over base jdk, that is over >>>>>>>> current fsin/fcos intrinsic. This gain is more realistic. >>>>>>>> >>>>>>>> Could I get those tests around the boundary values. Would >>>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>>> If yes, then it has passed those boundary cases. >>>>>>>> >>>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>>> for libm and send out the webrev soon. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>> the math lib >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> Just getting added to the thread.. >>>>>>>> >>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>>> Thank you, for explanation, Vivek. >>>>>>>>> >>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition >>>>>>>>> to Hotspot tests. >>>>>>>>> >>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>>> Hi Vladimir >>>>>>>>>> >>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>>> StrictMath result and not exact result. So I added the flag >>>>>>>>>> to switch between FDLIBM and LIBM. >>>>>>>>>> >>>>>>>>>> Quick explanation: >>>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. 
>>>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>>> should be = 0.19457293629570216 >>>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>>> 0.1945729362957022 >>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>>> library result is between the above two values and Exact >>>>>>>>>> result would be pretty close to it. >>>>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>>> StrictMath >>>>>>>>>> - 1ulp, according to our test. >>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both >>>>>>>>> direction, I >>>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>>> generated by JIT compilers: >>>>>>>>> >>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>>>> >>>>>>>> That interpretation of the spec is not quite right. For the >>>>>>>> Math methods with a 1/2 ulp error bound, the floating-point >>>>>>>> result closest to the exact result must be returned. For the >>>>>>>> methods with a >>>>>>>> 1 ulp error bound, either of the floating-point result >>>>>>>> bracketing the true result can be returned, subject to the >>>>>>>> monotonicity constraints of the specification of the particular method. >>>>>>>> >>>>>>>>>> I have done the experiments with >>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin and >>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dcos. >>>>>>>>>> With this option, the interpreter would go through LIBM and C1 and c2 through FDLIBM. >>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic.
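The three long values quoted in the example above are consecutive integers. For positive finite doubles, consecutive bit patterns are adjacent floating-point numbers, which is why the three candidates sit exactly 1 ulp apart. A minimal Java sketch of that relationship follows; the class name UlpNeighbors and the helper neighbors are hypothetical, and the code only decodes the bit pattern reported in the thread without computing any sine.

```java
public final class UlpNeighbors {
    private UlpNeighbors() {}

    /**
     * For the bit pattern of a positive finite double, returns
     * {value one ulp below, the value itself, value one ulp above}.
     */
    public static double[] neighbors(long bits) {
        return new double[] {
            Double.longBitsToDouble(bits - 1),
            Double.longBitsToDouble(bits),
            Double.longBitsToDouble(bits + 1)
        };
    }

    public static void main(String[] args) {
        // Bit pattern reported in the thread as the StrictMath result.
        long strictBits = 4596178249117717084L;
        double[] n = neighbors(strictBits);

        System.out.println("StrictMath - 1ulp: " + n[0]);
        System.out.println("StrictMath       : " + n[1]);
        System.out.println("StrictMath + 1ulp: " + n[2]);

        // Stepping the bit pattern by one matches Math.nextDown / Math.nextUp.
        System.out.println(n[0] == Math.nextDown(n[1]));
        System.out.println(n[2] == Math.nextUp(n[1]));
    }
}
```

Comparing raw bit patterns this way is a common shortcut for measuring distance in ulps, but it is only valid when both values are finite and have the same sign.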
>>>>>>>>> I was thinking about using existing >>>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add >>>>>>>>> additional versions of functions which accept intrinsic ID >>>>>>>>> instead of methodHandle. >>>>>>>>> >>>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>>> >>>>>>>>>> Also the performance gain ~4x is with >>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>>>> LIBM code and compilers use FDLIB? >>>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>>> >>>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>>> boost to the StrictMath methods that have been ported. >>>>>>>> >>>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>>> testing. >>>>>>>> For example, part of patch says >>>>>>>> >>>>>>>> # For sin >>>>>>>> >>>>>>>> +// This means that the main path is actually only taken for >>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>> >>>>>>>> # For cos >>>>>>>> >>>>>>>> +// This means that the main path is actually only taken for >>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>> >>>>>>>> If nothing else, there are no tests at around those boundary >>>>>>>> values, which is unacceptable. There should also be some tests >>>>>>>> of values of interest to the algorithm in question. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> -Joe >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>>> questions and give more data if needed. 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Vivek >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>>> the math lib >>>>>>>>>> >>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>>> log() changes did not have flags. >>>>>>>>>>> >>>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>>> >>>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>>> Hi Vivek, >>>>>>>>>> >>>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>>> file bugs and fixed them after FC. >>>>>>>>>> >>>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>>> the only thing holding it from push. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>>> Hi all >>>>>>>>>>>> >>>>>>>>>>>> I would like to contribute a patch which optimizes >>>>>>>>>>>> Math.sin() and >>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>>>> implementation. >>>>>>>>>>>> >>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>>>>> >>>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>>> >>>>>>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>>>>>> >>>>>>>>>>>> Bug-id: >>>>>>>>>>>> >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>>> webrev: >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>>> >>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>> >>>>>>>>>>>> Vivek >>>>>>>>>>>> >> From vladimir.kozlov at oracle.com Sat Jan 9 06:43:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 8 Jan 2016 22:43:40 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569F50C1@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569E1902@ORSMSX106.amr.corp.intel.com> <5684A5B8.7070407@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569F23FC@ORSMSX106.amr.corp.intel.com> <568DB72F.6010408@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569F50C1@ORSMSX106.amr.corp.intel.com> Message-ID: <5690AC1C.2070908@oracle.com> Good. I sponsor it. Thanks, Vladimir On 1/8/16 6:16 PM, Deshpande, Vivek R wrote: > Hi Vladimir, > > I have updated the patch with latest base source and split the macroAssembler_x86_libm.cpp file into two files for your review. 
> The patch is at this location: > http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/webrev.01/ > > 64 bit code does not have less precise result or lower performance, by without using FPU instructions. > > Thank you. > Regards, > Vivek > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, January 06, 2016 4:54 PM > To: Deshpande, Vivek R; Joseph D. Darcy > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > On 1/6/16 4:31 PM, Deshpande, Vivek R wrote: >> HI Vladimir, >> >> Yes, the macroAssembler_x86_libm.cpp file is getting large, I could look into splitting it into two files macroAssembler_libm_x86_64.cpp and macroAssembler_libm_x86_32.cpp. Please let me know if that sounds good to you. > > Yes, if we keep separate code we should split the file (and adjust make files). > >> >> The 64 bit code takes advantage of additional general purpose registers and 64 bit integer arithmetic and so we have two different versions for 32 bit and 64 bit. > > Okay, this is valid argument. Even so we may use push/pop on 32-bit to preserve registers. > >> >> Regarding the FPU usage in cos/sin, we talked with the LIBM algorithm experts and they came back with the following: >> "It would not be easy to remove FPU x87 instructions from libm_sincos_huge and libm_reduced_pi04l, they are designed with using extended precision from FPU in mind. The performance for 32bit implementation for these that do not use x87 instructions may not be optimal. These two are only used for very large input arguments." > > I don't buy this argument. Do they mean that 64-bit code, which does not use FPU, produces less precise result for very large input arguments" ? > Very large input arguments is very rare case, I think. Should we care about its performance? > Note, 32-bit performance become less and less important. 
> > Okay, for now lets split the file. Late we can try to simplify/combine/factor out the code. > > Thanks, > Vladimir > > >> >> Thank you. >> Regards, >> Vivek >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, December 30, 2015 7:49 PM >> To: Deshpande, Vivek R; Joseph D. Darcy >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hi Vivek, >> >> Why 32-bit code is so different from 64-bit code? You only use it if sse2 is available so XMM registers are present. Why to use FPU if you have SSE? >> >> 32-bit: >> >> 582 movsd(Address(rsp, 8), xmm0); >> 583 fld_d(Address(rsp, 8)); >> 584 movsd(Address(rsp, 16), xmm6); >> 585 fld_d(Address(rsp, 16)); >> 586 fmula(1); >> >> 64-bit: >> >> 295 mulsd(xmm0, xmm2); >> >> It is concerned to all LIBM 32-bit intrinsics. >> >> The main concern is that macroAssembler_x86_libm.cpp file become too large and it would be nice if 32-bit and 64-bit reuse the same code. >> >> Thanks, >> Vladimir >> >> On 12/24/15 6:10 PM, Deshpande, Vivek R wrote: >>> HI Vladimir >>> >>> I have updated the libm sin cos intrinsics for x86 for hotspot. >>> The updated webrev for the same is at this location for your review. >>> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/hotspot/we >>> b >>> rev.00/ >>> Could you please review it. >>> >>> Regards, >>> Vivek >>> >>> >>> -----Original Message----- >>> From: Deshpande, Vivek R >>> Sent: Tuesday, December 22, 2015 5:42 PM >>> To: 'Joseph D. Darcy'; Vladimir Kozlov >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> HI All >>> >>> I have uploaded the patch for sin and cos tests with input and allowed outputs at this location for your review. >>> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev. 
>>> 00/ Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 >>> Thank you. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] >>> Sent: Friday, December 04, 2015 4:50 PM >>> To: Deshpande, Vivek R; Vladimir Kozlov >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> Hi Vivek, >>> >>> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >>>> Hi >>>> >>>> Sure I will add the tests. Shall I use StrictMath result as a reference for exact result. >>>> Let me know your thoughts. >>> >>> As a rough test of another sin/cos implementation, StrictMath.{sin, >>> cos} can be used a reference with the following caveat: there isn't >>> an indication of which why the error is in a StrictMath result. Let >>> me given an example, if >>> >>> StrictMath.sin(x) => y >>> >>> then one of the following should be true >>> >>> Math.sin(x) => y >>> Math.sin(x) => Math.nextUp(y) >>> Math.sin(x) => Math.nextDown(y) >>> >>> That is, Math.sin(x) should either be the same as StrictMath.sin(x) >>> OR equal to one of the floating-point numbers adjacent to that >>> result. Of these three options, only two area allowed by the accuracy >>> requirements of the StrictMath.sin specification. However, since >>> StrictMath.sin doesn't give an indication of which way its error went >>> (if it rounded up or down), there is no indication without additional >>> work which of >>> nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy). 
>>> >>> HTH, >>> >>> -Joe >>> >>> >>>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>> Sent: Thursday, December 03, 2015 1:29 PM >>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> Hello, >>>> >>>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>>>> Vivek, >>>>> >>>>> I think Joe is asking you to write these tests as hotspot >>>>> regression test in hotspot/test/compiler. >>>> Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then test of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. >>>> >>>> Thanks, >>>> >>>> -Joe >>>> >>>>> Vladimir >>>>> >>>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>>>> Hi Joe >>>>>> >>>>>> It would be great if you would please share the additional tests >>>>>> with us. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>> Sent: Thursday, December 03, 2015 1:17 PM >>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> I think it is unwise for this large of an implementation change to >>>>>> be pushed with no tests targeting the specifics of the new implementation. >>>>>> >>>>>> The worst-case tests in the jdk repo are the mathematical worst >>>>>> cases for floating-point approximations, in other words the cases >>>>>> were the exact mathematical answer is closes to half-way between >>>>>> two representation floating-point numbers. Passing such tests is >>>>>> necessary but not sufficient condition for a new implementation. 
>>>>>> >>>>>> Chers, >>>>>> >>>>>> -Joe >>>>>> >>>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>>>> Okay, looks reasonable to me. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> This is the link for the updated webrev with latest hotspot >>>>>>>> source as base for your review. >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Deshpande, Vivek R >>>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> This is the link for the updated webrev for your review. >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>>>> To: Deshpande, Vivek R; joe darcy >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> Please send link to new webrev on cr server. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi Vladimir >>>>>>>>> >>>>>>>>> Please find the webrev with your suggested updates attached >>>>>>>>> with the mail. >>>>>>>>> We will update it in the jbs entry soon. >>>>>>>>> Please let me know if it needs further changes. 
>>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Deshpande, Vivek R >>>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>> the math lib >>>>>>>>> >>>>>>>>> HI Vladimir, Joe >>>>>>>>> >>>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>>>> have mentioned. It passed those tests. >>>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>>>> The performance gain is 3.2x over base jdk, that is over >>>>>>>>> current fsin/fcos intrinsic. This gain is more realistic. >>>>>>>>> >>>>>>>>> Could I get those tests around the boundary values. Would >>>>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>>>> If yes, then it has passed those boundary cases. >>>>>>>>> >>>>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>>>> for libm and send out the webrev soon. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>> the math lib >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Just getting added to the thread.. >>>>>>>>> >>>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>>>> Thank you, for explanation, Vivek. >>>>>>>>>> >>>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition >>>>>>>>>> to Hotspot tests. 
>>>>>>>>>> >>>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>>>> Hi Vladimir >>>>>>>>>>> >>>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>>>> StrictMath result and not exact result. So I added the flag >>>>>>>>>>> to switch between FDLIBM and LIBM. >>>>>>>>>>> >>>>>>>>>>> Quick explanation: >>>>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>>>> should be = 0.19457293629570216 >>>>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>>>> 0.1945729362957022 >>>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>>>> library result is between the above two values and Exact >>>>>>>>>>> result would be pretty close to it. >>>>>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>>>> StrictMath >>>>>>>>>>> - 1ulp, according to our test. >>>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both >>>>>>>>>> direction, I >>>>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>>>> generated by JIT compilers: >>>>>>>>>> >>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#s >>>>>>>>>> i >>>>>>>>>> n >>>>>>>>>> % >>>>>>>>>> 28 >>>>>>>>>> do >>>>>>>>>> u >>>>>>>>>> ble%29 >>>>>>>>>> >>>>>>>>> That interpretation of the spec is not quite right. For the >>>>>>>>> Math methods with a 1/2 ulp error bound, the floating-point >>>>>>>>> result closest to the exact result must be returned. For the >>>>>>>>> methods with a >>>>>>>>> 1 ulp error bound, either of the floating-point result >>>>>>>>> bracketing the true result can be returned, subject to the >>>>>>>>> monotonicity constraints of the specification of the particular method. 
>>>>>>>>> >>>>>>>>>>> I have done the experiments with >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin and >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dcos. >>>>>>>>>>> With this option, the interpreter would go through LIBM and C1 and c2 through FDLIBM. >>>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>> I was thinking about using existing >>>>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add >>>>>>>>>> additional versions of functions which accept intrinsic ID >>>>>>>>>> instead of methodHandle. >>>>>>>>>> >>>>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>>>> >>>>>>>>>>> Also the performance gain ~4x is with >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>>>>> LIBM code and compilers use FDLIB? >>>>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>>>> >>>>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>>>> boost to the StrictMath methods that have been ported. >>>>>>>>> >>>>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>>>> testing. >>>>>>>>> For example, part of patch says >>>>>>>>> >>>>>>>>> # For sin >>>>>>>>> >>>>>>>>> +// This means that the main path is actually only taken for >>>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>>> >>>>>>>>> # For cos >>>>>>>>> >>>>>>>>> +// This means that the main path is actually only taken for >>>>>>>>> +// 2^-252 <= |X| < 90112. 
>>>>>>>>> >>>>>>>>> If nothing else, there are no tests at around those boundary >>>>>>>>> values, which is unacceptable. There should also be some tests >>>>>>>>> of values of interest to the algorithm in question. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> -Joe >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>>>> questions and give more data if needed. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Vivek >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>>>> the math lib >>>>>>>>>>> >>>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>>>> log() changes did not have flags. >>>>>>>>>>>> >>>>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>>>> >>>>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>>>> Hi Vivek, >>>>>>>>>>> >>>>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>>>> file bugs and fixed them after FC. >>>>>>>>>>> >>>>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>>>> the only thing holding it from push. 
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>>>> Hi all >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to contribute a patch which optimizes >>>>>>>>>>>>> Math.sin() and >>>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>>>>> implementation. >>>>>>>>>>>>> >>>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>>>>>> >>>>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>>>> >>>>>>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>>>>>> >>>>>>>>>>>>> Bug-id: >>>>>>>>>>>>> >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>>>> webrev: >>>>>>>>>>>>> >>>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>>> >>>>>>>>>>>>> Vivek >>>>>>>>>>>>> >>> From aph at redhat.com Sat Jan 9 10:40:18 2016 From: aph at redhat.com (Andrew Haley) Date: Sat, 9 Jan 2016 10:40:18 +0000 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <56902043.1040409@oracle.com> References: <568F9852.4090806@oracle.com> <56902043.1040409@oracle.com> Message-ID: <5690E392.9060704@redhat.com> On 08/01/16 20:46, Vladimir Kozlov wrote: > Looks good to me. Maybe we're going to need changes for PPC and AArch64. I'm wondering if maybe we could have some sort of way to flag such changes for maintainers of those ports. Otherwise it's just luck that I notice the bug going past. Andrew. 
From tobias.hartmann at oracle.com Mon Jan 11 06:56:32 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Jan 2016 07:56:32 +0100 Subject: [9] RFR(S): 8136469: OptimizeStringConcat fails on pre-sized StringBuilder shapes In-Reply-To: <56900F31.4060409@oracle.com> References: <55FBDFEC.4060405@oracle.com> <568CF8F5.5090202@oracle.com> <568DB2DF.4010305@oracle.com> <568EAE74.6020507@oracle.com> <568EBB86.1060108@oracle.com> <568F7BD4.1070000@oracle.com> <56900F31.4060409@oracle.com> Message-ID: <56935220.6020301@oracle.com> Thanks, Vladimir. Best, Tobias On 08.01.2016 20:34, Vladimir Kozlov wrote: > Very good. > > Thanks, > Vladimir > > On 1/8/16 1:05 AM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 07.01.2016 20:24, Vladimir Kozlov wrote: >>> On 1/7/16 10:29 AM, Tobias Hartmann wrote: >>>> Hi Vladimir, >>>> >>>> On 07.01.2016 01:35, Vladimir Kozlov wrote: >>>>> Nope. Too much unrelated changes. If you want to go this road - file separate RFE to change phase argument type of Identity() and Value(). >>>> >>>> Okay, I agree. I filed JDK-8146629 [1]. >>>> >>>>> And why use PhaseValue and not PhaseGVN as in Ideal()? >>>> >>>> Right, we can use PhaseGVN. >>>> >>>>> So I agree to do your change in IfNode::Identity(). But as separate fix after general change. >>>> >>>> Here is the updated webrev based on JDK-8146629: >>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.06/ >>> >>> So for IGVN we wait until dead branch is removed and only one IfProj node left before we do this Identity optimization. >>> And for GVN (Parse phase) we don't wait because during this phase we don't remove nodes. >>> The comment should say something about GVN/Parse phase to understand !phase->is_IterGVN() condition. >> >> Right, I updated the comment. Does this look good to you? 
>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.07 >> >> Thanks, >> Tobias >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8146629 >>>> >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/6/16 3:22 AM, Tobias Hartmann wrote: >>>>>> Hi, >>>>>> >>>>>> I had an off-thread discussion with Roland and we came to the conclusion that all proposed fixes essentially work around the fact that we are unable to determine if Identity is called from GVN or IGVN. As Roland pointed out, we would probably miss to adapt such a fix if we ever get the ability to check for GVN/IGVN. >>>>>> >>>>>> Here is a more robust solution not depending on any worklist ordering assumptions and not causing unexpected side effects: >>>>>> Since Node::Identity(PhaseTransform* phase) is always called with either PhaseGVN or PhaseIterGVN, we can change the argument to type PhaseValues* and can therefore simply use phase->is_IterGVN() to determine if we were called from GVN or IGVN. This could also be useful for other changes. Of course, this introduces an additional virtual call but we are already calling phase->is_IterGVN() at many other places in the code. In the future, these calls could be replaced by a field access (as Vladimir suggested in the RFR for 8139771). >>>>>> >>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.05/ >>>>>> >>>>>> What do you think? >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>> >>>>>> On 18.09.2015 11:57, Tobias Hartmann wrote: >>>>>>> Hi, >>>>>>> >>>>>>> please review the following patch. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8136469 >>>>>>> http://cr.openjdk.java.net/~thartmann/8136469/webrev.00/ >>>>>>> >>>>>>> Problem: >>>>>>> When creating a pre-sized StringBuilder, C2's string concatenation optimization sometimes fails to optimize the chain (see [1]). 
The problem is that the initial size of the StringBuilder depends on a static final boolean that is initialized to true at runtime. Therefore the string concatenation control flow chain [2] contains an IfNode with a ConI (1) as input instead of the expected BoolNode and StringConcat::validate_control_flow() silently bails out. >>>>>>> >>>>>>> Solution: >>>>>>> I changed the implementation to skip dead tests as they would be removed by IGVN later anyway. I added an assert to make sure we don't bail out silently if the input of the IfNode is not a bool. I also had to change validate_mem_flow() to handle dead ifs. Further, the assert in line 825 is unnecessary because we execute the same check in as_If(). >>>>>>> >>>>>>> Testing: >>>>>>> - New test (TestPresizedStringBuilder) >>>>>>> - JPRT >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/secure/attachment/53220/TestPresizedStringBuilder.java >>>>>>> [2] https://bugs.openjdk.java.net/secure/attachment/53218/graph.png >>>>>>> From martin.doerr at sap.com Mon Jan 11 08:39:50 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 11 Jan 2016 08:39:50 +0000 Subject: RFR(M): 8146612: C2: Precedence edges specification violated In-Reply-To: <56901219.8090805@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB418116567228AAB8@DEWDFEMB19C.global.corp.sap> <568EE1A6.3050202@oracle.com> <7C9B87B351A4BA4AA9EC95BB418116567228ACE8@DEWDFEMB19C.global.corp.sap> <56901219.8090805@oracle.com> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228AF47@DEWDFEMB19C.global.corp.sap> Hi Vladimir, thanks for reviewing and sponsoring. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Freitag, 8. Januar 2016 20:47 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8146612: C2: Precedence edges specification violated Very good. I will sponsor it. 
Thanks, Vladimir On 1/8/16 3:06 AM, Doerr, Martin wrote: > Hi Vladimir, > > thanks for the review. > > I have changed the comments, added assertions and factored out the common functionality of del_req(), del_req_ordered() and rm_prec() into a new private function close_prec_gap_at(). That makes sense. > > About your concern about accessing outside of _in array in rm_prec(): > Please note that i is decremented before it gets used: > "j == _max-1", "i" will be set to "_max", but decremented in "_in[--i]" > > Anyway, I have replaced this code by close_prec_gap_at(), so it doesn't matter anymore. > > The new webrev is here: > http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.01/ > > Best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 7. Januar 2016 23:08 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8146612: C2: Precedence edges specification violated > > // Avoid spec violation: multiple prec edge. > > I think should be: > > // Avoid spec violation: duplicated prec edge. > > Should we add assert to rm_prec()?: > assert(j >= _cnt, "not a precedence edge"); > > Also we may need to check that input index is < _max in set_prec() and rm_prec(). > > Next access will be outside _in array if j == _max-1 (in rm_prec()): > > _in[i] = NULL; // NULL out last element > > unless we guarantee that there is always NULL at the end. Which I don't see because set_prec() may set the last prec > edge to not NULL. > > Please factor out similar code (search for last non-NULL prec edge) in del_req(), del_req_ordered() and rm_prec() into > separate method. > > Thanks, > Vladimir > > > On 1/7/16 5:45 AM, Doerr, Martin wrote: >> Hi, >> >> some time ago, we found out, that C2 doesn't treat precedence edges as specified. >> >> The description of precedence edges in node.hpp says: >> >> "They are unordered and not duplicated; they have no embedded NULLs." 
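The invariant quoted from node.hpp (unordered, not duplicated, no embedded NULLs) can be illustrated with a small model. This is a sketch in Java under stated assumptions, not the HotSpot C++ code: the class EdgeList and its methods are hypothetical, required inputs are assumed to occupy slots [0, cnt), and precedence edges are assumed to follow them, packed, with null padding at the end. Removing an edge moves the last precedence edge into the freed slot, which is one way to close the gap when order does not matter.

```java
import java.util.Arrays;

// Hypothetical model of an input array followed by packed precedence edges.
class EdgeList {
    private Object[] in;
    private final int cnt; // number of required inputs

    EdgeList(Object... required) {
        cnt = required.length;
        in = Arrays.copyOf(required, cnt + 2); // copyOf pads with null
    }

    // Index one past the last non-null precedence edge.
    private int precEnd() {
        int i = cnt;
        while (i < in.length && in[i] != null) i++;
        return i;
    }

    void addPrec(Object n) {
        if (n == null) return;              // never store a null edge
        int end = precEnd();
        for (int i = cnt; i < end; i++) {
            if (in[i] == n) return;         // identity check: no duplicated edge
        }
        if (end == in.length) in = Arrays.copyOf(in, in.length * 2 + 1);
        in[end] = n;
    }

    void rmPrec(int j) {
        int end = precEnd();
        if (j < cnt || j >= end) {
            throw new IndexOutOfBoundsException("not a precedence edge: " + j);
        }
        in[j] = in[end - 1];                // order is irrelevant, so move the last edge down
        in[end - 1] = null;                 // keep nulls only at the tail
    }

    int precCount() { return precEnd() - cnt; }

    Object prec(int k) { return in[cnt + k]; }
}
```

The bounds check in rmPrec plays the role of the "not a precedence edge" assertion suggested in the review.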
>> >> Some functions in the current implementation violate this specification. >> >> I have fixed this in the following webrev: >> >> http://cr.openjdk.java.net/~mdoerr/8146612_C2_prec_edges/webrev.00/ >> >> Please review. I will need a sponsor, please. >> >> Best regards, >> >> Martin >> From tobias.hartmann at oracle.com Mon Jan 11 08:48:40 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Jan 2016 09:48:40 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <568F9852.4090806@oracle.com> References: <568F9852.4090806@oracle.com> Message-ID: <56936C68.70002@oracle.com> Hi Zoltan, looks good to me. Do you think it would make sense to add a regression test running with flag combinations like -XX:-UseTLAB and -XX:+ZeroTLAB to catch the missing initialization? Best, Tobias On 08.01.2016 12:06, Zolt?n Maj? wrote: > Hi, > > > please review the patch for 8086053. > > https://bugs.openjdk.java.net/browse/JDK-8086053 > > Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. > Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These inconsistencies lead to newly allocated regions not being filled with zeros. > > Solution: Address the following: > - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. > - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. 
> > Webrev: > http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ > > Testing: > - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; > - JPRT; > - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. > > Thank you and best regards, > > > Zoltan > From tobias.hartmann at oracle.com Mon Jan 11 09:26:00 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Jan 2016 10:26:00 +0100 Subject: FW: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: <56937528.2080600@oracle.com> Hi Rahul, > http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ Why don't you use 'markOopDesc::hash_mask_in_place' for the 64-bit version? This should save some instructions and you also don't need the 'hash' register if you compute everything in 'result'. Best, Tobias On 08.01.2016 18:13, Rahul Raghavan wrote: > Hello, > > Please review the following revised patch for JDK-6378256 - > http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ > > This revised webrev got following changes - > > 1) A minor, better optimized code with return 0 at initial stage (instead of continuing to 'slowCase' path), for special/rare null reference input! > (as per documentation, test results confirmed it is safe to 'return 0' for null reference input, for System.identityHashCode) > > 2) Added similar Object.hashCode, System.identityHashCode optimization support in sharedRuntime_x86_64.cpp. > > Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. > > Thanks, > Rahul > > >> -----Original Message----- >> From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan> Cc: hotspot-compiler-dev at openjdk.java.net >> >>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . 
>> >> Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would be >> nice. >> Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? >> >> Roland. > > >> -----Original Message----- >> From: Rahul Raghavan > Sent: Wednesday, December 09, 2015 2:43 PM > To: hotspot-compiler-dev at openjdk.java.net >> >> Hello, >> >> Please review the following patch for JDK-6378256. >> >> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . >> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times >> slower). >> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options). >> >> sample unit test: >> public class Jdk6378256Test >> { >> public static void main(String[] args) >> { >> Object obj = new Object(); >> long time = System.nanoTime(); >> for(int i = 0 ; i < 1000000 ; i++) >> System.identityHashCode(obj); //compare to obj.hashCode(); >> System.out.println ("Result = " + (System.nanoTime() - time)); >> } >> } >> >> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also. >> (looks in the header for the hashCode before calling into the VM). >> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver. >> So also added required additional null check for System.identityHashCode case. >> >> Testing: >> - successful JPRT run (-testset hotspot). >> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System). >> (with -client / -XX:TieredStopAtLevel=1 etc. options). >> - Added 'noreg-perf' label for this performance bug. >> Manual testing done and confirmed expected performance values for unit tests with fix. 
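The two properties the fix relies on can be checked directly, and the unit test from the report can be extended to time both calls side by side. This is a rough sketch, not the test attached to the bug: the class name and the timing helper are hypothetical, and a plain nanoTime loop like this is easily distorted by JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
// Hypothetical demo class; only the java.lang calls are real API.
class IdentityHashDemo {
    static int sink; // keeps results alive so the loops are not trivially removed

    static long timeIdentityHash(Object obj, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.identityHashCode(obj);
        }
        return System.nanoTime() - start;
    }

    static long timeHashCode(Object obj, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += obj.hashCode();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Object obj = new Object();
        // Documented behaviour: identityHashCode of null is 0, and for a class
        // that does not override hashCode() both calls return the same value.
        System.out.println(System.identityHashCode(null));
        System.out.println(System.identityHashCode(obj) == obj.hashCode());
        System.out.println("identityHashCode ns: " + timeIdentityHash(obj, 1000000));
        System.out.println("hashCode ns:         " + timeHashCode(obj, 1000000));
    }
}
```

Running it with -XX:TieredStopAtLevel=1, as in the report, restricts compilation to C1.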
>> >> Thanks, >> Rahul From tobias.hartmann at oracle.com Mon Jan 11 10:08:31 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Jan 2016 11:08:31 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <568EB3A0.3040909@oracle.com> References: <568EB3A0.3040909@oracle.com> Message-ID: <56937F1F.7010709@oracle.com> FYI, I had to merge with JDK-8143353 [1] (CosDNode and SinDNode were removed). This is the change I intend to push: http://cr.openjdk.java.net/~thartmann/8146629/webrev.01/ Thanks, Tobias [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/13b04370e8e9 On 07.01.2016 19:51, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8146629 > http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ > > Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR for JDK-8136469 [1]). I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods. > > Thanks, > Tobias > > [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html > From zoltan.majo at oracle.com Mon Jan 11 13:10:42 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 11 Jan 2016 14:10:42 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <56902043.1040409@oracle.com> References: <568F9852.4090806@oracle.com> <56902043.1040409@oracle.com> Message-ID: <5693A9D2.3080009@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 01/08/2016 09:46 PM, Vladimir Kozlov wrote: > Looks good to me. > > Thanks, > Vladimir > > On 1/8/16 3:06 AM, Zoltán Majó 
wrote: >> Hi, >> >> >> please review the patch for 8086053. >> >> https://bugs.openjdk.java.net/browse/JDK-8086053 >> >> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly >> allocated TLAB regions. With ZeroTLAB disabled, >> the interpreter and compiled code should assume the responsibility to >> zero-fill newly allocated regions. >> Currently, the handling of the ZeroTLAB flag shows some >> inconsistencies between the GC and the compilers. These >> inconsistencies lead to newly allocated regions not being filled with >> zeros. >> >> Solution: Address the following: >> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without >> notifying the GC. As a result, the newly allocated >> TLAB is not initialized with zero. Add TLAB initialization code to C1. >> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of >> newly allocated objects/arrays even if TLAB >> allocation is disabled. Add stricter conditions to C2 on when to skip >> filling objects/arrays with zero. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >> >> Testing: >> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC >> and -XX:+UseSerialGC; >> - JPRT; >> - all hotspot tests on all platforms affected by the change using all >> combinations of +/-UseTLAB and +/-ZeroTLAB. >> >> Thank you and best regards, >> >> >> Zoltan >> From zoltan.majo at oracle.com Mon Jan 11 13:11:03 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Mon, 11 Jan 2016 14:11:03 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <56936C68.70002@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> Message-ID: <5693A9E7.3040700@oracle.com> Hi Tobias, On 01/11/2016 09:48 AM, Tobias Hartmann wrote: > Hi Zoltan, > > looks good to me. thank you for the feedback! 
> Do you think it would make sense to add a regression test running with flag combinations like -XX:-UseTLAB and -XX:+ZeroTLAB to catch the missing initialization? Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs. The test does the same thing I did to reproduce the original failure. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ The newly added test passes on all supported platforms. Thank you and best regards, Zoltan > > Best, > Tobias > > > On 08.01.2016 12:06, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8086053. >> >> https://bugs.openjdk.java.net/browse/JDK-8086053 >> >> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. >> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These inconsistencies lead to newly allocated regions not being filled with zeros. >> >> Solution: Address the following: >> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. >> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >> >> Testing: >> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; >> - JPRT; >> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. 
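Enumerating "all combinations of +/-UseTLAB and +/-ZeroTLAB" across collectors can be sketched as a simple cross product. This is only an illustration, not the test from the webrev: the class name is hypothetical, and the choice of G1 and Serial as the collectors is an assumption based on the local testing described above. Each returned list could be passed as VM arguments when launching a child JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds the flag matrix; it does not launch any VM.
class TlabFlagCombos {
    static List<List<String>> combinations() {
        String[] gcs = {"-XX:+UseG1GC", "-XX:+UseSerialGC"};   // assumed set of collectors
        String[] useTlab = {"-XX:+UseTLAB", "-XX:-UseTLAB"};
        String[] zeroTlab = {"-XX:+ZeroTLAB", "-XX:-ZeroTLAB"};

        List<List<String>> result = new ArrayList<>();
        for (String gc : gcs) {
            for (String u : useTlab) {
                for (String z : zeroTlab) {
                    result.add(Arrays.asList(gc, u, z));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<String> flags : combinations()) {
            System.out.println(String.join(" ", flags));
        }
    }
}
```

With two collectors and two boolean flags this yields eight configurations, including the -XX:-UseTLAB -XX:+ZeroTLAB case that the C2 part of the fix addresses.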
>> >> Thank you and best regards, >> >> >> Zoltan >> From zoltan.majo at oracle.com Mon Jan 11 13:16:11 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Mon, 11 Jan 2016 14:16:11 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5690E392.9060704@redhat.com> References: <568F9852.4090806@oracle.com> <56902043.1040409@oracle.com> <5690E392.9060704@redhat.com> Message-ID: <5693AB1B.7090909@oracle.com> Hi Andrew, On 01/09/2016 11:40 AM, Andrew Haley wrote: > On 08/01/16 20:46, Vladimir Kozlov wrote: >> Looks good to me. > Maybe we're going to need changes for PPC and AArch64. Yes, I think you need this patch on those platforms as well. > I'm wondering > if maybe we could have some sort of way to flag such changes for > maintainers of those ports. Otherwise it's just luck that I notice > the bug going past. Maybe we could define a new JIRA label for this purpose. What do you think about that? Also, we might need a way to signal the need to propagate changes into the opposite direction (i.e., from ppc/aarch64 to the other supported platforms). Best regards, Zoltan > > Andrew. > From doug.simon at oracle.com Mon Jan 11 13:18:50 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Jan 2016 14:18:50 +0100 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation Message-ID: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. 
The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. This review is for changes that address the above issues as follows: 1. Non-blocking tasks are selected before blocking tasks from the compilation queue. 2. A thread waiting for a compilation task to complete checks the state of the compiler thread periodically (500ms intervals). If 5 successive checks see a blocked thread, the compilation times out and the waiting thread is unblocked. https://bugs.openjdk.java.net/browse/JDK-8146705 http://cr.openjdk.java.net/~dnsimon/8146705/ -Doug From tobias.hartmann at oracle.com Mon Jan 11 13:20:33 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Jan 2016 14:20:33 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5693A9E7.3040700@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> Message-ID: <5693AC21.5070304@oracle.com> Hi Zoltan, looks good to me! Best, Tobias On 11.01.2016 14:11, Zolt?n Maj? wrote: > Hi Tobias, > > > On 01/11/2016 09:48 AM, Tobias Hartmann wrote: >> Hi Zoltan, >> >> looks good to me. > > thank you for the feedback! > >> Do you think it would make sense to add a regression test running with flag combinations like -XX:-UseTLAB and -XX:+ZeroTLAB to catch the missing initialization? > > Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs. I did the same what the test does to reproduce the original failure. > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ > > The newly added test passes on all supported platforms. > > Thank you and best regards, > > > Zoltan > >> >> Best, >> Tobias >> >> >> On 08.01.2016 12:06, Zolt?n Maj? 
wrote: >>> Hi, >>> >>> >>> please review the patch for 8086053. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>> >>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. >>> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These inconsistencies lead to newly allocated regions not being filled with zeros. >>> >>> Solution: Address the following: >>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. >>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>> >>> Testing: >>> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; >>> - JPRT; >>> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From aph at redhat.com Mon Jan 11 13:31:46 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 11 Jan 2016 13:31:46 +0000 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5693AB1B.7090909@oracle.com> References: <568F9852.4090806@oracle.com> <56902043.1040409@oracle.com> <5690E392.9060704@redhat.com> <5693AB1B.7090909@oracle.com> Message-ID: <5693AEC2.70409@redhat.com> Hi, On 01/11/2016 01:16 PM, Zolt?n Maj? wrote: > On 01/09/2016 11:40 AM, Andrew Haley wrote: >> On 08/01/16 20:46, Vladimir Kozlov wrote: >>> Looks good to me. 
>> Maybe we're going to need changes for PPC and AArch64. > > Yes, I think you need this patch on those platforms as well. > >> I'm wondering >> if maybe we could have some sort of way to flag such changes for >> maintainers of those ports. Otherwise it's just luck that I notice >> the bug going past. > > Maybe we could define a new JIRA label for this purpose. What do you > think about that? That sounds like it might work. > Also, we might need a way to signal the need to propagate changes into > the opposite direction (i.e., from ppc/aarch64 to the other supported > platforms). Maybe so. That hasn't happened yet, though. The symmetry appeals to me. Andrew. From zoltan.majo at oracle.com Mon Jan 11 13:37:33 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 11 Jan 2016 14:37:33 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5693A9E7.3040700@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> Message-ID: <5693B01D.60604@oracle.com> Hi, On 01/11/2016 02:11 PM, Zoltán Majó wrote: > [...] > Yes, that is a good idea. I added a test that launches the VM with all > flag combinations and also with different GCs. I did the same what the > test does to reproduce the original failure. > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ The test contains an unnecessary @library tag and package import. The year in the copyright statement must be changed as well (to 2016). Here is the webrev with those changes: http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/ Sorry for the noise. Thank you and best regards, Zoltan > > The newly added test passes on all supported platforms. > > Thank you and best regards, > > > Zoltan > >> >> Best, >> Tobias >> >> >> On 08.01.2016 12:06, Zoltán Majó wrote: >>> Hi, >>> >>> >>> please review the patch for 8086053. 
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>> >>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill >>> newly allocated TLAB regions. With ZeroTLAB disabled, the >>> interpreter and compiled code should assume the responsibility to >>> zero-fill newly allocated regions. >>> Currently, the handling of the ZeroTLAB flag shows some >>> inconsistencies between the GC and the compilers. These >>> inconsistencies lead to newly allocated regions not being filled >>> with zeros. >>> >>> Solution: Address the following: >>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB >>> without notifying the GC. As a result, the newly allocated TLAB is >>> not initialized with zero. Add TLAB initialization code to C1. >>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of >>> newly allocated objects/arrays even if TLAB allocation is disabled. >>> Add stricter conditions to C2 on when to skip filling objects/arrays >>> with zero. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>> >>> Testing: >>> - local testing (linux_x86_64) of failing test case with >>> -XX:+UseG1GC and -XX:+UseSerialGC; >>> - JPRT; >>> - all hotspot tests on all platforms affected by the change using >>> all combinations of +/-UseTLAB and +/-ZeroTLAB. >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From zoltan.majo at oracle.com Mon Jan 11 13:40:18 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Mon, 11 Jan 2016 14:40:18 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5693AC21.5070304@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693AC21.5070304@oracle.com> Message-ID: <5693B0C2.8090908@oracle.com> Hi Tobias, On 01/11/2016 02:20 PM, Tobias Hartmann wrote: > Hi Zoltan, > > looks good to me! thank you for the review! 
Best regards, Zoltan > > Best, > Tobias > > On 11.01.2016 14:11, Zolt?n Maj? wrote: >> Hi Tobias, >> >> >> On 01/11/2016 09:48 AM, Tobias Hartmann wrote: >>> Hi Zoltan, >>> >>> looks good to me. >> thank you for the feedback! >> >>> Do you think it would make sense to add a regression test running with flag combinations like -XX:-UseTLAB and -XX:+ZeroTLAB to catch the missing initialization? >> Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs. I did the same what the test does to reproduce the original failure. >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ >> >> The newly added test passes on all supported platforms. >> >> Thank you and best regards, >> >> >> Zoltan >> >>> Best, >>> Tobias >>> >>> >>> On 08.01.2016 12:06, Zolt?n Maj? wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8086053. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>>> >>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. >>>> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These inconsistencies lead to newly allocated regions not being filled with zeros. >>>> >>>> Solution: Address the following: >>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. >>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. 
>>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>>> >>>> Testing: >>>> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; >>>> - JPRT; >>>> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> From doug.simon at oracle.com Mon Jan 11 14:05:03 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Jan 2016 15:05:03 +0100 Subject: RFR: 8146788: remove jvmci.jar from mx suite Message-ID: Please review this small change to remove generation of a jvmci.jar by the mx JVMCI build system. https://bugs.openjdk.java.net/browse/JDK-8146788 http://cr.openjdk.java.net/~dnsimon/8146788/ -Doug From roland.westrelin at oracle.com Mon Jan 11 15:07:55 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 11 Jan 2016 16:07:55 +0100 Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph Message-ID: http://cr.openjdk.java.net/~roland/8146792/webrev.00/ - partial peeling is applied to a loop - the peeled section is optimized and leaves a pinned node between the loop predicates and the loop body but no control flow - loop predicates are applied and a predicate that depends on the pinned node is moved out of the loop, before the pinned node, leading to a broken graph This is the same issue that came up during review of 8139771. Vladimir suggested that it be reviewed separately. The included test case reproduces it without the change from 8139771. Roland.
From tobias.hartmann at oracle.com Mon Jan 11 15:20:31 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 11 Jan 2016 16:20:31 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <56901101.6050503@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> Message-ID: <5693C83F.9030100@oracle.com> On 08.01.2016 20:41, Vladimir Kozlov wrote: > On 1/8/16 2:37 AM, Tobias Hartmann wrote: >> >> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>> Hi Vladimir, >>>> >>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>> Andrew is right. >>>> >>>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>> >>> Right. It was the root of this bug, see below. >>> >>>> >>>> I fixed this for the inflate and compress intrinsics. >>>> >>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>> StrInflatedCopyNode is not memory node. >>>> >>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >>> >>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >> >> I meant with webrev.01 but you answered my question below. >> >>>> // This class defines a projection of the memory state of a store conditional node. >>>> // These nodes return a value, but also update memory. >>>> >>>> But inflate does not return any value. 
>>> >>> Hmm, according to bottom type inflate produce memory: >>> >>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>> >>> So it really does not need SCMemProjNode. Sorry about that. >>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >> >> Exactly. >> >>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>> >>> set_memory(str, dst_type); >> >> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >> >>> And you should rollback part of changes in escape.cpp and macro.cpp. >> >> Okay, I'll do that. >> >>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>> >>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >> >> Okay, should we then use BOTTOM for both the input and output type? > > Only input. Output type corresponds to dst array type which you set correctly now. It seems that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case (in program order): StoreC, inflate_string, LoadC. The memory graph (def->use) now looks like this: LoadC -> inflate_string -> ByteMem ... StoreC -> CharMem The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). Setting the input to BOTTOM generates the following graph: http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png The 349 LoadUS does not read the result of the 96 StoreC because the StrInflatedCopyNode does not capture its memory. The test fails.
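The StoreC / inflate_string / LoadC shape described above can be sketched in plain Java roughly as follows. This is only an illustration and not the actual TestStringIntrinsicMemoryFlow test: the class and method names are made up, and it assumes that String.getChars() on a Latin-1 string ends up in the byte[] to char[] inflate path when compact strings are enabled.

```java
class InflateMemoryFlowSketch {
    // Hypothetical helper: store a char, inflate into the same array,
    // then load the stored char back.
    static char storeInflateLoad(String latin1, char[] dst) {
        dst[0] = 'X';                                  // char store (StoreC)
        latin1.getChars(0, latin1.length(), dst, 1);   // byte[] -> char[] copy (assumed inflate)
        return dst[0];                                 // char load (LoadC), must observe 'X'
    }

    public static void main(String[] args) {
        String s = "abc";
        // Loop so that a JIT compiler has a chance to compile the helper.
        for (int i = 0; i < 20_000; i++) {
            char[] dst = new char[s.length() + 1];
            if (storeInflateLoad(s, dst) != 'X') {
                throw new AssertionError("load did not observe the preceding store");
            }
        }
        System.out.println("load observed the store in every iteration");
    }
}
```

If the load were allowed to float above the copy without seeing the store, the helper would return the array's default value instead of 'X'.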
I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? Best, Tobias >>>> Related question: >>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>> char[int:>=0]:exact+any * >>>> >>>> which is equal to the type of the char load. >>> >>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? >> >> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. See comment in library_call.cpp: >> >> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >> >> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >> >>> Should we also be more careful in inflate_string_slow()? Is it used? >> >> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. 
>> >>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>> char[int:1]:NotNull:exact * >>>> >>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>> >>> It is indeed strange. What memory type of LoadUS? It could be bug. >> >> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >> >> I will look into this again and try to understand what happens. > > It could that aryptr is pointer to array and load type is pointer to array's element. > > Thanks, > Vladimir > >> >> Thanks, >> Tobias >> >>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>> >>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>> >>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>> >>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>> >>>>>>> Or did I misunderstand your question? >>>>>> >>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>> with some interesting gotchas over the years, that's all. 
I'm a bit >>>>>> surprised that C2 needs this barrier, given that there is a >>>>>> read-after-write dependency, but never mind. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Andrew. >>>>>> From roland.westrelin at oracle.com Mon Jan 11 15:36:52 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 11 Jan 2016 16:36:52 +0100 Subject: RFR(M): 8145322: Code generated from unsafe loops can be slightly improved In-Reply-To: <5670C098.1030301@oracle.com> References: <566F8177.8080000@oracle.com> <6BBA85D7-71DE-43AD-9DA9-CA97FF99F73D@oracle.com> <5670C098.1030301@oracle.com> Message-ID: <13ADAC9C-2611-410B-A0F9-A8662C4F58BD@oracle.com> Thanks for the review, Vladimir and Tobias. Roland. > On Dec 16, 2015, at 2:38 AM, Vladimir Kozlov wrote: > > Very nice! > > You may need to change code in castnode.cpp according to the new changes in 8145096 if they are pushed first (not yet). > And also 32-bit as Tobias pointed out. > > Thanks, > Vladimir > > On 12/15/15 12:55 AM, Roland Westrelin wrote: >> Hi Vladimir, >> >> Thanks for looking at this. >> >>> Second assembler output still has intermediate increments and also new movslq instructions. Why should it be better? >> >> I think there is some confusion here. There are 2 problems I'd like to fix. One is when using checkIndex. In that case, the code should be as good as regular array accesses. The first assembly dump shows it's not. The second problem is when not using checkIndex but we know the loop bounds; we should be able to do better. That's the second assembly dump. In my email I only showed assembly without my change.
With my change: >> >> first test case: >> >> 0c2 B11: # B37 B12 <- B8 B10 Loop: B11-B10 inner main of N142 Freq: 975.841 >> 0c2 movq RAX, [RSI + #16 + RDI << #3] # long >> 0c7 movq RBX, [R9 + #16 + RDI << #3] # long >> 0cc cmpq RBX, RAX >> 0cf jne B37 P=0.000000 C=7836.000000 >> 0cf >> 0d5 B12: # B38 B13 <- B11 Freq: 975.84 >> 0d5 movq RAX, [RSI + #24 + RDI << #3] # long >> 0da movq RBX, [R9 + #24 + RDI << #3] # long >> 0df cmpq RBX, RAX >> 0e2 jne B38 P=0.000000 C=7836.000000 >> 0e2 >> 0e8 B13: # B40 B14 <- B12 Freq: 975.84 >> 0e8 movq RAX, [RSI + #32 + RDI << #3] # long >> 0ed movq RBX, [R9 + #32 + RDI << #3] # long >> 0f2 cmpq RBX, RAX >> 0f5 jne B40 P=0.000000 C=7836.000000 >> 0f5 >> 0fb B14: # B42 B15 <- B13 Freq: 975.84 >> 0fb movq RAX, [RSI + #40 + RDI << #3] # long >> 100 movq RBX, [R9 + #40 + RDI << #3] # long >> 105 cmpq RBX, RAX >> 108 jne B42 P=0.000000 C=7836.000000 >> 108 >> 10e B15: # B44 B16 <- B14 Freq: 975.839 >> 10e movq RAX, [RSI + #48 + RDI << #3] # long >> 113 movq RBX, [R9 + #48 + RDI << #3] # long >> 118 movl RDX, RDI # spill >> 11a addl RDX, #4 # int >> 11d cmpq RBX, RAX >> 120 jne B44 P=0.000000 C=7836.000000 >> 120 >> 126 B16: # B39 B17 <- B15 Freq: 975.839 >> 126 movq RAX, [RSI + #56 + RDI << #3] # long >> 12b movq RBX, [R9 + #56 + RDI << #3] # long >> 130 cmpq RBX, RAX >> 133 jne B39 P=0.000000 C=7836.000000 >> 133 >> 139 B17: # B41 B18 <- B16 Freq: 975.838 >> 139 movq RAX, [RSI + #64 + RDI << #3] # long >> 13e movq RBX, [R9 + #64 + RDI << #3] # long >> 143 cmpq RBX, RAX >> 146 jne B41 P=0.000000 C=7836.000000 >> 146 >> 14c B18: # B43 B19 <- B17 Freq: 975.838 >> 14c movq RAX, [RSI + #72 + RDI << #3] # long >> 151 movq RBX, [R9 + #72 + RDI << #3] # long >> 156 cmpq RBX, RAX >> 159 jne B43 P=0.000000 C=7836.000000 >> 159 >> 15f B19: # B10 B20 <- B18 Freq: 975.837 >> 15f movl RDX, RDI # spill >> 161 addl RDX, #8 # int >> 164 cmpl RDX, RBP >> 166 jl B10 # loop end P=0.998980 C=7836.000000 >> >> >> >> second test case: >> >> 0a3 B7: # B32 B8 
<- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >> 0a3 movq RDI, [RBP + #16 + RSI << #3] # long >> 0a8 movq RAX, [RDX + #16 + RSI << #3] # long >> 0ad cmpq RAX, RDI >> 0b0 jne B32 P=0.000000 C=7836.000000 >> 0b0 >> 0b6 B8: # B33 B9 <- B7 Freq: 975.842 >> 0b6 movq RDI, [RBP + #24 + RSI << #3] # long >> 0bb movq RAX, [RDX + #24 + RSI << #3] # long >> 0c0 cmpq RAX, RDI >> 0c3 jne B33 P=0.000000 C=7836.000000 >> 0c3 >> 0c9 B9: # B35 B10 <- B8 Freq: 975.842 >> 0c9 movq RDI, [RBP + #32 + RSI << #3] # long >> 0ce movq RAX, [RDX + #32 + RSI << #3] # long >> 0d3 cmpq RAX, RDI >> 0d6 jne B35 P=0.000000 C=7836.000000 >> 0d6 >> 0dc B10: # B39 B11 <- B9 Freq: 975.842 >> 0dc movq RDI, [RBP + #40 + RSI << #3] # long >> 0e1 movq RAX, [RDX + #40 + RSI << #3] # long >> 0e6 cmpq RAX, RDI >> 0e9 jne B39 P=0.000000 C=7836.000000 >> 0e9 >> 0ef B11: # B38 B12 <- B10 Freq: 975.841 >> 0ef movq RDI, [RBP + #48 + RSI << #3] # long >> 0f4 movq RAX, [RDX + #48 + RSI << #3] # long >> 0f9 movl R8, RSI # spill >> 0fc addl R8, #4 # int >> 100 cmpq RAX, RDI >> 103 jne B38 P=0.000000 C=7836.000000 >> 103 >> 109 B12: # B34 B13 <- B11 Freq: 975.841 >> 109 movq RDI, [RBP + #56 + RSI << #3] # long >> 10e movq RAX, [RDX + #56 + RSI << #3] # long >> 113 cmpq RAX, RDI >> 116 jne B34 P=0.000000 C=7836.000000 >> 116 >> 11c B13: # B36 B14 <- B12 Freq: 975.84 >> 11c movq RDI, [RBP + #64 + RSI << #3] # long >> 121 movq RAX, [RDX + #64 + RSI << #3] # long >> 126 cmpq RAX, RDI >> 129 jne B36 P=0.000000 C=7836.000000 >> 129 >> 12f B14: # B38 B15 <- B13 Freq: 975.84 >> 12f movq RDI, [RBP + #72 + RSI << #3] # long >> 134 movq RAX, [RDX + #72 + RSI << #3] # long >> 139 movl R8, RSI # spill >> 13c addl R8, #7 # int >> 140 cmpq RAX, RDI >> 143 jne B38 P=0.000000 C=7836.000000 >> 143 >> 149 B15: # B7 B16 <- B14 Freq: 975.839 >> 149 addl RSI, #8 # int >> 14c cmpl RSI, R11 >> 14f jl B7 # loop end P=0.998980 C=7836.000000 >> >> Roland. 
>> >>> >>> Thanks, >>> Vladimir >>> >>> On 12/14/15 8:42 AM, Roland Westrelin wrote: >>>> http://cr.openjdk.java.net/~roland/8145322/webrev.00/ >>>> >>>> Paul spotted the following small inefficiencies: >>>> >>>> for (; wi < l; wi++) { >>>> long bi = ((long) Objects.checkIndex(wi, l, null)) << LOG2_ARRAY_LONG_INDEX_SCALE; >>>> long av = U.getLongUnaligned(a, aOffset + bi); >>>> long bv = U.getLongUnaligned(b, bOffset + bi); >>>> if (av != bv) { >>>> >>>> is compiled to: >>>> >>>> 0b0 B9: # B28 B10 <- B8 B13 Loop: B9-B13 inner main of N130 Freq: 977.661 >>>> 0b0 movl RDX, RDI # spill >>>> 0b2 # castII of RDX >>>> 0b2 movq RBX, [R9 + #16 + RDX << #3] # long >>>> 0b7 movq RAX, [RSI + #16 + RDX << #3] # long >>>> 0bc cmpq RBX, RAX >>>> 0bf jne B28 P=0.000000 C=7836.000000 >>>> 0bf >>>> 0c5 B10: # B28 B11 <- B9 Freq: 977.66 >>>> 0c5 movl RDX, RDI # spill >>>> 0c7 incl RDX # int >>>> 0c9 # castII of RDX >>>> 0c9 movq RBX, [R9 + #16 + RDX << #3] # long >>>> 0ce movq RAX, [RSI + #16 + RDX << #3] # long >>>> 0d3 cmpq RBX, RAX >>>> 0d6 jne B28 P=0.000000 C=7836.000000 >>>> 0d6 >>>> 0dc B11: # B28 B12 <- B10 Freq: 977.66 >>>> 0dc movl RDX, RDI # spill >>>> 0de addl RDX, #2 # int >>>> 0e1 # castII of RDX >>>> 0e1 movq RBX, [R9 + #16 + RDX << #3] # long >>>> 0e6 movq RAX, [RSI + #16 + RDX << #3] # long >>>> 0eb cmpq RBX, RAX >>>> 0ee jne B28 P=0.000000 C=7836.000000 >>>> 0ee >>>> 0f4 B12: # B28 B13 <- B11 Freq: 977.659 >>>> 0f4 movl RDX, RDI # spill >>>> 0f6 addl RDX, #3 # int >>>> 0f9 # castII of RDX >>>> 0f9 movq RBX, [R9 + #16 + RDX << #3] # long >>>> 0fe movq RAX, [RSI + #16 + RDX << #3] # long >>>> 103 cmpq RBX, RAX >>>> 106 jne B28 P=0.000000 C=7836.000000 >>>> 106 >>>> 10c B13: # B9 B14 <- B12 Freq: 977.659 >>>> 10c addl RDI, #4 # int >>>> 10f cmpl RDI, RBP >>>> 111 jl,s B9 # loop end P=0.998980 C=7836.000000 >>>> >>>> But the intermediate increment of the induction variable: >>>> 0c7 incl RDX # int >>>> 0de addl RDX, #2 # int >>>> 0f6 addl RDX, #3 # int >>>> >>>> should 
be folded in the address computation of the memory accesses: ConvI2L(AddI(x, y)) should be converted to AddL(ConvI2L(x), ConvI2L(y)) but there?s a CastII from the checkIndex between the AddI and the ConvI2L so we first need to push the CastII through the AddI. That?s the first CastIINode::Ideal transformation. If we apply that transformation we then have several CastII that only differ by their type so we need the second transformation of CastIINode::Ideal so all of them fold after loop opts. >>>> >>>> for (; wi < length >> valuesPerWidth; wi++) { >>>> long bi = ((long) wi) << LOG2_ARRAY_LONG_INDEX_SCALE; >>>> long av = U.getLongUnaligned(a, aOffset + bi); >>>> long bv = U.getLongUnaligned(b, bOffset + bi); >>>> if (av != bv) { >>>> >>>> 0b0 B7: # B32 B8 <- B6 B15 Loop: B7-B15 inner main of N123 Freq: 975.843 >>>> 0b0 movslq R8, RSI # i2l >>>> 0b3 movq RAX, [RDX + #16 + R8 << #3] # long >>>> 0b8 movq RDI, [RBP + #16 + R8 << #3] # long >>>> 0bd cmpq RAX, RDI >>>> 0c0 jne B32 P=0.000000 C=7836.000000 >>>> 0c0 >>>> 0c6 B8: # B33 B9 <- B7 Freq: 975.842 >>>> 0c6 movl R8, RSI # spill >>>> 0c9 incl R8 # int >>>> 0cc movslq RDI, R8 # i2l >>>> 0cf movq RAX, [RDX + #16 + RDI << #3] # long >>>> 0d4 movq RDI, [RBP + #16 + RDI << #3] # long >>>> 0d9 cmpq RAX, RDI >>>> 0dc jne B33 P=0.000000 C=7836.000000 >>>> 0dc >>>> 0e2 B9: # B33 B10 <- B8 Freq: 975.842 >>>> 0e2 movl R8, RSI # spill >>>> 0e5 addl R8, #2 # int >>>> 0e9 movslq RDI, R8 # i2l >>>> 0ec movq RAX, [RDX + #16 + RDI << #3] # long >>>> 0f1 movq RDI, [RBP + #16 + RDI << #3] # long >>>> 0f6 cmpq RAX, RDI >>>> 0f9 jne B33 P=0.000000 C=7836.000000 >>>> 0f9 >>>> 0ff B10: # B33 B11 <- B9 Freq: 975.842 >>>> 0ff movl R8, RSI # spill >>>> 102 addl R8, #3 # int >>>> 106 movslq RDI, R8 # i2l >>>> 109 movq RAX, [RDX + #16 + RDI << #3] # long >>>> 10e movq RDI, [RBP + #16 + RDI << #3] # long >>>> 113 cmpq RAX, RDI >>>> 116 jne B33 P=0.000000 C=7836.000000 >>>> 116 >>>> 11c B11: # B33 B12 <- B10 Freq: 975.841 >>>> 11c movl R8, RSI # 
spill >>>> 11f addl R8, #4 # int >>>> 123 movslq RDI, R8 # i2l >>>> 126 movq RAX, [RDX + #16 + RDI << #3] # long >>>> 12b movq RDI, [RBP + #16 + RDI << #3] # long >>>> 130 cmpq RAX, RDI >>>> 133 jne B33 P=0.000000 C=7836.000000 >>>> 133 >>>> 139 B12: # B33 B13 <- B11 Freq: 975.841 >>>> 139 movl R8, RSI # spill >>>> 13c addl R8, #5 # int >>>> 140 movslq RDI, R8 # i2l >>>> 143 movq RAX, [RDX + #16 + RDI << #3] # long >>>> 148 movq RDI, [RBP + #16 + RDI << #3] # long >>>> 14d cmpq RAX, RDI >>>> 150 jne B33 P=0.000000 C=7836.000000 >>>> 150 >>>> 156 B13: # B33 B14 <- B12 Freq: 975.84 >>>> 156 movl R8, RSI # spill >>>> 159 addl R8, #6 # int >>>> 15d movslq RDI, R8 # i2l >>>> 160 movq RAX, [RDX + #16 + RDI << #3] # long >>>> 165 movq RDI, [RBP + #16 + RDI << #3] # long >>>> 16a cmpq RAX, RDI >>>> 16d jne B33 P=0.000000 C=7836.000000 >>>> 16d >>>> 173 B14: # B33 B15 <- B13 Freq: 975.84 >>>> 173 movl R8, RSI # spill >>>> 176 addl R8, #7 # int >>>> 17a movslq RDI, R8 # i2l >>>> 17d movq RAX, [RDX + #16 + RDI << #3] # long >>>> 182 movq RDI, [RBP + #16 + RDI << #3] # long >>>> 187 cmpq RAX, RDI >>>> 18a jne B33 P=0.000000 C=7836.000000 >>>> 18a >>>> 190 B15: # B7 B16 <- B14 Freq: 975.839 >>>> 190 addl RSI, #8 # int >>>> 193 cmpl RSI, R11 >>>> 196 jl B7 # loop end P=0.998980 C=7836.000000 >>>> >>>> Same as above the intermediate increment of the induction variable should fold into the address computation but ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) is not applied because the compiler loses track of the bounds of the induction variable. The i2l conversions should also fold into the address computations but they don?t for the same reason. The change in loopnode.cpp tries to work around the problem by capturing the bounds of the loop as soon the CountedLoop is created and before other transformations applied to the loop makes it much harder for the compiler to figure the bounds out. I also relaxed the Phi type computation in PhiNode::Value(). 
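A note on why the bounds matter for ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)): the two forms only produce the same value when x + y cannot overflow a 32-bit int. A minimal sketch in plain Java (the helper names are illustrative and are not C2 code):

```java
class ConvI2LSketch {
    // ConvI2L(AddI(x, y)): add in 32 bits (may wrap around), then widen.
    static long addThenWiden(int x, int y) {
        return (long) (x + y);
    }

    // AddL(ConvI2L(x), ConvI2L(y)): widen both operands, then add in 64 bits.
    static long widenThenAdd(int x, int y) {
        return (long) x + (long) y;
    }

    // True when the exact sum fits in an int, i.e. when both forms agree.
    static boolean sumFitsInInt(int x, int y) {
        long sum = (long) x + (long) y;
        return sum >= Integer.MIN_VALUE && sum <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // In range: both forms agree.
        System.out.println(addThenWiden(5, 3) == widenThenAdd(5, 3));
        // Overflow: the 32-bit add wraps, the 64-bit add does not.
        System.out.println(addThenWiden(Integer.MAX_VALUE, 1));
        System.out.println(widenThenAdd(Integer.MAX_VALUE, 1));
    }
}
```

With a known range for the induction variable, the in-range case can be established statically, which is presumably why losing track of the loop bounds blocks the rewrite.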
>>>> >>>> I hit a couple unrelated bugs during testing: the fix in x86_64.ad is obvious. The change to superword is because we sometimes end up there with an AddL while, as I understand, we only expect integer nodes. Using the AddL leads to broken graphs. >>>> >>>> Roland. >>>> >> From roland.westrelin at oracle.com Mon Jan 11 15:54:35 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 11 Jan 2016 16:54:35 +0100 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> <568D7FA1.4040707@oracle.com> <1BC8C0B0-E8EF-4D6B-B9EE-D374E2FC3E04@oracle.com> Message-ID: > As a general comment, would it make sense to assume exceptional paths are not taken in most Java code? That is, for code optimization purposes it's probably a reasonable assumption. It seems like having an exceptional path is already a hint that it's not expected to fail; most Java devs know not to use exceptions for expected control flow. That sounds reasonable. There's a BailoutToInterpreterForThrows command line argument that does that (off by default, not available in product builds). I don't know what the history behind it is. > Could bytecode shape just like checkIndex be treated as same hint? Are there cases where something looks like checkIndex but really isn't? That sounds like a good suggestion. We would trade: 2 comparisons: i < 0 || i >= length for 2 comparisons: length < 0 || i >=u length so even if it doesn't result in further improvements, we wouldn't lose anything. Roland. > > Roland.
> > > > > On Wednesday, January 6, 2016, Vladimir Kozlov wrote: > > Note, we already have range check pattern matching code in C2 (thanks to Roland): > > > > https://bugs.openjdk.java.net/browse/JDK-8137168 > > > > Vladimir > > > > On 1/6/16 12:39 PM, Vitaly Davidovich wrote: > > I don't think there's a need to write out 20 different ways to do a > > range check -- I think nobody would expect all 20 to be covered by the > > optimizer. Some of those variations may not map cleanly to > > Object::checkIndex either, nor is there any guarantee that people will > > update all their existing range checks (or even know about) to use > > Object::checkIndex -- some code will be left unoptimized no matter what. > > > > But my point is the same as Andrew's, I think; instead of making > > checkIndex an intrinsic, simply add a pattern match against that exact > > bytecode shape (perhaps with basic canonicalization) and then still > > encourage people to use Object::checkIndex. This is better than > > intrinsic (modulo profile pollution) since any other code that happens > > to use same pattern will match as well, and not require an update to use > > checkIndex. Then, if someone comes to this list with an unoptimized > > example with a different bytecode shape and has a convincing argument > > that the code shape is "common", you guys can consider pattern matching > > that as well. > > > > On Wed, Jan 6, 2016 at 2:50 PM, John Rose > > wrote: > > > > > > > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich > > wrote: > > > > > > better canonicalization > > > > That's our first and most important tactic. (Actually inlining is.) > > > > But the various idioms for checkIndex do not canonicalize easily. In > > this case the correct trade-off is not to invest more time and > > research and code into stronger canonicalization. > > > > We do have canonicalization of if-expressions. 
It's just that in > > this case strengthening it to cover range checks reliably is harder > > than the reasonable alternative. > > > > - John > > > > PS. I am tempted to write out a list of 20 different ways to code a > > range check but will leave that as an exercise. > > > > > > > > > > -- > > Sent from my phone > > > > -- > Sent from my phone From vitalyd at gmail.com Mon Jan 11 16:18:13 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 11 Jan 2016 11:18:13 -0500 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> <568D7FA1.4040707@oracle.com> <1BC8C0B0-E8EF-4D6B-B9EE-D374E2FC3E04@oracle.com> Message-ID: > > That sounds reasonable. There's a BailoutToInterpreterForThrows command > line argument that does that (off by default, not available in product > builds). I don't know what the history behind it is. I'm surprised that's not the default behavior (i.e. statically treating control flow ending with exception as uncommon). Exceptional paths should not be optimization barriers, IMHO. It would also be good to not count bytecodes in those paths for inlining purposes, but that's a separate topic I suppose. That sounds like a good suggestion. We would trade: > 2 comparisons: i < 0 || i >= length > for > 2 comparisons: length < 0 || i >=u length > so even if it doesn't result in further improvements, we wouldn't lose > anything. Yes, that's my thinking as well. You won't lose anything, but may gain something by picking up similarly-shaped user-code checks elsewhere in existing code.
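For reference, the two range-check shapes being traded can be written out in plain Java as below. This is a sketch only: the class and method names are made up, and Integer.compareUnsigned stands in for the ">=u" comparison.

```java
class RangeCheckSketch {
    // Signed form: i < 0 || i >= length
    static boolean outOfBoundsSigned(int i, int length) {
        return i < 0 || i >= length;
    }

    // Unsigned form: length < 0 || i >=u length
    static boolean outOfBoundsUnsigned(int i, int length) {
        return length < 0 || Integer.compareUnsigned(i, length) >= 0;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -2, -1, 0, 1, 2, 7, 8, Integer.MAX_VALUE};
        // Compare both forms on every sampled (i, length) pair.
        for (int length : samples) {
            for (int i : samples) {
                if (outOfBoundsSigned(i, length) != outOfBoundsUnsigned(i, length)) {
                    throw new AssertionError("forms disagree for i=" + i + ", length=" + length);
                }
            }
        }
        System.out.println("both forms agree on all sampled (i, length) pairs");
    }
}
```

When length is known to be non-negative (as it is for an array length), the first test in the unsigned form is always false, which leaves a single unsigned comparison.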
Personally, I think intrinsics should be reserved for constructs/intentions impossible (or very difficult) to express in plain bytecode and for platform/CPU specific things; all else would be pattern matched to cast a wider net. On Mon, Jan 11, 2016 at 10:54 AM, Roland Westrelin < roland.westrelin at oracle.com> wrote: > > As a general comment, would it make sense to assume exceptional paths > are not taken in most Java code? That is, for code optimization purposes > it's probably a reasonable assumption. It seems like having an exceptional > path is already a hint that it's not expected to fail; most Java devs know > not to use exceptions for expected control flow. > > That sounds reasonable. There?s a BailoutToInterpreterForThrows command > line argument that does that (off by default, not available in product > builds). I don?t know what the history behind it is. > > > Could bytecode shape just like checkIndex be treated as same hint? Are > there cases where something looks like checkIndex but really isn't? > > That sounds like a good suggestion. We would trade: > 2 comparisons: i < 0 || i >= length > for > 2 comparisons: length < 0 || i >=u length > > so even if it doesn't result in further improvements, we wouldn?t lose > anything. > > Roland. > > > > > > Roland. > > > > > > > > On Wednesday, January 6, 2016, Vladimir Kozlov < > vladimir.kozlov at oracle.com> wrote: > > > Note, we already have range check pattern matching code in C2 (thanks > to Roland): > > > > > > https://bugs.openjdk.java.net/browse/JDK-8137168 > > > > > > Vladimir > > > > > > On 1/6/16 12:39 PM, Vitaly Davidovich wrote: > > > I don't think there's a need to write out 20 different ways to do a > > > range check -- I think nobody would expect all 20 to be covered by the > > > optimizer. 
Some of those variations may not map cleanly to > > > Object::checkIndex either, nor is there any guarantee that people will > > > update all their existing range checks (or even know about) to use > > > Object::checkIndex -- some code will be left unoptimized no matter > what. > > > > > > But my point is the same as Andrew's, I think; instead of making > > > checkIndex an intrinsic, simply add a pattern match against that exact > > > bytecode shape (perhaps with basic canonicalization) and then still > > > encourage people to use Object::checkIndex. This is better than > > > intrinsic (modulo profile pollution) since any other code that happens > > > to use same pattern will match as well, and not require an update to > use > > > checkIndex. Then, if someone comes to this list with an unoptimized > > > example with a different bytecode shape and has a convincing argument > > > that the code shape is "common", you guys can consider pattern matching > > > that as well. > > > > > > On Wed, Jan 6, 2016 at 2:50 PM, John Rose > > > wrote: > > > > > > > > > > On Jan 6, 2016, at 9:56 AM, Vitaly Davidovich < > vitalyd at gmail.com > > > > wrote: > > > > > > > > better canonicalization > > > > > > That's our first and most important tactic. (Actually inlining > is.) > > > > > > But the various idioms for checkIndex do not canonicalize easily. > In > > > this case the correct trade-off is not to invest more time and > > > research and code into stronger canonicalization. > > > > > > We do have canonicalization of if-expressions. It's just that in > > > this case strengthening it to cover range checks reliably is harder > > > than the reasonable alternative. > > > > > > ? John > > > > > > PS. I am tempted to write out a list of 20 different ways to code > a > > > range check but will leave that as a exercise. 
> > > > > > > > > > > > > > > -- > > > Sent from my phone > > > > > > > > -- > > Sent from my phone > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.thalinger at oracle.com Mon Jan 11 17:23:36 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 07:23:36 -1000 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation In-Reply-To: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> Message-ID: > On Jan 11, 2016, at 3:18 AM, Doug Simon wrote: > > The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. > Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. > > This review is for changes that address the above issues as follows: > > 1. Non-blocking tasks are selected before blocking tasks from the compilation queue. Aren?t blocking tasks selected before non-blocking tasks? > 2. A thread waiting for a compilation task to complete checks the state of the compiler thread periodically (500ms intervals). If 5 successive checks see a blocked thread, the compilation times out and the waiting thread is unblocked. 
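A minimal sketch of the periodic check described in item 2, not the CompileBroker implementation. The function name, the two callbacks, and the parameter names are hypothetical; the 500 ms interval and the limit of 5 checks come from the description. That the counter resets whenever the compiler thread is seen making progress is an assumption read into the word "successive".

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Waits for a compilation task while polling the compiler thread's state.
// Returns true if the task completed, and false if the wait was abandoned
// because the compiler thread was seen blocked on max_blocked_checks
// successive polls.
inline bool wait_for_blocking_compile(
        const std::function<bool()>& task_done,
        const std::function<bool()>& compiler_thread_blocked,
        std::chrono::milliseconds poll_interval = std::chrono::milliseconds(500),
        int max_blocked_checks = 5) {
    assert(max_blocked_checks > 0);
    int blocked_in_a_row = 0;
    while (!task_done()) {
        std::this_thread::sleep_for(poll_interval);
        if (compiler_thread_blocked()) {
            if (++blocked_in_a_row >= max_blocked_checks) {
                return false;  // no progress for too long: stop waiting
            }
        } else {
            blocked_in_a_row = 0;  // progress observed: start counting afresh
        }
    }
    return true;
}
```

With the default arguments, a compiler thread that stays blocked causes the waiter to give up after roughly 2.5 seconds, while a slow but progressing compilation is waited for indefinitely.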
>
> https://bugs.openjdk.java.net/browse/JDK-8146705
> http://cr.openjdk.java.net/~dnsimon/8146705/
>
> -Doug

From doug.simon at oracle.com Mon Jan 11 17:30:02 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 11 Jan 2016 18:30:02 +0100
Subject: RFR: 8146705: Improve JVMCI support for blocking compilation
In-Reply-To:
References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com>
Message-ID: <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com>

> On 11 Jan 2016, at 18:23, Christian Thalinger wrote:
>
>
>> On Jan 11, 2016, at 3:18 AM, Doug Simon wrote:
>>
>> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling.
>> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete.
>>
>> This review is for changes that address the above issues as follows:
>>
>> 1. Non-blocking tasks are selected before blocking tasks from the compilation queue.
>
> Aren't blocking tasks selected before non-blocking tasks?

Yes, exactly the opposite of what I said ;-) I've fixed the bug description and thankfully got the implementation the right way round.
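The corrected ordering (blocking tasks first) can be sketched roughly as follows. This is an illustration only, not the code under review: the struct, its fields, and the function are hypothetical, and the single `priority` number stands in for whatever comparison the real selection policy applies between methods.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a queued compilation request.
struct CompileTaskSketch {
    int  id;
    bool blocking;  // some thread is waiting for this compilation to finish
    long priority;  // stand-in for the real policy's method comparison
};

// Picks the best blocking task if the queue holds one, otherwise the best
// task overall. Returns nullptr for an empty queue.
inline const CompileTaskSketch* select_task(
        const std::vector<CompileTaskSketch>& queue) {
    const CompileTaskSketch* best = nullptr;
    const CompileTaskSketch* best_blocking = nullptr;
    for (const CompileTaskSketch& task : queue) {
        if (best == nullptr || task.priority > best->priority) {
            best = &task;
        }
        if (task.blocking &&
            (best_blocking == nullptr ||
             task.priority > best_blocking->priority)) {
            best_blocking = &task;
        }
    }
    return best_blocking != nullptr ? best_blocking : best;
}

// Usage example: the blocking task wins even though a non-blocking task
// has a higher priority.
inline void select_task_example() {
    const std::vector<CompileTaskSketch> queue = {
        {1, false, 99}, {2, true, 7}};
    assert(select_task(queue)->id == 2);
}
```

Serving blocking tasks first keeps application threads that are waiting on a compilation from being starved by the many non-blocking requests that the compiler threads themselves generate.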
-Doug

From christian.thalinger at oracle.com Mon Jan 11 18:28:57 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 11 Jan 2016 08:28:57 -1000
Subject: RFR: 8146001: Remove support for command line options from JVMCI
In-Reply-To: <0BB3D050-7E42-4777-BB7B-E4D7DC2A6605@oracle.com>
References: <2FC5EBAA-49A0-42D5-A608-665B8237B326@oracle.com> <8DE14AF8-90A4-4DF2-9CC2-98EE2E4F8670@oracle.com> <1297DA97-3C65-403D-AB46-16E203A74F26@oracle.com> <6C07E8DD-50D4-4B2E-BD8E-B131579A9664@oracle.com> <0BB3D050-7E42-4777-BB7B-E4D7DC2A6605@oracle.com>
Message-ID: <721F2EB5-F633-4E8E-AF23-751B169B4A86@oracle.com>

> On Jan 6, 2016, at 8:04 AM, Doug Simon wrote:
>
>>
>> On 06 Jan 2016, at 18:54, Christian Thalinger wrote:
>>
>> I just noticed this code in HotSpotResolvedJavaMethodImpl:
>>
>> private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter");
>>
>> The only other direct usage of System.getProperty is:
>>
>> hotspot/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java
>> 167: if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) {
>>
>> I think both of them should be using the same mechanism as introduced by this change.
>
> I agree (assuming you mean the HotSpotJVMCIRuntime.getBooleanProperty mechanism).

Yes.

>
> There's also:
>
> hotspot/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java
> 70: private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit");

I've seen that one too.

>
> But we will have to leave that as is given that HotSpotJVMCIRuntime is not visible from this code. We could also remove the (legacy) "jvmci.runtime.TimeInit" alias.

Yes, let's remove the legacy property.
https://bugs.openjdk.java.net/browse/JDK-8146820 > > -Doug > >> >>> On Jan 4, 2016, at 12:47 PM, Christian Thalinger wrote: >>> >>>> >>>> On Jan 4, 2016, at 12:31 PM, Doug Simon wrote: >>>> >>>>> >>>>> On 04 Jan 2016, at 18:41, Christian Thalinger wrote: >>>>> >>>>>> >>>>>> On Jan 4, 2016, at 7:19 AM, Christian Thalinger wrote: >>>>>> >>>>>>> >>>>>>> On Jan 4, 2016, at 7:16 AM, Christian Thalinger wrote: >>>>>>> >>>>>>>> >>>>>>>> On Dec 22, 2015, at 4:50 AM, Doug Simon wrote: >>>>>>>> >>>>>>>> The effort of maintaining JVMCI across different JDK versions (including a potential backport to JDK7) is reduced by making JVMCI as small as possible. The support for command line options in JVMCI (based around the @Option annotation) is a good candidate for removal: >>>>>>>> >>>>>>>> 1. It?s almost entirely implemented on top of system properties and so can be made to work without VM support. >>>>>>>> 2. JVMCI itself only currently uses 3 options which can be replaced with usage of sun.misc.VM.getSavedProperty(). The latter ensures application code can?t override JVMCI properties set on the command line. >>>>>>>> >>>>>>>> This change removes the JVMCI command line option support. >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8146001 >>>>>>>> http://cr.openjdk.java.net/~dnsimon/8146001/ >>>>>>> >>>>>>> + private static final boolean TrustFinalDefaultFields = HotSpotJVMCIRuntime.getBooleanProperty(TrustFinalDefaultFieldsProperty, true); >>>>>>> >>>>>>> + private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); >>>>>>> >>>>>>> We should either use the jvmci. prefix or not. >>>>>> >>>>>> Sorry, I was reading the patch wrong. Of course both use the jvmci. prefix. >>>>> >>>>> I think we should prefix the property name in getBooleanProperty: >>>>> >>>>> + public static boolean getBooleanProperty(String name, boolean def) { >>>>> + String value = VM.getSavedProperty("jvmci." 
+ name); >>>> >>>> Ok, sounds reasonable. >>>> >>>>> >>>>> and I put UseProfilingInformation back: >>>>> >>>>> diff -r 0fcfe4b07f7e src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Tue Dec 29 18:30:51 2015 +0100 >>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 04 07:40:46 2016 -1000 >>>>> @@ -24,7 +24,6 @@ package jdk.vm.ci.hotspot; >>>>> >>>>> import static jdk.vm.ci.hotspot.CompilerToVM.compilerToVM; >>>>> import static jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime; >>>>> -import static jdk.vm.ci.hotspot.HotSpotResolvedJavaMethod.Options.UseProfilingInformation; >>>>> import static jdk.vm.ci.hotspot.HotSpotVMConfig.config; >>>>> import static jdk.vm.ci.hotspot.UnsafeAccess.UNSAFE; >>>>> >>>>> @@ -65,6 +64,11 @@ import jdk.vm.ci.meta.TriState; >>>>> final class HotSpotResolvedJavaMethodImpl extends HotSpotMethod implements HotSpotResolvedJavaMethod, HotSpotProxified, MetaspaceWrapperObject { >>>>> >>>>> /** >>>>> + * Whether to use profiling information. >>>>> + */ >>>>> + private static final boolean UseProfilingInformation = HotSpotJVMCIRuntime.getBooleanProperty("UseProfilingInformation", true); >>>>> + >>>>> + /** >>>>> * Reference to metaspace Method object. 
>>>>> */ >>>>> private final long metaspaceMethod; >>>>> @@ -424,7 +428,7 @@ final class HotSpotResolvedJavaMethodImp >>>>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>>>> ProfilingInfo info; >>>>> >>>>> - if (UseProfilingInformation.getValue() && methodData == null) { >>>>> + if (UseProfilingInformation && methodData == null) { >>>>> long metaspaceMethodData = UNSAFE.getAddress(metaspaceMethod + config().methodDataOffset); >>>>> if (metaspaceMethodData != 0) { >>>>> methodData = new HotSpotMethodData(metaspaceMethodData, this); >>>> >>>> JVMCI should unconditionally return available profiling information. It's up to the compiler whether or not to use it. For example, this is now compilation local in Graal: >>>> >>>> http://hg.openjdk.java.net/graal/graal-compiler/rev/f35e653aa876#l16.16 >>> >>> Oh, I missed that. Yes, that works for us as well. Thanks for pointing that out. >>> >>>> >>>> -Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.thalinger at oracle.com Mon Jan 11 18:35:10 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 08:35:10 -1000 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation In-Reply-To: <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com> References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com> Message-ID: <1E212BB1-49D9-4DEE-A1AA-998D96D8ABB5@oracle.com> > On Jan 11, 2016, at 7:30 AM, Doug Simon wrote: > >> >> On 11 Jan 2016, at 18:23, Christian Thalinger wrote: >> >> >>> On Jan 11, 2016, at 3:18 AM, Doug Simon wrote: >>> >>> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. 
JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. >>> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. >>> >>> This review is for changes that address the above issues as follows: >>> >>> 1. Non-blocking tasks are selected before blocking tasks from the compilation queue. >> >> Aren?t blocking tasks selected before non-blocking tasks? > > Yes, exactly the opposite of what I said ;-) I?ve fixed the bug description and thankfully got the implementation the right way round. Then it looks good :-) > > -Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Mon Jan 11 18:46:41 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 11 Jan 2016 10:46:41 -0800 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation In-Reply-To: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> Message-ID: Makes sense. Looks good to me. igor > On Jan 11, 2016, at 5:18 AM, Doug Simon wrote: > > The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. 
> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. > > This review is for changes that address the above issues as follows: > > 1. Non-blocking tasks are selected before blocking tasks from the compilation queue. > 2. A thread waiting for a compilation task to complete checks the state of the compiler thread periodically (500ms intervals). If 5 successive checks see a blocked thread, the compilation times out and the waiting thread is unblocked. > > https://bugs.openjdk.java.net/browse/JDK-8146705 > http://cr.openjdk.java.net/~dnsimon/8146705/ > > -Doug From christian.thalinger at oracle.com Mon Jan 11 19:15:27 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 09:15:27 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism Message-ID: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8146820 I?ve renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i } /** + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} + * at system initialization time. The property name is prefixed with "{@code jvmci.}". 
+ * + * @param name the name of the system property + * @param def the value to return if there is no system property corresponding to {@code name} + */ + public static String getProperty(String name, String def) { + String value = VM.getSavedProperty("jvmci." + name); + if (value == null) { + return def; + } + return value; + } + + /** * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}". * @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i * @param def the value to return if there is no system property corresponding to {@code name} */ public static boolean getBooleanProperty(String name, boolean def) { - String value = VM.getSavedProperty("jvmci." + name); + String value = getProperty(name, null); if (value == null) { return def; } @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i } metaAccessContext = context; - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { + if (getBooleanProperty("printconfig", false)) { printConfig(config, compilerToVm); } diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp return false; } - private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); + private static final String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); @Override public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { diff -r c90679b0ea25 
src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java
--- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300
+++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000
@@ -65,9 +65,11 @@ public final class InitTimer implements
     }

     /**
-     * Specifies if initialization timing is enabled.
+     * Specifies if initialization timing is enabled. Note: this property cannot use
+     * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this
+     * package.
      */
-    private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit");
+    private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer");

     public static final AtomicInteger nesting = ENABLED ? new AtomicInteger() : null;
     public static final String SPACES = " ";

From vladimir.kozlov at oracle.com Mon Jan 11 19:26:22 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Jan 2016 11:26:22 -0800
Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB
In-Reply-To: <5693B01D.60604@oracle.com>
References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693B01D.60604@oracle.com>
Message-ID: <569401DE.8000105@oracle.com>

Don't use GC flags in the test. They will conflict with flags passed by the testing infrastructure and the test will fail. There was a bug fixed by removing GC flags from all our tests. Note that nightly testing does GC flag rotation, so you don't need to do that.

Otherwise looks good.

Thanks,
Vladimir

On 1/11/16 5:37 AM, Zoltán Majó wrote:
> Hi,
>
>
>
> On 01/11/2016 02:11 PM, Zoltán Majó wrote:
>> [...]
>> Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs.
>> I did the same as what the test does to reproduce the original failure.
>>
>> Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/
>
> The test contains an unnecessary @library tag and package import. The year in the copyright statement must be changed
> as well (to 2016).
>
> Here is the webrev with those changes:
> http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/
>
> Sorry for the noise.
>
> Thank you and best regards,
>
>
> Zoltan
>
>
>>
>> The newly added test passes on all supported platforms.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Best,
>>> Tobias
>>>
>>>
>>> On 08.01.2016 12:06, Zoltán Majó wrote:
>>>> Hi,
>>>>
>>>>
>>>> please review the patch for 8086053.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8086053
>>>>
>>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB
>>>> disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions.
>>>> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These
>>>> inconsistencies lead to newly allocated regions not being filled with zeros.
>>>>
>>>> Solution: Address the following:
>>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly
>>>> allocated TLAB is not initialized with zero. Add TLAB initialization code to C1.
>>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB
>>>> allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/
>>>>
>>>> Testing:
>>>> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC;
>>>> - JPRT;
>>>> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>
>

From vladimir.kozlov at oracle.com Mon Jan 11 19:30:17 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 11 Jan 2016 11:30:17 -0800
Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value
In-Reply-To: <56937F1F.7010709@oracle.com>
References: <568EB3A0.3040909@oracle.com> <56937F1F.7010709@oracle.com>
Message-ID: <569402C9.5060305@oracle.com>

Sounds good.

Thanks,
Vladimir

On 1/11/16 2:08 AM, Tobias Hartmann wrote:
> FYI, I had to merge with JDK-8143353 [1] (CosDNode and SinDNode were removed).
>
> This is the change I intend to push:
> http://cr.openjdk.java.net/~thartmann/8146629/webrev.01/
>
> Thanks,
> Tobias
>
> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/13b04370e8e9
>
> On 07.01.2016 19:51, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8146629
>> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/
>>
>> Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR for JDK-8136469 [1]). I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods.
>> >> Thanks, >> Tobias >> >> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html >> From vladimir.kozlov at oracle.com Mon Jan 11 19:50:17 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Jan 2016 11:50:17 -0800 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> Message-ID: <56940779.8070804@oracle.com> What is naming convention for properties? Do we have somewhere list of all JVMCI properties we accept? May be we should add it. All JVMCI properties names should be consistent whatever you choose. 'inittimer' is also lowcased. Thanks, Vladimir On 1/11/16 11:15 AM, Christian Thalinger wrote: > https://bugs.openjdk.java.net/browse/JDK-8146820 > > I?ve renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? > > diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java > --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 > +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 > @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i > } > > /** > + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} > + * at system initialization time. The property name is prefixed with "{@code jvmci.}". > + * > + * @param name the name of the system property > + * @param def the value to return if there is no system property corresponding to {@code name} > + */ > + public static String getProperty(String name, String def) { > + String value = VM.getSavedProperty("jvmci." 
+ name); > + if (value == null) { > + return def; > + } > + return value; > + } > + > + /** > * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) > * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}". > * > @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i > * @param def the value to return if there is no system property corresponding to {@code name} > */ > public static boolean getBooleanProperty(String name, boolean def) { > - String value = VM.getSavedProperty("jvmci." + name); > + String value = getProperty(name, null); > if (value == null) { > return def; > } > @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i > } > metaAccessContext = context; > > - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { > + if (getBooleanProperty("printconfig", false)) { > printConfig(config, compilerToVm); > } > > diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java > --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 > +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 > @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp > return false; > } > > - private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); > + private static final String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); > > @Override > public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { > diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java > --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300 > +++ 
b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000 > @@ -65,9 +65,11 @@ public final class InitTimer implements > } > > /** > - * Specifies if initialization timing is enabled. > + * Specifies if initialization timing is enabled. Note: this property cannot use > + * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this > + * package. > */ > - private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); > + private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer"); > > public static final AtomicInteger nesting = ENABLED ? new AtomicInteger() : null; > public static final String SPACES = " "; > From christian.thalinger at oracle.com Mon Jan 11 19:55:47 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 09:55:47 -1000 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation In-Reply-To: <1E212BB1-49D9-4DEE-A1AA-998D96D8ABB5@oracle.com> References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com> <1E212BB1-49D9-4DEE-A1AA-998D96D8ABB5@oracle.com> Message-ID: <705F08CD-332B-4D25-B352-4FC237D6E6BC@oracle.com> ?or not: /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp: In member function 'virtual CompileTask* AdvancedThresholdPolicy::select_task(CompileQueue*)': /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:201:9: error: 'UseJVMCICompiler' was not declared in this scope if (UseJVMCICompiler && task->is_blocking()) { ^ /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:202:11: error: 'max_blocking_task' was not declared in this scope if (max_blocking_task == NULL || compare_methods(method, max_blocking_task->method())) { ^ > On Jan 11, 2016, at 8:35 AM, 
Christian Thalinger wrote: > >> >> On Jan 11, 2016, at 7:30 AM, Doug Simon > wrote: >> >>> >>> On 11 Jan 2016, at 18:23, Christian Thalinger > wrote: >>> >>> >>>> On Jan 11, 2016, at 3:18 AM, Doug Simon > wrote: >>>> >>>> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. >>>> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. >>>> >>>> This review is for changes that address the above issues as follows: >>>> >>>> 1. Non-blocking tasks are selected before blocking tasks from the compilation queue. >>> >>> Aren?t blocking tasks selected before non-blocking tasks? >> >> Yes, exactly the opposite of what I said ;-) I?ve fixed the bug description and thankfully got the implementation the right way round. > > Then it looks good :-) > >> >> -Doug -------------- next part -------------- An HTML attachment was scrubbed... 
From doug.simon at oracle.com Mon Jan 11 19:57:05 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 11 Jan 2016 20:57:05 +0100
Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism
In-Reply-To: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com>
References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com>
Message-ID: <7C710B2B-BC2A-4D86-AD0C-608F4E057051@oracle.com>

> On 11 Jan 2016, at 20:15, Christian Thalinger wrote:
>
> https://bugs.openjdk.java.net/browse/JDK-8146820
>
> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig?

Yes. You should also do the same for jvmci.inittimer (i.e. jvmci.InitTimer).

-Doug

From christian.thalinger at oracle.com Mon Jan 11 19:57:39 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 11 Jan 2016 09:57:39 -1000
Subject: RFR: 8146705: Improve JVMCI support for blocking compilation
In-Reply-To: <705F08CD-332B-4D25-B352-4FC237D6E6BC@oracle.com>
References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com> <1E212BB1-49D9-4DEE-A1AA-998D96D8ABB5@oracle.com> <705F08CD-332B-4D25-B352-4FC237D6E6BC@oracle.com>
Message-ID: <4E408C1B-8F4C-4763-9A18-0AE6885F8A9E@oracle.com>

It's #ifdef vs. #if:

+#ifdef INCLUDE_JVMCI

I'll fix it.
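A small sketch of why the two directives behave differently. It assumes the feature macro is always defined and only its value changes (0 when the feature is left out of the build), which is taken here to be how INCLUDE_JVMCI is set up. `DEMO_INCLUDE_FEATURE` and the function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical feature macro: defined, but with the value 0 ("feature off").
#define DEMO_INCLUDE_FEATURE 0

// #ifdef only asks whether the macro is defined, so the guarded branch is
// compiled even though the feature is switched off.
constexpr bool guarded_code_compiled_with_ifdef() {
#ifdef DEMO_INCLUDE_FEATURE
    return true;
#else
    return false;
#endif
}

// #if evaluates the macro's value, so a value of 0 excludes the branch.
constexpr bool guarded_code_compiled_with_if() {
#if DEMO_INCLUDE_FEATURE
    return true;
#else
    return false;
#endif
}

// Usage example: the same macro gives opposite results under the two guards.
inline void check_guards() {
    assert(guarded_code_compiled_with_ifdef());
    assert(!guarded_code_compiled_with_if());
}
```

Under that assumption, code guarded with `#ifdef` is still compiled in a build without the feature, where the names it refers to are not declared, which would produce "was not declared in this scope" errors of the kind reported in this thread.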
> On Jan 11, 2016, at 9:55 AM, Christian Thalinger wrote: > > ...or not: > > /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp: In member function 'virtual CompileTask* AdvancedThresholdPolicy::select_task(CompileQueue*)': > /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:201:9: error: 'UseJVMCICompiler' was not declared in this scope > if (UseJVMCICompiler && task->is_blocking()) { > ^ > /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:202:11: error: 'max_blocking_task' was not declared in this scope > if (max_blocking_task == NULL || compare_methods(method, max_blocking_task->method())) { > ^ > >> On Jan 11, 2016, at 8:35 AM, Christian Thalinger wrote: >> >>> >>> On Jan 11, 2016, at 7:30 AM, Doug Simon wrote: >>> >>>> >>>> On 11 Jan 2016, at 18:23, Christian Thalinger wrote: >>>> >>>> >>>>> On Jan 11, 2016, at 3:18 AM, Doug Simon wrote: >>>>> >>>>> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. >>>>> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. >>>>> >>>>> This review is for changes that address the above issues as follows: >>>>> >>>>> 1. Non-blocking tasks are selected before blocking tasks from the compilation queue. >>>> >>>> Aren't blocking tasks selected before non-blocking tasks?
>>> >>> Yes, exactly the opposite of what I said ;-) I've fixed the bug description and thankfully got the implementation the right way round. >> >> Then it looks good :-) >> >>> >>> -Doug > From vladimir.kozlov at oracle.com Mon Jan 11 20:00:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Jan 2016 12:00:05 -0800 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <5693C83F.9030100@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> Message-ID: <569409C5.2040805@oracle.com> On 1/11/16 7:20 AM, Tobias Hartmann wrote: > > On 08.01.2016 20:41, Vladimir Kozlov wrote: >> On 1/8/16 2:37 AM, Tobias Hartmann wrote: >>> >>> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>>> Hi Vladimir, >>>>> >>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>>> Andrew is right. >>>>> >>>>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>>> >>>> Right. It was the root of this bug, see below. >>>> >>>>> >>>>> I fixed this for the inflate and compress intrinsics. >>>>> >>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>>> StrInflatedCopyNode is not memory node. >>>>> >>>>> Okay, why are above changes not sufficient to prevent the load from moving up?
Also, the comment for SCMemProjNode says: >>>> >>>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >>> >>> I meant with webrev.01 but you answered my question below. >>> >>>>> // This class defines a projection of the memory state of a store conditional node. >>>>> // These nodes return a value, but also update memory. >>>>> >>>>> But inflate does not return any value. >>>> >>>> Hmm, according to bottom type inflate produce memory: >>>> >>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>>> >>>> So it really does not need SCMemProjNode. Sorry about that. >>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >>> >>> Exactly. >>> >>>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>>> >>>> set_memory(str, dst_type); >>> >>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >>> >>>> And you should rollback part of changes in escape.cpp and macro.cpp. >>> >>> Okay, I'll to that. >>> >>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>>> >>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >>> >>> Okay, should we then use BOTTOM for both the input and output type? >> >> Only input. Output type corresponds to dst array type which you set correctly now. > > It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case: > StoreC > inflate_string > LoadC > > The memory graph (def->use) now looks like this: > LoadC -> inflate_string -> ByteMem > ... StoreC-> CharMem I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC. 
If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed. Thanks, Vladimir > > > The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). > > Setting the input to BOTTOM, generates the following graph: > http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png > The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails. > > I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: > LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) > http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ > > Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? > > Best, > Tobias > >>>>> Related question: >>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>>> char[int:>=0]:exact+any * >>>>> >>>>> which is equal to the type of the char load. >>>> >>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? >>> >>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. 
See comment in library_call.cpp: >>> >>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >>> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >>> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >>> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >>> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>> >>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >>> >>>> Should we also be more careful in inflate_string_slow()? Is it used? >>> >>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >>> >>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>>> char[int:1]:NotNull:exact * >>>>> >>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>>> >>>> It is indeed strange. What memory type of LoadUS? It could be bug. >>> >>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >>> >>> I will look into this again and try to understand what happens. >> >> It could that aryptr is pointer to array and load type is pointer to array's element. 
>> >> Thanks, >> Vladimir >> >>> >>> Thanks, >>> Tobias >>> >>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>>> >>>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>>> >>>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>>> >>>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>>> >>>>>>>> Or did I misunderstand your question? >>>>>>> >>>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>>> with some interesting gotchas over the years, that's all. I'm a bit >>>>>>> surprised that C2 needs this barrier, given that there is a >>>>>>> read-after-write dependency, but never mind. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Andrew. >>>>>>> From doug.simon at oracle.com Mon Jan 11 20:14:00 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Jan 2016 21:14:00 +0100 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <56940779.8070804@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> Message-ID: > On 11 Jan 2016, at 20:50, Vladimir Kozlov wrote: > > What is naming convention for properties? > Do we have somewhere list of all JVMCI properties we accept? May be we should add it. 
Currently, there is no list of accepted JVMCI properties. Once Chris applies the changes below such that all system property access (apart from jvmci.InitTimer) goes through HotSpotJVMCIRuntime.getProperty(), then the javadoc of that method could contain the list (much like System.getProperties describes the supported standard properties). > All JVMCI properties names should be consistent whatever you choose. I agree. -Doug > > 'inittimer' is also lowcased. > > Thanks, > Vladimir > > On 1/11/16 11:15 AM, Christian Thalinger wrote: >> https://bugs.openjdk.java.net/browse/JDK-8146820 >> >> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? >> >> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java >> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 >> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 >> @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i >> } >> >> /** >> + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} >> + * at system initialization time. The property name is prefixed with "{@code jvmci.}". >> + * >> + * @param name the name of the system property >> + * @param def the value to return if there is no system property corresponding to {@code name} >> + */ >> + public static String getProperty(String name, String def) { >> + String value = VM.getSavedProperty("jvmci." + name); >> + if (value == null) { >> + return def; >> + } >> + return value; >> + } >> + >> + /** >> * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) >> * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}".
>> * >> @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i >> * @param def the value to return if there is no system property corresponding to {@code name} >> */ >> public static boolean getBooleanProperty(String name, boolean def) { >> - String value = VM.getSavedProperty("jvmci." + name); >> + String value = getProperty(name, null); >> if (value == null) { >> return def; >> } >> @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i >> } >> metaAccessContext = context; >> >> - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { >> + if (getBooleanProperty("printconfig", false)) { >> printConfig(config, compilerToVm); >> } >> >> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 >> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 >> @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp >> return false; >> } >> >> - private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); >> + private static final String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); >> >> @Override >> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java >> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300 >> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000 >> @@ -65,9 +65,11 @@ public final class InitTimer implements >> } >> >> /** >> - * Specifies if initialization timing is enabled. 
>> + * Specifies if initialization timing is enabled. Note: this property cannot use >> + * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this >> + * package. >> */ >> - private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); >> + private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer"); >> >> public static final AtomicInteger nesting = ENABLED ? new AtomicInteger() : null; >> public static final String SPACES = " "; >> From doug.simon at oracle.com Mon Jan 11 20:15:54 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Jan 2016 21:15:54 +0100 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation In-Reply-To: <4E408C1B-8F4C-4763-9A18-0AE6885F8A9E@oracle.com> References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com> <1E212BB1-49D9-4DEE-A1AA-998D96D8ABB5@oracle.com> <705F08CD-332B-4D25-B352-4FC237D6E6BC@oracle.com> <4E408C1B-8F4C-4763-9A18-0AE6885F8A9E@oracle.com> Message-ID: <444F8F02-2BEB-43DA-BB40-F421010E366B@oracle.com> > On 11 Jan 2016, at 20:57, Christian Thalinger wrote: > > It's #ifdef vs. #if: > > +#ifdef INCLUDE_JVMCI No, I think it's the other way around (i.e. #if instead of #ifdef) judging by the rest of the code guarded by this macro. -Doug > > I'll fix it.
> >> On Jan 11, 2016, at 9:55 AM, Christian Thalinger wrote: >> >> ...or not: >> >> /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp: In member function 'virtual CompileTask* AdvancedThresholdPolicy::select_task(CompileQueue*)': >> /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:201:9: error: 'UseJVMCICompiler' was not declared in this scope >> if (UseJVMCICompiler && task->is_blocking()) { >> ^ >> /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:202:11: error: 'max_blocking_task' was not declared in this scope >> if (max_blocking_task == NULL || compare_methods(method, max_blocking_task->method())) { >> ^ >> >> >>> On Jan 11, 2016, at 8:35 AM, Christian Thalinger wrote: >>> >>>> >>>> On Jan 11, 2016, at 7:30 AM, Doug Simon wrote: >>>> >>>>> >>>>> On 11 Jan 2016, at 18:23, Christian Thalinger wrote: >>>>> >>>>> >>>>>> On Jan 11, 2016, at 3:18 AM, Doug Simon wrote: >>>>>> >>>>>> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. >>>>>> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. >>>>>> >>>>>> This review is for changes that address the above issues as follows: >>>>>> >>>>>> 1. Non-blocking tasks are selected before blocking tasks from the compilation queue.
>>>>> >>>>> Aren't blocking tasks selected before non-blocking tasks? >>>> >>>> Yes, exactly the opposite of what I said ;-) I've fixed the bug description and thankfully got the implementation the right way round. >>> >>> Then it looks good :-) >>> >>>> >>>> -Doug >> > From christian.thalinger at oracle.com Mon Jan 11 21:08:19 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 11:08:19 -1000 Subject: RFR: 8146705: Improve JVMCI support for blocking compilation In-Reply-To: <444F8F02-2BEB-43DA-BB40-F421010E366B@oracle.com> References: <41AAC503-ADDA-42DD-B338-CD29626AC132@oracle.com> <3EFCF17A-DCC3-4D7A-8D85-42A9C5C64A27@oracle.com> <1E212BB1-49D9-4DEE-A1AA-998D96D8ABB5@oracle.com> <705F08CD-332B-4D25-B352-4FC237D6E6BC@oracle.com> <4E408C1B-8F4C-4763-9A18-0AE6885F8A9E@oracle.com> <444F8F02-2BEB-43DA-BB40-F421010E366B@oracle.com> Message-ID: > On Jan 11, 2016, at 10:15 AM, Doug Simon wrote: > > >> On 11 Jan 2016, at 20:57, Christian Thalinger wrote: >> >> It's #ifdef vs. #if: >> >> +#ifdef INCLUDE_JVMCI > > No, I think it's the other way around (i.e. #if instead of #ifdef) judging by the rest of the code guarded by this macro. Sorry, that's what I meant; it should be #if. > > -Doug > >> >> I'll fix it.
>> >>> On Jan 11, 2016, at 9:55 AM, Christian Thalinger wrote: >>> >>> ...or not: >>> >>> /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp: In member function 'virtual CompileTask* AdvancedThresholdPolicy::select_task(CompileQueue*)': >>> /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:201:9: error: 'UseJVMCICompiler' was not declared in this scope >>> if (UseJVMCICompiler && task->is_blocking()) { >>> ^ >>> /scratch/jprt/T/P1/192954.cthaling/s/hotspot/src/share/vm/runtime/advancedThresholdPolicy.cpp:202:11: error: 'max_blocking_task' was not declared in this scope >>> if (max_blocking_task == NULL || compare_methods(method, max_blocking_task->method())) { >>> ^ >>> >>> >>>> On Jan 11, 2016, at 8:35 AM, Christian Thalinger wrote: >>>> >>>>> >>>>> On Jan 11, 2016, at 7:30 AM, Doug Simon wrote: >>>>> >>>>>> >>>>>> On 11 Jan 2016, at 18:23, Christian Thalinger wrote: >>>>>> >>>>>> >>>>>>> On Jan 11, 2016, at 3:18 AM, Doug Simon wrote: >>>>>>> >>>>>>> The CompileBroker currently uses a simple timeout of 1 second when waiting for a blocking JVMCI compilation to complete. This approach is too simple. JVMCI compiler threads themselves flood the compilation queues with compilation requests; such compilations cannot be blocking (the JVMCI compiler can easily cause the system to deadlock). This flooding means that application submitted tasks often timeout before the tasks even start compiling. >>>>>>> Once a JVMCI thread starts compiling a task, there is still the risk of it deadlocking. The current timeout mechanism needs to be augmented with a test of the compiler thread's state. As long as it's not blocked for too long, we know the compiler is making progress and will eventually complete. >>>>>>> >>>>>>> This review is for changes that address the above issues as follows: >>>>>>> >>>>>>> 1. Non-blocking tasks are selected before blocking tasks from the compilation queue.
>>>>>> >>>>>> Aren't blocking tasks selected before non-blocking tasks? >>>>> >>>>> Yes, exactly the opposite of what I said ;-) I've fixed the bug description and thankfully got the implementation the right way round. >>>> >>>> Then it looks good :-) >>>> >>>>> >>>>> -Doug >>> >> > From doug.simon at oracle.com Mon Jan 11 22:43:24 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 11 Jan 2016 23:43:24 +0100 Subject: RFR: 8146364: Remove @ServiceProvider mechanism from JVMCI Message-ID: Hi, Please review these changes for removing the mechanism in JVMCI for automating the generation of files in META-INF/services for service providers annotated with @ServiceProvider. https://bugs.openjdk.java.net/browse/JDK-8146364 http://cr.openjdk.java.net/~dnsimon/8146364/jdk9/ http://cr.openjdk.java.net/~dnsimon/8146364/hotspot/ -Doug From vladimir.kozlov at oracle.com Tue Jan 12 00:43:49 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 11 Jan 2016 16:43:49 -0800 Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph In-Reply-To: References: Message-ID: <56944C45.6060307@oracle.com> Now I think I understand. Note, there should be NO any control between loop's head and predicate check. I assume CastPP is attached to it because its original check was removed by dominated similar check (for example NULL check). I think it is safe to move CastPP above original dummy predicate checks (one or two checks if there is loop limit checks) since Cast PP should not depend on them. It will solve the problem since moved check(new predicate) is always inserted before original dummy predicate (which will be removed later).
Thanks, Vladimir On 1/11/16 7:07 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8146792/webrev.00/ > > - partial peeling is applied to a loop > - the peeled section is optimized and leaves a pinned node between the loop predicates and the loop body but no control flow > - loop predicates are applied and a predicate that depends on the pinned node is moved out of the loop, before the pinned node, leading to a broken graph > > This is the same issue that came up during review of 8139771. Vladimir suggested it gets reviewed separately. With the included test case it reproduces without the change from 8139771. > > Roland. > From christian.thalinger at oracle.com Mon Jan 11 22:48:56 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 12:48:56 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <7C710B2B-BC2A-4D86-AD0C-608F4E057051@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <7C710B2B-BC2A-4D86-AD0C-608F4E057051@oracle.com> Message-ID: <538AF03D-B463-4E34-BBB4-6ED53A232DBD@oracle.com> > On Jan 11, 2016, at 9:57 AM, Doug Simon wrote: > > >> On 11 Jan 2016, at 20:15, Christian Thalinger wrote: >> >> https://bugs.openjdk.java.net/browse/JDK-8146820 >> >> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? > > Yes. You should also do the same for jvmci.inittimer (i.e. jvmci.InitTimer).
Ok > > -Doug From christian.thalinger at oracle.com Mon Jan 11 22:51:04 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 12:51:04 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> Message-ID: <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> > On Jan 11, 2016, at 10:14 AM, Doug Simon wrote: > > >> On 11 Jan 2016, at 20:50, Vladimir Kozlov wrote: >> >> What is naming convention for properties? >> Do we have somewhere list of all JVMCI properties we accept? May be we should add it. > > Currently, there is no list of accepted JVMCI properties. Once Chris applies the changes below such that all system property access (apart from jvmci.InitTimer) goes through HotSpotJVMCIRuntime.getProperty(), then the javadoc of that method could contain the list (much like System.getProperties describes the supported standard properties). Good idea. > >> All JVMCI properties names should be consistent whatever you choose. > > I agree. Yes. They should feel like our other command line options so camel-case is what I had in mind. > > -Doug > >> >> 'inittimer' is also lowcased. >> >> Thanks, >> Vladimir >> >> On 1/11/16 11:15 AM, Christian Thalinger wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8146820 >>> >>> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig?
>>> >>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java >>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 >>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 >>> @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i >>> } >>> >>> /** >>> + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} >>> + * at system initialization time. The property name is prefixed with "{@code jvmci.}". >>> + * >>> + * @param name the name of the system property >>> + * @param def the value to return if there is no system property corresponding to {@code name} >>> + */ >>> + public static String getProperty(String name, String def) { >>> + String value = VM.getSavedProperty("jvmci." + name); >>> + if (value == null) { >>> + return def; >>> + } >>> + return value; >>> + } >>> + >>> + /** >>> * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) >>> * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}". >>> * >>> @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i >>> * @param def the value to return if there is no system property corresponding to {@code name} >>> */ >>> public static boolean getBooleanProperty(String name, boolean def) { >>> - String value = VM.getSavedProperty("jvmci." 
+ name); >>> + String value = getProperty(name, null); >>> if (value == null) { >>> return def; >>> } >>> @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i >>> } >>> metaAccessContext = context; >>> >>> - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { >>> + if (getBooleanProperty("printconfig", false)) { >>> printConfig(config, compilerToVm); >>> } >>> >>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 >>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 >>> @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp >>> return false; >>> } >>> >>> - private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); >>> + private static final String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); >>> >>> @Override >>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java >>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300 >>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000 >>> @@ -65,9 +65,11 @@ public final class InitTimer implements >>> } >>> >>> /** >>> - * Specifies if initialization timing is enabled. >>> + * Specifies if initialization timing is enabled. Note: this property cannot use >>> + * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this >>> + * package. 
>>> */ >>> - private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); >>> + private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer"); >>> >>> public static final AtomicInteger nesting = ENABLED ? new AtomicInteger() : null; >>> public static final String SPACES = " "; >>> > From christian.thalinger at oracle.com Tue Jan 12 01:35:28 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Jan 2016 15:35:28 -1000 Subject: RFR: 8146364: Remove @ServiceProvider mechanism from JVMCI In-Reply-To: References: Message-ID: > On Jan 11, 2016, at 12:43 PM, Doug Simon wrote: > > Hi, > > Please review these changes for removing the mechanism in JVMCI for automating the generation of files in META-INF/services for service providers annotated with @ServiceProvider. Did you try this with a regular JDK 9 build? I don't think it works to have the same META-INF file in different locations: src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.amd64/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.sparc/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory They will overwrite each other when being installed into the image.
> > https://bugs.openjdk.java.net/browse/JDK-8146364 > > http://cr.openjdk.java.net/~dnsimon/8146364/jdk9/ > http://cr.openjdk.java.net/~dnsimon/8146364/hotspot/ > > -Doug From zoltan.majo at oracle.com Tue Jan 12 07:56:08 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Tue, 12 Jan 2016 08:56:08 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5693AEC2.70409@redhat.com> References: <568F9852.4090806@oracle.com> <56902043.1040409@oracle.com> <5690E392.9060704@redhat.com> <5693AB1B.7090909@oracle.com> <5693AEC2.70409@redhat.com> Message-ID: <5694B198.8060902@oracle.com> Hi, On 01/11/2016 02:31 PM, Andrew Haley wrote: > [...] >>> I'm wondering >>> if maybe we could have some sort of way to flag such changes for >>> maintainers of those ports. Otherwise it's just luck that I notice >>> the bug going past. >> Maybe we could define a new JIRA label for this purpose. What do you >> think about that? > That sounds like it might work. > >> Also, we might need a way to signal the need to propagate changes into >> the opposite direction (i.e., from ppc/aarch64 to the other supported >> platforms). > Maybe so. That hasn't happened yet, though. The symmetry appeals to me. OK, I'll look into this and inform you once I've figured out how to proceed. Best wishes, Zoltan > > Andrew. > > From zoltan.majo at oracle.com Tue Jan 12 08:00:34 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Tue, 12 Jan 2016 09:00:34 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <569401DE.8000105@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693B01D.60604@oracle.com> <569401DE.8000105@oracle.com> Message-ID: <5694B2A2.7010905@oracle.com> Hi Vladimir, On 01/11/2016 08:26 PM, Vladimir Kozlov wrote: > Don't use GC flags in the test.
They will conflict with flags passed > by testing infra and the test will fail. The was bug fixed by removing > GC flags from all our tests. > Note, Nightly testing does GC flags rotation so you don't need to do > that. OK, I removed all GC flags from the test. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8086053/webrev.03/ > Otherwise looks good. Thank you for the review! I'll push webrev.03 today as it addresses all issues that were brought up. Thank you and best regards, Zoltan > > Thanks, > Vladimir > > On 1/11/16 5:37 AM, Zolt?n Maj? wrote: >> Hi, >> >> >> >> On 01/11/2016 02:11 PM, Zolt?n Maj? wrote: >>> [...] >>> Yes, that is a good idea. I added a test that launches the VM with >>> all flag combinations and also with different GCs. >>> I did the same what the test does to reproduce the original failure. >>> >>> Here is the updated webrev: >>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ >> >> The test contains and unnecessary @library tag and package import. >> The year in the copyright statement must be changed >> as well (to 2016). >> >> Here is the webrev with those changes: >> http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/ >> >> Sorry for the noise. >> >> Thank you and best regards, >> >> >> Zoltan >> >> >>> >>> The newly added test passes on all supported platforms. >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> >>>> >>>> Best, >>>> Tobias >>>> >>>> >>>> On 08.01.2016 12:06, Zolt?n Maj? wrote: >>>>> Hi, >>>>> >>>>> >>>>> please review the patch for 8086053. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>>>> >>>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill >>>>> newly allocated TLAB regions. With ZeroTLAB >>>>> disabled, the interpreter and compiled code should assume the >>>>> responsibility to zero-fill newly allocated regions. >>>>> Currently, the handling of the ZeroTLAB flag shows some >>>>> inconsistencies between the GC and the compilers. 
These >>>>> inconsistencies lead to newly allocated regions not being filled >>>>> with zeros. >>>>> >>>>> Solution: Address the following: >>>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB >>>>> without notifying the GC. As a result, the newly >>>>> allocated TLAB is not initialized with zero. Add TLAB >>>>> initialization code to C1. >>>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of >>>>> newly allocated objects/arrays even if TLAB >>>>> allocation is disabled. Add stricter conditions to C2 on when to >>>>> skip filling objects/arrays with zero. >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>>>> >>>>> Testing: >>>>> - local testing (linux_x86_64) of failing test case with >>>>> -XX:+UseG1GC and -XX:+UseSerialGC; >>>>> - JPRT; >>>>> - all hotspot tests on all platforms affected by the change using >>>>> all combinations of +/-UseTLAB and +/-ZeroTLAB. >>>>> >>>>> Thank you and best regards, >>>>> >>>>> >>>>> Zoltan >>>>> >>> >> From tobias.hartmann at oracle.com Tue Jan 12 08:20:26 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Jan 2016 09:20:26 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5694B2A2.7010905@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693B01D.60604@oracle.com> <569401DE.8000105@oracle.com> <5694B2A2.7010905@oracle.com> Message-ID: <5694B74A.20100@oracle.com> Hi Zoltan, looks good to me. Best, Tobias On 12.01.2016 09:00, Zolt?n Maj? wrote: > Hi Vladimir, > > > On 01/11/2016 08:26 PM, Vladimir Kozlov wrote: >> Don't use GC flags in the test. They will conflict with flags passed by testing infra and the test will fail. The was bug fixed by removing GC flags from all our tests. >> Note, Nightly testing does GC flags rotation so you don't need to do that. > > OK, I removed all GC flags from the test. 
Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8086053/webrev.03/ > >> Otherwise looks good. > > Thank you for the review! I'll push webrev.03 today as it addresses all issues that were brought up. > > Thank you and best regards, > > > Zoltan > >> >> Thanks, >> Vladimir >> >> On 1/11/16 5:37 AM, Zolt?n Maj? wrote: >>> Hi, >>> >>> >>> >>> On 01/11/2016 02:11 PM, Zolt?n Maj? wrote: >>>> [...] >>>> Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs. >>>> I did the same what the test does to reproduce the original failure. >>>> >>>> Here is the updated webrev: >>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ >>> >>> The test contains and unnecessary @library tag and package import. The year in the copyright statement must be changed >>> as well (to 2016). >>> >>> Here is the webrev with those changes: >>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/ >>> >>> Sorry for the noise. >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> >>> >>>> >>>> The newly added test passes on all supported platforms. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >>>>> >>>>> Best, >>>>> Tobias >>>>> >>>>> >>>>> On 08.01.2016 12:06, Zolt?n Maj? wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> please review the patch for 8086053. >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>>>>> >>>>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB >>>>>> disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. >>>>>> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These >>>>>> inconsistencies lead to newly allocated regions not being filled with zeros. >>>>>> >>>>>> Solution: Address the following: >>>>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. 
As a result, the newly >>>>>> allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. >>>>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB >>>>>> allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; >>>>>> - JPRT; >>>>>> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. >>>>>> >>>>>> Thank you and best regards, >>>>>> >>>>>> >>>>>> Zoltan >>>>>> >>>> >>> > From zoltan.majo at oracle.com Tue Jan 12 08:21:32 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 12 Jan 2016 09:21:32 +0100 Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB In-Reply-To: <5694B74A.20100@oracle.com> References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693B01D.60604@oracle.com> <569401DE.8000105@oracle.com> <5694B2A2.7010905@oracle.com> <5694B74A.20100@oracle.com> Message-ID: <5694B78C.2030802@oracle.com> On 01/12/2016 09:20 AM, Tobias Hartmann wrote: > Hi Zoltan, > > looks good to me. Thank you, Tobias! Best regards, Zoltan > > Best, > Tobias > > On 12.01.2016 09:00, Zolt?n Maj? wrote: >> Hi Vladimir, >> >> >> On 01/11/2016 08:26 PM, Vladimir Kozlov wrote: >>> Don't use GC flags in the test. They will conflict with flags passed by testing infra and the test will fail. The was bug fixed by removing GC flags from all our tests. >>> Note, Nightly testing does GC flags rotation so you don't need to do that. >> OK, I removed all GC flags from the test. Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8086053/webrev.03/ >> >>> Otherwise looks good. >> Thank you for the review! 
I'll push webrev.03 today as it addresses all issues that were brought up. >> >> Thank you and best regards, >> >> >> Zoltan >> >>> Thanks, >>> Vladimir >>> >>> On 1/11/16 5:37 AM, Zolt?n Maj? wrote: >>>> Hi, >>>> >>>> >>>> >>>> On 01/11/2016 02:11 PM, Zolt?n Maj? wrote: >>>>> [...] >>>>> Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs. >>>>> I did the same what the test does to reproduce the original failure. >>>>> >>>>> Here is the updated webrev: >>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/ >>>> The test contains and unnecessary @library tag and package import. The year in the copyright statement must be changed >>>> as well (to 2016). >>>> >>>> Here is the webrev with those changes: >>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/ >>>> >>>> Sorry for the noise. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >>>> >>>>> The newly added test passes on all supported platforms. >>>>> >>>>> Thank you and best regards, >>>>> >>>>> >>>>> Zoltan >>>>> >>>>>> Best, >>>>>> Tobias >>>>>> >>>>>> >>>>>> On 08.01.2016 12:06, Zolt?n Maj? wrote: >>>>>>> Hi, >>>>>>> >>>>>>> >>>>>>> please review the patch for 8086053. >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>>>>>> >>>>>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB >>>>>>> disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. >>>>>>> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These >>>>>>> inconsistencies lead to newly allocated regions not being filled with zeros. >>>>>>> >>>>>>> Solution: Address the following: >>>>>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly >>>>>>> allocated TLAB is not initialized with zero. 
Add TLAB initialization code to C1. >>>>>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB >>>>>>> allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. >>>>>>> >>>>>>> Webrev: >>>>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>>>>>> >>>>>>> Testing: >>>>>>> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; >>>>>>> - JPRT; >>>>>>> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. >>>>>>> >>>>>>> Thank you and best regards, >>>>>>> >>>>>>> >>>>>>> Zoltan >>>>>>> From tobias.hartmann at oracle.com Tue Jan 12 08:39:02 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Jan 2016 09:39:02 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <569402C9.5060305@oracle.com> References: <568EB3A0.3040909@oracle.com> <56937F1F.7010709@oracle.com> <569402C9.5060305@oracle.com> Message-ID: <5694BBA6.4040301@oracle.com> I had to merge again with JDK-8139771 (castnode.cpp/hpp): http://cr.openjdk.java.net/~thartmann/8146629/webrev.02/ Thanks, Tobias On 11.01.2016 20:30, Vladimir Kozlov wrote: > Sounds good. > > Thanks, > Vladimir > > On 1/11/16 2:08 AM, Tobias Hartmann wrote: >> FYI, I had to merge with JDK-8143353 [1] (CosDNode and SinDNode were removed). >> >> This is the change I indent to push: >> http://cr.openjdk.java.net/~thartmann/8146629/webrev.01/ >> >> Thanks, >> Tobias >> >> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/13b04370e8e9 >> >> On 07.01.2016 19:51, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. 
>>> >>> https://bugs.openjdk.java.net/browse/JDK-8146629 >>> http://cr.openjdk.java.net/~thartmann/8146629/webrev.00/ >>> >>> Currently, there is no way to determine in Node::Identity() and Node::Value() if we were called from GVN or IGVN but sometimes we would like to do optimizations based on this information (for example, see discussion in RFR for JDK-8136469 [1]). I changed the arguments of Node::Identity() and Node::Value() from PhaseTransform* to PhaseGVN*. Like this, we can simply call PhaseValues::is_IterGVN() from both methods. >>> >>> Thanks, >>> Tobias >>> >>> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-January/020670.html >>> From john.r.rose at oracle.com Tue Jan 12 08:45:16 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 12 Jan 2016 00:45:16 -0800 Subject: RFR (M): 8143925: Enhancing CounterMode.crypt() for AES In-Reply-To: References: <566228AD.6060704@oracle.com> <567C8F5C.204@oracle.com> <5682486D.4030402@oracle.com> <758D9731-2548-4370-A6AA-7CCA2FF671EC@oracle.com> <0C5AB04C-125E-41A2-8761-A5C3025783E7@oracle.com> <568B9188.6000506@redhat.com> <568CEF5B.5060306@redhat.com> <86663D10-D257-44D1-AFDE-BD484AE439A8@oracle.com> <3746840B-2F8D-42A1-B81F-02A0DF4A1D11@oracle.com> <568D7FA1.4040707@oracle.com> <1BC8C0B0-E8EF-4D6B-B9EE-D374E2FC3E04@oracle.com> Message-ID: <5AE9C5CC-4ACE-4140-B044-CDDDAE2D9C9B@oracle.com> On Jan 11, 2016, at 7:54 AM, Roland Westrelin wrote: > >> As a general comment, would it make sense to assume exceptional paths are not taken in most Java code? That is, for code optimization purposes it's probably a reasonable assumption. It seems like having an exceptional path is already a hint that it's not expected to fail; most Java devs know not to use exceptions for expected control flow. > > That sounds reasonable. There's a BailoutToInterpreterForThrows command line argument that does that (off by default, not available in product builds). I don't know what the history behind it is.
It is reasonable in *most* code, which means you have to be ready to run into a bit of code which misbehaves, and mark the profile so you treat that bit of code specially. Key example: Null checks are *mostly* uncommon, so we work hard to turn them into implicit (trap-bearing) instructions. But those which are not get rewritten differently, after the trap happens, using the profile marks. In other words, we speculate that a null check that throws uncommonly finds a null (pending evidence to the contrary). Hence the subtle interactions with profiles, including the profile inside a bytecoded method like checkIndex. With an intrinsic, you can say (just for that intrinsic), "ignore special-case marks in my profile". We don't have a good-enough heuristic (in the absence of split profiles) for detecting which normal methods can be treated this way. >> Could bytecode shape just like checkIndex be treated as same hint? Are there cases where something looks like checkIndex but really isn't? > > That sounds like a good suggestion. We would trade: > 2 comparisons: i < 0 || i >= length > for > 2 comparisons: length < 0 || i >=u length > > so even if it doesn't result in further improvements, we wouldn't lose anything. In the small scale you don't lose anything, but (as I said earlier) in the IR graph at scale you lose the opportunity to common up certain expressions. Since Java expressions don't have unsigned comparison operators (and programmers wouldn't use them consistently for index checking, even if they were available), the comparison expressions available in the IR for commoning are signed, except for those which have been converted to unsigned. Converting speculatively from signed to unsigned (as suggested above) would seem to be harmless, but unless it is somehow limited to comparisons that *all* go unsigned, you could get a mix of signed and unsigned versions of the same logic, which would (worst case) double the number of tests in the object code.
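For readers following the signed-versus-unsigned trade-off quoted above, here is a small Java sketch of the two forms of the range check. It is not HotSpot code: the class and method names are made up for illustration, and the IR-level `>=u` comparison is approximated at the source level with Integer.compareUnsigned. The sketch only checks that, for a non-negative length, the signed pair `i < 0 || i >= length` and the single unsigned test give the same answer.

```java
public class RangeCheckForms {
    // Signed form: two comparisons, the way index checks are usually written in Java source.
    static boolean outOfBoundsSigned(int i, int length) {
        return i < 0 || i >= length;
    }

    // Unsigned form: one comparison. A negative i becomes a huge unsigned value,
    // so it fails the check as long as length itself is non-negative.
    static boolean outOfBoundsUnsigned(int i, int length) {
        return Integer.compareUnsigned(i, length) >= 0;
    }

    // Returns true if both forms agree on a handful of boundary indices.
    static boolean formsAgree(int length) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, length - 1, length, length + 1, Integer.MAX_VALUE};
        for (int i : samples) {
            if (outOfBoundsSigned(i, length) != outOfBoundsUnsigned(i, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(formsAgree(10));
        System.out.println(formsAgree(0));
    }
}
```

The equivalence depends on length being non-negative, which is why the quoted suggestion pairs the unsigned test with a separate `length < 0` check; with a negative length the two methods above can disagree.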
Using an intrinsic for range checks (which is the current case with aaload and will also be the case with checkIndex) allows us to reduce the number of unsigned comparisons to those which actually . I hope everybody understands that I am not arguing *against* strong IR normalization and automatic detection of idioms, but rather observing that, powerful as those desirable techniques are, they are not infallible, and sometimes benefit from user-driven help via explicit operators, like checkIndex. Of course we want to fold user code which "works just like" checkIndex or the aaload check into the same good IR. But we don't want to rely on this auto-detection always, and we want to tread carefully when balancing the various IR normalization rules, which may work either for or against the use case of checkIndex detection. Open question: Given a "sea of nodes" encoding various configurations of signed and unsigned comparisons, how do we normalize them so that (a) we maximize commoning (U with U and S with S), and (b) we end up with all available clever uses of U mode to fold <=0 checks? Or, more to the point how do we arrange these choices so (c) the dynamic number of comparisons (of either mode) is minimized? Given that (a) and (b) can sometimes work against each other, what's a good heuristic for binning comparisons into U and S categories, for subsequent CSE? (So, checkIndex is a hint for binning.) Personal background: About 10 years ago I worked on opto/subnode.cpp to try to switch between S and U modes more vigorously, implementing something probably related to what Vitaly is advocating. I ran into the (a) vs. (b) tradeoff, especially trying to preserve the aggressive matching of dominating tests in opto/ifnode.cpp. I think it can be made better than it is today, but the details are very tricky. 
Conjecture: It might help if the TypeInt lattice could encode ranges in the uint32 space, just as (today) it encodes ranges in int32, since some of the heuristics are type-driven. - John From roland.westrelin at oracle.com Tue Jan 12 08:51:37 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 12 Jan 2016 09:51:37 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: <5694BBA6.4040301@oracle.com> References: <568EB3A0.3040909@oracle.com> <56937F1F.7010709@oracle.com> <569402C9.5060305@oracle.com> <5694BBA6.4040301@oracle.com> Message-ID: > I had to merge again with JDK-8139771 (castnode.cpp/hpp): > http://cr.openjdk.java.net/~thartmann/8146629/webrev.02/ Looks good to me. Roland. From tobias.hartmann at oracle.com Tue Jan 12 09:02:38 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Jan 2016 10:02:38 +0100 Subject: [9] RFR(M): 8146629: Make phase->is_IterGVN() accessible from Node::Identity and Node::Value In-Reply-To: References: <568EB3A0.3040909@oracle.com> <56937F1F.7010709@oracle.com> <569402C9.5060305@oracle.com> <5694BBA6.4040301@oracle.com> Message-ID: <5694C12E.6040809@oracle.com> Thanks, Roland. And sorry for the noise. Best, Tobias On 12.01.2016 09:51, Roland Westrelin wrote: >> I had to merge again with JDK-8139771 (castnode.cpp/hpp): >> http://cr.openjdk.java.net/~thartmann/8146629/webrev.02/ > > Looks good to me. > > Roland. > From doug.simon at oracle.com Tue Jan 12 09:39:55 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 12 Jan 2016 10:39:55 +0100 Subject: RFR: 8146364: Remove @ServiceProvider mechanism from JVMCI In-Reply-To: References: Message-ID: Doh! This was a result of pilot error when transplanting patches from graal-jvmci-9. I left out: http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/rev/2390bc159b77 The behavior I saw was not that the META-INF files overwrote each other. Instead, they were ignored completely.
I've updated http://cr.openjdk.java.net/~dnsimon/8146364/hotspot/ now. -Doug > On 12 Jan 2016, at 02:35, Christian Thalinger wrote: > > >> On Jan 11, 2016, at 12:43 PM, Doug Simon wrote: >> >> Hi, >> >> Please review these changes for removing the mechanism in JVMCI for automating the generation of files in META-INF/services for service providers annotated with @ServiceProvider. > > Did you try this with a regular JDK 9 build? I don't think it works to have the same META-INF file in different locations: > > src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory > src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.amd64/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory > src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.sparc/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory > > They will overwrite each other when being installed into the image. > >> >> https://bugs.openjdk.java.net/browse/JDK-8146364 >> >> http://cr.openjdk.java.net/~dnsimon/8146364/jdk9/ >> http://cr.openjdk.java.net/~dnsimon/8146364/hotspot/ >> >> -Doug > From roland.westrelin at oracle.com Tue Jan 12 10:07:50 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 12 Jan 2016 11:07:50 +0100 Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph In-Reply-To: <56944C45.6060307@oracle.com> References: <56944C45.6060307@oracle.com> Message-ID: <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> Hi Vladimir, Thanks for looking at this. > Now I think I understand. > Note, there should be NO any control between loop's head and predicate check. I assume CastPP is attached to it because its original check was removed by dominated similar check (for example NULL check). > > I think it is safe to move CastPP above original dummy predicate checks (one or two checks if there is loop limit checks) since Cast PP should not depend on them.
It will solve the problem since moved check(new predicate) is always inserted before original dummy predicate (which will be removed later). I first saw that problem with a CastPP but in the test case that I wrote for that bug, the pinned node is not a CastPP, it's a StoreF. The predicate that is moved above tests a LoadF value that is memory dependent on the StoreF. I don't see any reason the same problem couldn't be reproduced with any data node. With the test case, it would be safe to move the StoreF above the predicates I think. But in the general case, I don't see how we can be sure that we don't have: - null check/range check for the StoreF moved out of loops as predicates - partial peel that causes the StoreF to be pinned below the predicates - loop predication that moves some data node that depends on the StoreF above it Roland. > > Thanks, > Vladimir > > On 1/11/16 7:07 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8146792/webrev.00/ >> >> - partial peeling is applied to a loop >> - the peeled section is optimized and leaves a pinned node between the loop predicates and the loop body but no control flow >> - loop predicates are applied and a predicate that depends on the pinned node is moved out of the loop, before the pinned node, leading to a broken graph >> >> This is the same issue that came up during review of 8139771. Vladimir suggested it gets reviewed separately. With the included test case it reproduces without the change from 8139771. >> >> Roland.
>> From roland.westrelin at oracle.com Tue Jan 12 10:17:16 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 12 Jan 2016 11:17:16 +0100 Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph In-Reply-To: <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> References: <56944C45.6060307@oracle.com> <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> Message-ID: <78074704-CE75-4F50-9F53-22FEC75E836E@oracle.com> > With the test case, it would be safe to move the StoreF above the predicates I think. But in the general case, I don't see how we can be sure that we don't have: > > - null check/range check for the StoreF moved out of loops as predicates > - partial peel that causes the StoreF to be pinned below the predicates > - loop predication that moves some data node that depends on the StoreF above it Actually, I can reproduce this scenario with the patch below: some changes to the test and making range check smearing a little bit more aggressive so a range check is replaced by a dominating predicate range check. Roland. diff --git a/src/share/vm/opto/ifnode.cpp b/src/share/vm/opto/ifnode.cpp --- a/src/share/vm/opto/ifnode.cpp +++ b/src/share/vm/opto/ifnode.cpp @@ -514,7 +514,7 @@ // along the OOB path. Otherwise, it's possible that the user wrote // something which optimized to look like a range check but behaves // in some other way.
- if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_range_check) == NULL) { + if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_none) == NULL) { return 0; } diff --git a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java --- a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java +++ b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java @@ -30,6 +30,8 @@ * */ +import java.util.Objects; + public class BadPredicateAfterPartialPeel { static void not_inlined1() {} @@ -45,13 +47,13 @@ boolean flag; int j; - static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) { + static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) throws Exception { int i1 = 1; // To delay partial peeling to the loop opts pass right before CCP - int i2 = 0; - for (; i2 < 10; i2 += i1); - i2 = i2 / 10; + int i2 = 1; + // for (; i2 < 10; i2 += i1); + // i2 = i2 / 10; // Simplified during CCP: int i3 = 2; @@ -63,11 +65,12 @@ not_inlined1(); - array[0] = -1; do { // peeled section starts here o.flag = false; o.j = 0; + + Objects.checkIndex(0, array.length, null); if (b) { // The following store will be pinned between @@ -300,7 +303,7 @@ not_inlined4(); } - static public void main(String[] args) { + static public void main(String[] args) throws Exception { BadPredicateAfterPartialPeel o1 = new BadPredicateAfterPartialPeel(); BadPredicateAfterPartialPeel o2 = new BadPredicateAfterPartialPeel(); for (int i = 0; i < 20000; i++) { From tobias.hartmann at oracle.com Tue Jan 12 13:59:38 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 12 Jan 2016 14:59:38 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <569409C5.2040805@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> 
<568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> Message-ID: <569506CA.8040001@oracle.com> On 11.01.2016 21:00, Vladimir Kozlov wrote: > On 1/11/16 7:20 AM, Tobias Hartmann wrote: >> On 08.01.2016 20:41, Vladimir Kozlov wrote: >>> On 1/8/16 2:37 AM, Tobias Hartmann wrote: >>>> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>>>> Hi Vladimir, >>>>>> >>>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>>>> Andrew is right. >>>>>> >>>>>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>>>> >>>>> Right. It was the root of this bug, see below. >>>>> >>>>>> >>>>>> I fixed this for the inflate and compress intrinsics. >>>>>> >>>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>>>> StrInflatedCopyNode is not memory node. >>>>>> >>>>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >>>>> >>>>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >>>> >>>> I meant with webrev.01 but you answered my question below. >>>> >>>>>> // This class defines a projection of the memory state of a store conditional node. >>>>>> // These nodes return a value, but also update memory. >>>>>> >>>>>> But inflate does not return any value. 
>>>>> >>>>> Hmm, according to bottom type inflate produce memory: >>>>> >>>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>>>> >>>>> So it really does not need SCMemProjNode. Sorry about that. >>>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >>>> >>>> Exactly. >>>> >>>>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>>>> >>>>> set_memory(str, dst_type); >>>> >>>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >>>> >>>>> And you should rollback part of changes in escape.cpp and macro.cpp. >>>> >>>> Okay, I'll to that. >>>> >>>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>>>> >>>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >>>> >>>> Okay, should we then use BOTTOM for both the input and output type? >>> >>> Only input. Output type corresponds to dst array type which you set correctly now. >> >> It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case: >> StoreC >> inflate_string >> LoadC >> >> The memory graph (def->use) now looks like this: >> LoadC -> inflate_string -> ByteMem >> ... StoreC-> CharMem > > I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC. Note that the StoreC and inflate_string are *not* writing to the same char[] array. The test looks like this: char c1[] = new char[1]; char c2[] = new char[1]; c2[0] = 42; // Inflate String from byte[] to char[] s.getChars(0, 1, c1, 0); // Read char[] memory written before inflation return c2[0]; The result should be 42. 
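The snippet above can be wrapped into a standalone program along the following lines. This is only a sketch: the class name, the warm-up loop and its iteration count are assumptions made for illustration, and the regression test actually referenced in this thread is TestStringIntrinsicMemoryFlow::testInflate2 in the webrev. The idea is simply to call the method often enough that the JIT is likely to compile it, and to check that the store to c2 is still visible after getChars() has written to the unrelated array c1.

```java
public class InflateMemoryFlow {
    // Mirrors the snippet from the mail: the value stored into c2 must still
    // be read back after getChars() has written into the separate array c1.
    static char test(String s) {
        char[] c1 = new char[1];
        char[] c2 = new char[1];
        c2[0] = 42;
        // Copy one char of the String into c1 (an inflation from byte[] to
        // char[] when the String is stored in compact Latin-1 form).
        s.getChars(0, 1, c1, 0);
        // Read char[] memory that was written before the copy.
        return c2[0];
    }

    public static void main(String[] args) {
        // Hypothetical warm-up count, chosen only so that JIT compilation is likely.
        for (int i = 0; i < 20_000; i++) {
            if (test("a") != 42) {
                throw new AssertionError("unexpected value at iteration " + i);
            }
        }
        System.out.println("ok");
    }
}
```

Whether a given VM build exhibits the failure depends on the compiler taking the intrinsic path, so a passing run of this sketch does not by itself show that the memory graph is correct.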
The problem is that inflate_string does not point to StoreC because inflate_string uses a byte[] as input and in this case also writes to a different char[]. Even if we set the input to BOTTOM, inflate_string points to 7 Parm (BOTTOM) but not to the char[] memory produced by 96 StoreC: http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png 349 LoadUS then reads from the output char[] memory of inflate_string which does not include the result of StoreC. The test fails because the return value is != 42. My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. > If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed. Yes, these cases work fine. Thanks, Tobias >> The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). >> >> Setting the input to BOTTOM, generates the following graph: >> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >> The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails. >> >> I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: >> LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) >> http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ >> Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? 
>> >> Best, >> Tobias >> >>>>>> Related question: >>>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>>>> char[int:>=0]:exact+any * >>>>>> >>>>>> which is equal to the type of the char load. >>>>> >>>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? >>>> >>>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. See comment in library_call.cpp: >>>> >>>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >>>> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >>>> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >>>> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >>>> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>> >>>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >>>> >>>>> Should we also be more careful in inflate_string_slow()? Is it used? >>>> >>>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >>>> >>>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>>>> char[int:1]:NotNull:exact * >>>>>> >>>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>>>> >>>>> It is indeed strange. 
What memory type of LoadUS? It could be bug. >>>> >>>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >>>> >>>> I will look into this again and try to understand what happens. >>> >>> It could that aryptr is pointer to array and load type is pointer to array's element. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> Tobias >>>> >>>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>>>> >>>>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>>>> >>>>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>>>> >>>>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>>>> >>>>>>>>> Or did I misunderstand your question? >>>>>>>> >>>>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>>>> with some interesting gotchas over the years, that's all. I'm a bit >>>>>>>> surprised that C2 needs this barrier, given that there is a >>>>>>>> read-after-write dependency, but never mind. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Andrew. 
>>>>>>>>

From edward.nevill at gmail.com Tue Jan 12 14:01:01 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 12 Jan 2016 14:01:01 +0000
Subject: RFR: 8146886: aarch64: fails to build following 8136525 and 8139864
Message-ID: <1452607261.30600.5.camel@mylittlepony.linaroharston>

Hi,

The following webrev fixes several build failures in aarch64 following recent merges.

Webrev: http://cr.openjdk.java.net/~enevill/8146886/webrev/
Jira: https://bugs.openjdk.java.net/browse/JDK-8146886

Testing by building release and fastdebug versions and by running jtreg hotspot and langtools.

Jtreg hotspot: Test results: passed: 1,068; failed: 16; error: 15
JTreg langtools: Test results: passed: 3,358; failed: 1; error: 4

OK to push?
Ed.

From vladimir.x.ivanov at oracle.com Tue Jan 12 15:22:38 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 12 Jan 2016 18:22:38 +0300
Subject: [9] RFR (S): 8140001: _allocateInstance intrinsic does not throw InstantiationException for abstract classes and interfaces
Message-ID: <56951A3E.7070805@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8140001/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8140001

Escape analysis (EA) can eliminate allocations of abstract classes or interfaces, thus changing the observable behavior of a program, as the test case demonstrates.

The fix is to always mark such allocations as escaping.

Testing: failing test, JPRT.

Thanks!

Best regards,
Vladimir Ivanov

From vladimir.kozlov at oracle.com Tue Jan 12 17:00:33 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Jan 2016 09:00:33 -0800
Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB
In-Reply-To: <5694B2A2.7010905@oracle.com>
References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693B01D.60604@oracle.com> <569401DE.8000105@oracle.com> <5694B2A2.7010905@oracle.com>
Message-ID: <56953131.7070408@oracle.com>

Looks good.
The test may still have a problem on slow platforms with -Xcomp. We may need to increase the timeout later.

Thanks,
Vladimir

On 1/12/16 12:00 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 01/11/2016 08:26 PM, Vladimir Kozlov wrote:
>> Don't use GC flags in the test. They will conflict with flags passed by testing infra and the test will fail. There was a
>> bug fixed by removing GC flags from all our tests.
>> Note, Nightly testing does GC flags rotation so you don't need to do that.
>
> OK, I removed all GC flags from the test. Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8086053/webrev.03/
>
>> Otherwise looks good.
>
> Thank you for the review! I'll push webrev.03 today as it addresses all issues that were brought up.
>
> Thank you and best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>> On 1/11/16 5:37 AM, Zoltán Majó wrote:
>>> Hi,
>>>
>>>
>>>
>>> On 01/11/2016 02:11 PM, Zoltán Majó wrote:
>>>> [...]
>>>> Yes, that is a good idea. I added a test that launches the VM with all flag combinations and also with different GCs.
>>>> I did the same thing the test does to reproduce the original failure.
>>>>
>>>> Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/
>>>
>>> The test contains an unnecessary @library tag and package import. The year in the copyright statement must be changed
>>> as well (to 2016).
>>>
>>> Here is the webrev with those changes:
>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/
>>>
>>> Sorry for the noise.
>>>
>>> Thank you and best regards,
>>>
>>>
>>> Zoltan
>>>
>>>
>>>>
>>>> The newly added test passes on all supported platforms.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Best,
>>>>> Tobias
>>>>>
>>>>>
>>>>> On 08.01.2016 12:06, Zoltán Majó wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> please review the patch for 8086053.
>>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8086053 >>>>>> >>>>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill newly allocated TLAB regions. With ZeroTLAB >>>>>> disabled, the interpreter and compiled code should assume the responsibility to zero-fill newly allocated regions. >>>>>> Currently, the handling of the ZeroTLAB flag shows some inconsistencies between the GC and the compilers. These >>>>>> inconsistencies lead to newly allocated regions not being filled with zeros. >>>>>> >>>>>> Solution: Address the following: >>>>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB without notifying the GC. As a result, the newly >>>>>> allocated TLAB is not initialized with zero. Add TLAB initialization code to C1. >>>>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization of newly allocated objects/arrays even if TLAB >>>>>> allocation is disabled. Add stricter conditions to C2 on when to skip filling objects/arrays with zero. >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - local testing (linux_x86_64) of failing test case with -XX:+UseG1GC and -XX:+UseSerialGC; >>>>>> - JPRT; >>>>>> - all hotspot tests on all platforms affected by the change using all combinations of +/-UseTLAB and +/-ZeroTLAB. 
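Editorial sketch (not part of the original message): the testing described above runs with all combinations of +/-UseTLAB and +/-ZeroTLAB. One way to enumerate such combinations is shown below. The class and method names are hypothetical and this is not the test from the webrev; it only builds the -XX flag lists and does not launch a VM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the webrev: enumerates every
// on/off combination of a set of boolean -XX flags.
public class FlagCombinationsSketch {

    // Returns 2^n flag lists, one for each +/- assignment of the given flag names.
    static List<List<String>> combinations(String... flagNames) {
        List<List<String>> result = new ArrayList<>();
        int total = 1 << flagNames.length;
        for (int mask = 0; mask < total; mask++) {
            List<String> combo = new ArrayList<>();
            for (int i = 0; i < flagNames.length; i++) {
                // Bit i of the mask decides whether flag i is switched on.
                boolean on = (mask & (1 << i)) != 0;
                combo.add("-XX:" + (on ? "+" : "-") + flagNames[i]);
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        // UseTLAB and ZeroTLAB are the flags named in the message above.
        for (List<String> combo : combinations("UseTLAB", "ZeroTLAB")) {
            System.out.println(String.join(" ", combo));
        }
    }
}
```

Each resulting list could then be passed as extra VM arguments when spawning a child JVM; how the child VM is started is left out here on purpose.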
>>>>>>
>>>>>> Thank you and best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>
>>>
>

From zoltan.majo at oracle.com Tue Jan 12 17:08:14 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 12 Jan 2016 18:08:14 +0100
Subject: [9] RFR (M): 8086053: Address inconsistencies regarding ZeroTLAB
In-Reply-To: <56953131.7070408@oracle.com>
References: <568F9852.4090806@oracle.com> <56936C68.70002@oracle.com> <5693A9E7.3040700@oracle.com> <5693B01D.60604@oracle.com> <569401DE.8000105@oracle.com> <5694B2A2.7010905@oracle.com> <56953131.7070408@oracle.com>
Message-ID: <569532FE.2080107@oracle.com>

Hi Vladimir,

On 01/12/2016 06:00 PM, Vladimir Kozlov wrote:
> Looks good. The test may still have a problem on slow platforms with
> -Xcomp. We may need to increase timeout later.

thank you for pointing that out! I'll keep an eye on the test.

Best regards,


Zoltan

>
> Thanks,
> Vladimir
>
> On 1/12/16 12:00 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> On 01/11/2016 08:26 PM, Vladimir Kozlov wrote:
>>> Don't use GC flags in the test. They will conflict with flags passed
>>> by testing infra and the test will fail. There was a
>>> bug fixed by removing GC flags from all our tests.
>>> Note, Nightly testing does GC flags rotation so you don't need to do
>>> that.
>>
>> OK, I removed all GC flags from the test. Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.03/
>>
>>> Otherwise looks good.
>>
>> Thank you for the review! I'll push webrev.03 today as it addresses
>> all issues that were brought up.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 1/11/16 5:37 AM, Zoltán Majó wrote:
>>>> Hi,
>>>>
>>>>
>>>>
>>>> On 01/11/2016 02:11 PM, Zoltán Majó wrote:
>>>>> [...]
>>>>> Yes, that is a good idea. I added a test that launches the VM with
>>>>> all flag combinations and also with different GCs.
>>>>> I did the same thing the test does to reproduce the original failure.
>>>>>
>>>>> Here is the updated webrev:
>>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.01/
>>>>
>>>> The test contains an unnecessary @library tag and package import.
>>>> The year in the copyright statement must be changed
>>>> as well (to 2016).
>>>>
>>>> Here is the webrev with those changes:
>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.02/
>>>>
>>>> Sorry for the noise.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>
>>>>>
>>>>> The newly added test passes on all supported platforms.
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Tobias
>>>>>>
>>>>>>
>>>>>> On 08.01.2016 12:06, Zoltán Majó wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> please review the patch for 8086053.
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8086053
>>>>>>>
>>>>>>> Problem: With ZeroTLAB enabled, the GC is supposed to zero-fill
>>>>>>> newly allocated TLAB regions. With ZeroTLAB
>>>>>>> disabled, the interpreter and compiled code should assume the
>>>>>>> responsibility to zero-fill newly allocated regions.
>>>>>>> Currently, the handling of the ZeroTLAB flag shows some
>>>>>>> inconsistencies between the GC and the compilers. These
>>>>>>> inconsistencies lead to newly allocated regions not being filled
>>>>>>> with zeros.
>>>>>>>
>>>>>>> Solution: Address the following:
>>>>>>> - With -XX:+FastTLABRefill, C1-compiled code refills the TLAB
>>>>>>> without notifying the GC. As a result, the newly
>>>>>>> allocated TLAB is not initialized with zero. Add TLAB
>>>>>>> initialization code to C1.
>>>>>>> - With -XX:+ZeroTLAB, the C2 compiler skips zero-initialization
>>>>>>> of newly allocated objects/arrays even if TLAB
>>>>>>> allocation is disabled. Add stricter conditions to C2 on when to
>>>>>>> skip filling objects/arrays with zero.
>>>>>>> >>>>>>> Webrev: >>>>>>> http://cr.openjdk.java.net/~zmajo/8086053/webrev.00/ >>>>>>> >>>>>>> Testing: >>>>>>> - local testing (linux_x86_64) of failing test case with >>>>>>> -XX:+UseG1GC and -XX:+UseSerialGC; >>>>>>> - JPRT; >>>>>>> - all hotspot tests on all platforms affected by the change >>>>>>> using all combinations of +/-UseTLAB and +/-ZeroTLAB. >>>>>>> >>>>>>> Thank you and best regards, >>>>>>> >>>>>>> >>>>>>> Zoltan >>>>>>> >>>>> >>>> >> From vladimir.x.ivanov at oracle.com Tue Jan 12 18:41:24 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 12 Jan 2016 21:41:24 +0300 Subject: [9] RFR (XS): 6985422: flush the output streams before OnError commands Message-ID: <569548D4.2070707@oracle.com> http://cr.openjdk.java.net/~vlivanov/6985422/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-6985422 OnError commands are executed before hotspot log is finished. The fix is to finish the log before executing OnError commands. Also, I moved compilation replay data dumping logic before OnError processing, so compilation replay file is accessible from OnError commands as well. I verified the fix by triggering VM crash w/ -XX:+LogCompilation -XX:LogFile=hotspot.log -XX:OnError='cp hotspot.log hs.log' flags and checking that hs.log is complete. Without the fix the log is corrupted. Testing: manual, JPRT. Thanks! Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Jan 12 19:13:35 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Jan 2016 11:13:35 -0800 Subject: [9] RFR (XS): 6985422: flush the output streams before OnError commands In-Reply-To: <569548D4.2070707@oracle.com> References: <569548D4.2070707@oracle.com> Message-ID: <5695505F.7050005@oracle.com> Looks good. Vladimir K On 1/12/16 10:41 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/6985422/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-6985422 > > OnError commands are executed before hotspot log is finished. 
> > The fix is to finish the log before executing OnError commands. > > Also, I moved compilation replay data dumping logic before OnError processing, so compilation replay file is accessible > from OnError commands as well. > > I verified the fix by triggering VM crash w/ -XX:+LogCompilation -XX:LogFile=hotspot.log -XX:OnError='cp hotspot.log > hs.log' flags and checking that hs.log is complete. Without the fix the log is corrupted. > > Testing: manual, JPRT. > > Thanks! > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Jan 12 19:24:30 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Jan 2016 11:24:30 -0800 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <569506CA.8040001@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> <569506CA.8040001@oracle.com> Message-ID: <569552EE.8050809@oracle.com> > My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. Yes, that is right solution here. Thanks, Vladimir On 1/12/16 5:59 AM, Tobias Hartmann wrote: > On 11.01.2016 21:00, Vladimir Kozlov wrote: >> On 1/11/16 7:20 AM, Tobias Hartmann wrote: >>> On 08.01.2016 20:41, Vladimir Kozlov wrote: >>>> On 1/8/16 2:37 AM, Tobias Hartmann wrote: >>>>> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>>>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>>>>> Andrew is right. >>>>>>> >>>>>>> Yes, he's right that the membar is not needed in this case. 
I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>>>>> >>>>>> Right. It was the root of this bug, see below. >>>>>> >>>>>>> >>>>>>> I fixed this for the inflate and compress intrinsics. >>>>>>> >>>>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>>>>> StrInflatedCopyNode is not memory node. >>>>>>> >>>>>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >>>>>> >>>>>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >>>>> >>>>> I meant with webrev.01 but you answered my question below. >>>>> >>>>>>> // This class defines a projection of the memory state of a store conditional node. >>>>>>> // These nodes return a value, but also update memory. >>>>>>> >>>>>>> But inflate does not return any value. >>>>>> >>>>>> Hmm, according to bottom type inflate produce memory: >>>>>> >>>>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>>>>> >>>>>> So it really does not need SCMemProjNode. Sorry about that. >>>>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >>>>> >>>>> Exactly. >>>>> >>>>>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>>>>> >>>>>> set_memory(str, dst_type); >>>>> >>>>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >>>>> >>>>>> And you should rollback part of changes in escape.cpp and macro.cpp. >>>>> >>>>> Okay, I'll to that. 
>>>>> >>>>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>>>>> >>>>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >>>>> >>>>> Okay, should we then use BOTTOM for both the input and output type? >>>> >>>> Only input. Output type corresponds to dst array type which you set correctly now. >>> >>> It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case: >>> StoreC >>> inflate_string >>> LoadC >>> >>> The memory graph (def->use) now looks like this: >>> LoadC -> inflate_string -> ByteMem >>> ... StoreC-> CharMem >> >> I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC. > > Note that the StoreC and inflate_string are *not* writing to the same char[] array. The test looks like this: > > char c1[] = new char[1]; > char c2[] = new char[1]; > > c2[0] = 42; > // Inflate String from byte[] to char[] > s.getChars(0, 1, c1, 0); > // Read char[] memory written before inflation > return c2[0]; > > The result should be 42. The problem is that inflate_string does not point to StoreC because inflate_string uses a byte[] as input and in this case also writes to a different char[]. Even if we set the input to BOTTOM, inflate_string points to 7 Parm (BOTTOM) but not to the char[] memory produced by 96 StoreC: > http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png > > 349 LoadUS then reads from the output char[] memory of inflate_string which does not include the result of StoreC. The test fails because the return value is != 42. > > My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. 
> >> If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed. > > Yes, these cases work fine. > > Thanks, > Tobias > >>> The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). >>> >>> Setting the input to BOTTOM, generates the following graph: >>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >>> The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails. >>> >>> I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: >>> LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) >>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png >>> >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ >>> Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? >>> >>> Best, >>> Tobias >>> >>>>>>> Related question: >>>>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>>>>> char[int:>=0]:exact+any * >>>>>>> >>>>>>> which is equal to the type of the char load. >>>>>> >>>>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? >>>>> >>>>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. 
See comment in library_call.cpp: >>>>> >>>>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >>>>> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >>>>> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >>>>> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>> >>>>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >>>>> >>>>>> Should we also be more careful in inflate_string_slow()? Is it used? >>>>> >>>>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >>>>> >>>>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>>>>> char[int:1]:NotNull:exact * >>>>>>> >>>>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>>>>> >>>>>> It is indeed strange. What memory type of LoadUS? It could be bug. >>>>> >>>>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >>>>> >>>>> I will look into this again and try to understand what happens. >>>> >>>> It could that aryptr is pointer to array and load type is pointer to array's element. 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Thanks, >>>>> Tobias >>>>> >>>>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>>>>> >>>>>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>>>>> >>>>>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>>>>> >>>>>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>>>>> >>>>>>>>>> Or did I misunderstand your question? >>>>>>>>> >>>>>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>>>>> with some interesting gotchas over the years, that's all. I'm a bit >>>>>>>>> surprised that C2 needs this barrier, given that there is a >>>>>>>>> read-after-write dependency, but never mind. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Andrew. 
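Editorial sketch (not part of the original messages): the StoreC / inflate_string / LoadC scenario discussed in the 8144212 thread can be written as a small standalone Java program, based on the test snippet quoted above. The class and method names are hypothetical; the actual regression test is the 'TestStringIntrinsicMemoryFlow::testInflate2' mentioned in the thread. The loop count is an assumption chosen only to make JIT compilation likely.

```java
// Hypothetical standalone version of the scenario from the thread:
// a char store, an inflating String.getChars() call into a *different*
// char[] array, then a load of the previously stored char.
public class InflateMemoryFlowSketch {

    static char readAfterInflate(String s) {
        char[] c1 = new char[1];
        char[] c2 = new char[1];

        c2[0] = 42;              // char store before the inflation
        s.getChars(0, 1, c1, 0); // copies (and possibly inflates) into c1
        return c2[0];            // must still observe the store above
    }

    public static void main(String[] args) {
        // Call the method often enough that the JIT is likely to compile it.
        for (int i = 0; i < 20_000; i++) {
            if (readAfterInflate("a") != 42) {
                throw new AssertionError("load did not observe the earlier store");
            }
        }
        System.out.println("ok");
    }
}
```

On a correct VM the method always returns 42; the failure described in the thread is the compiled code returning a value != 42 because the load is not ordered after the store.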
>>>>>>>>>

From vladimir.kozlov at oracle.com Tue Jan 12 19:40:06 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 12 Jan 2016 11:40:06 -0800
Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph
In-Reply-To: <78074704-CE75-4F50-9F53-22FEC75E836E@oracle.com>
References: <56944C45.6060307@oracle.com> <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> <78074704-CE75-4F50-9F53-22FEC75E836E@oracle.com>
Message-ID: <56955696.2080501@oracle.com>

On 1/12/16 2:17 AM, Roland Westrelin wrote:
>> With the test case, it would be safe to move the StoreF above the predicates I think. But in the general case, I don't see how we can be sure that we don't have:
>>
>> - null check/range check for the StoreF moved out of loops as predicates
>> - partial peel that causes the StoreF to be pinned below the predicates
>> - loop predication that moves some data node that depends on the StoreF above it

I agree that your change works as a very conservative approach. But we will not get performance from it.

I am thinking that it is "always" safe to move a pinned data node above original/dummy predicates (the loop index variable depends on the limit check predicate, but we will never move the index node from the loop). We only need to be sure that we move it (after partial peel, for example) before any dependent checks and data nodes are moved from the loop. Those checks and data will be inserted below it.

Anyway, how rare is this case? If it is very rare I agree with your change since performance is not important.

Thanks,
Vladimir

>
> Actually, I can reproduce this scenario with the patch below: some changes to the test and making range check smearing a little bit more aggressive so a range check is replaced by a dominating predicate range check.
>
> Roland.
>
> diff --git a/src/share/vm/opto/ifnode.cpp b/src/share/vm/opto/ifnode.cpp
> --- a/src/share/vm/opto/ifnode.cpp
> +++ b/src/share/vm/opto/ifnode.cpp
> @@ -514,7 +514,7 @@
> // along the OOB path.
Otherwise, it's possible that the user wrote > // something which optimized to look like a range check but behaves > // in some other way. > - if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_range_check) == NULL) { > + if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_none) == NULL) { > return 0; > } > > diff --git a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java > --- a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java > +++ b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java > @@ -30,6 +30,8 @@ > * > */ > > +import java.util.Objects; > + > public class BadPredicateAfterPartialPeel { > > static void not_inlined1() {} > @@ -45,13 +47,13 @@ > boolean flag; > int j; > > - static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) { > + static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) throws Exception { > int i1 = 1; > > // To delay partial peeling to the loop opts pass right before CCP > - int i2 = 0; > - for (; i2 < 10; i2 += i1); > - i2 = i2 / 10; > + int i2 = 1; > + // for (; i2 < 10; i2 += i1); > + // i2 = i2 / 10; > > // Simplified during CCP: > int i3 = 2; > @@ -63,11 +65,12 @@ > > not_inlined1(); > > - array[0] = -1; > do { > // peeled section starts here > o.flag = false; > o.j = 0; > + > + Objects.checkIndex(0, array.length, null); > > if (b) { > // The following store will be pinned between > @@ -300,7 +303,7 @@ > not_inlined4(); > } > > - static public void main(String[] args) { > + static public void main(String[] args) throws Exception { > BadPredicateAfterPartialPeel o1 = new BadPredicateAfterPartialPeel(); > BadPredicateAfterPartialPeel o2 = new BadPredicateAfterPartialPeel(); > for (int i = 0; i < 20000; i++) { > > From christian.thalinger at oracle.com Tue Jan 12 19:40:42 2016 From: christian.thalinger at oracle.com 
(Christian Thalinger)
Date: Tue, 12 Jan 2016 09:40:42 -1000
Subject: RFR: 8146364: Remove @ServiceProvider mechanism from JVMCI
In-Reply-To:
References:
Message-ID: <6DC2B1CE-3AAF-45D7-99D2-24C584BDE71B@oracle.com>

> On Jan 11, 2016, at 11:39 PM, Doug Simon wrote:
>
> Doh! This was a result of pilot error when transplanting patches from graal-jvmci-9. I left out:
>
> http://hg.openjdk.java.net/graal/graal-jvmci-9/hotspot/rev/2390bc159b77
>
> The behavior I saw was not that the META-INF files overwrote each other. Instead, they were ignored completely.
>
> I've updated http://cr.openjdk.java.net/~dnsimon/8146364/hotspot/ now.

Yes, that looks better.

>
> -Doug
>
>> On 12 Jan 2016, at 02:35, Christian Thalinger wrote:
>>
>>
>>> On Jan 11, 2016, at 12:43 PM, Doug Simon wrote:
>>>
>>> Hi,
>>>
>>> Please review these changes for removing the mechanism in JVMCI for automating the generation of files in META-INF/services for service providers annotated with @ServiceProvider.
>>
>> Did you try this with a regular JDK 9 build? I don't think it works to have the same META-INF file in different locations:
>>
>> src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory
>> src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.amd64/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory
>> src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot.sparc/src/META-INF/services/jdk.vm.ci.hotspot.HotSpotJVMCIBackendFactory
>>
>> They will overwrite each other when being installed into the image.
>>
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8146364
>>>
>>> http://cr.openjdk.java.net/~dnsimon/8146364/jdk9/
>>> http://cr.openjdk.java.net/~dnsimon/8146364/hotspot/
>>>
>>> -Doug
>>
>

From roland.westrelin at oracle.com Tue Jan 12 19:56:12 2016
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 12 Jan 2016 20:56:12 +0100
Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph
In-Reply-To: <56955696.2080501@oracle.com>
References: <56944C45.6060307@oracle.com> <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> <78074704-CE75-4F50-9F53-22FEC75E836E@oracle.com> <56955696.2080501@oracle.com>
Message-ID:

> I am thinking that it is "always" safe to move a pinned data node above original/dummy predicates (the loop index variable depends on the limit check predicate, but we will never move the index node from the loop). We only need to be sure that we move it (after partial peel, for example) before any dependent checks and data nodes are moved from the loop. Those checks and data will be inserted below it.

I get it now and I think you're right but it would need to be done for all data nodes which sounds like a mess.

> Anyway, how rare is this case? If it is very rare I agree with your change since performance is not important.

It's very rare. I've seen it only once running the old CTW with the castPP change from 8139771.

Roland.

>
> Thanks,
> Vladimir
>
>>
>> Actually, I can reproduce this scenario with the patch below: some changes to the test and making range check smearing a little bit more aggressive so a range check is replaced by a dominating predicate range check.
>>
>> Roland.
>>
>> diff --git a/src/share/vm/opto/ifnode.cpp b/src/share/vm/opto/ifnode.cpp
>> --- a/src/share/vm/opto/ifnode.cpp
>> +++ b/src/share/vm/opto/ifnode.cpp
>> @@ -514,7 +514,7 @@
>> // along the OOB path. Otherwise, it's possible that the user wrote
>> // something which optimized to look like a range check but behaves
>> // in some other way.
>> - if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_range_check) == NULL) { >> + if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_none) == NULL) { >> return 0; >> } >> >> diff --git a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >> --- a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >> +++ b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >> @@ -30,6 +30,8 @@ >> * >> */ >> >> +import java.util.Objects; >> + >> public class BadPredicateAfterPartialPeel { >> >> static void not_inlined1() {} >> @@ -45,13 +47,13 @@ >> boolean flag; >> int j; >> >> - static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) { >> + static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) throws Exception { >> int i1 = 1; >> >> // To delay partial peeling to the loop opts pass right before CCP >> - int i2 = 0; >> - for (; i2 < 10; i2 += i1); >> - i2 = i2 / 10; >> + int i2 = 1; >> + // for (; i2 < 10; i2 += i1); >> + // i2 = i2 / 10; >> >> // Simplified during CCP: >> int i3 = 2; >> @@ -63,11 +65,12 @@ >> >> not_inlined1(); >> >> - array[0] = -1; >> do { >> // peeled section starts here >> o.flag = false; >> o.j = 0; >> + >> + Objects.checkIndex(0, array.length, null); >> >> if (b) { >> // The following store will be pinned between >> @@ -300,7 +303,7 @@ >> not_inlined4(); >> } >> >> - static public void main(String[] args) { >> + static public void main(String[] args) throws Exception { >> BadPredicateAfterPartialPeel o1 = new BadPredicateAfterPartialPeel(); >> BadPredicateAfterPartialPeel o2 = new BadPredicateAfterPartialPeel(); >> for (int i = 0; i < 20000; i++) { >> >> From christian.thalinger at oracle.com Tue Jan 12 20:04:44 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 12 Jan 2016 10:04:44 -1000 Subject: RFR (S): 8146820: 
JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> Message-ID: > On Jan 11, 2016, at 12:51 PM, Christian Thalinger wrote: > > >> On Jan 11, 2016, at 10:14 AM, Doug Simon wrote: >> >> >>> On 11 Jan 2016, at 20:50, Vladimir Kozlov wrote: >>> >>> What is naming convention for properties? >>> Do we have somewhere list of all JVMCI properties we accept? Maybe we should add it. >> >> Currently, there is no list of accepted JVMCI properties. Once Chris applies the changes below such that all system property access (apart from jvmci.InitTimer) goes through HotSpotJVMCIRuntime.getProperty(), then the javadoc of that method could contain the list (much like System.getProperties describes the supported standard properties). > > Good idea. > >> >>> All JVMCI properties names should be consistent whatever you choose. >> >> I agree. > > Yes. They should feel like our other command line options so camel-case is what I had in mind. How about this: http://cr.openjdk.java.net/~twisti/8146820/webrev.01/index.html Now all options are in an enum so that we can have PrintFlags and ShowFlags options. I did not add any documentation but we could. > >> >> -Doug >> >>> >>> 'inittimer' is also lowercased. >>> >>> Thanks, >>> Vladimir >>> >>> On 1/11/16 11:15 AM, Christian Thalinger wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8146820 >>>> >>>> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? 
>>>> >>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java >>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 >>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 >>>> @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i >>>> } >>>> >>>> /** >>>> + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} >>>> + * at system initialization time. The property name is prefixed with "{@code jvmci.}". >>>> + * >>>> + * @param name the name of the system property >>>> + * @param def the value to return if there is no system property corresponding to {@code name} >>>> + */ >>>> + public static String getProperty(String name, String def) { >>>> + String value = VM.getSavedProperty("jvmci." + name); >>>> + if (value == null) { >>>> + return def; >>>> + } >>>> + return value; >>>> + } >>>> + >>>> + /** >>>> * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) >>>> * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}". >>>> * >>>> @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i >>>> * @param def the value to return if there is no system property corresponding to {@code name} >>>> */ >>>> public static boolean getBooleanProperty(String name, boolean def) { >>>> - String value = VM.getSavedProperty("jvmci." 
+ name); >>>> + String value = getProperty(name, null); >>>> if (value == null) { >>>> return def; >>>> } >>>> @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i >>>> } >>>> metaAccessContext = context; >>>> >>>> - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { >>>> + if (getBooleanProperty("printconfig", false)) { >>>> printConfig(config, compilerToVm); >>>> } >>>> >>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 >>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 >>>> @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp >>>> return false; >>>> } >>>> >>>> - private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); >>>> + private static final String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); >>>> >>>> @Override >>>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java >>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300 >>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000 >>>> @@ -65,9 +65,11 @@ public final class InitTimer implements >>>> } >>>> >>>> /** >>>> - * Specifies if initialization timing is enabled. >>>> + * Specifies if initialization timing is enabled. Note: this property cannot use >>>> + * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this >>>> + * package. 
>>>> */ >>>> - private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); >>>> + private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer"); >>>> >>>> public static final AtomicInteger nesting = ENABLED ? new AtomicInteger() : null; >>>> public static final String SPACES = " "; >>>> >> > From doug.simon at oracle.com Tue Jan 12 20:14:10 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 12 Jan 2016 21:14:10 +0100 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> Message-ID: If we're going with an enum, you could put accessors directly in the enum: private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); You could then type the value of the options and check the right accessor is used: public enum Option { ImplicitStableValues(boolean.class), InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). PrintConfig(boolean.class), PrintFlags(boolean.class), ShowFlags(boolean.class), TraceMethodDataFilter(String.class), TrustFinalDefaultFields(String.class); Even ignoring these suggestions, the discipline imposed by the enum is a good idea. -Doug > On 12 Jan 2016, at 21:04, Christian Thalinger wrote: > >> >> On Jan 11, 2016, at 12:51 PM, Christian Thalinger wrote: >> >> >>> On Jan 11, 2016, at 10:14 AM, Doug Simon wrote: >>> >>> >>>> On 11 Jan 2016, at 20:50, Vladimir Kozlov wrote: >>>> >>>> What is naming convention for properties? >>>> Do we have somewhere list of all JVMCI properties we accept? Maybe we should add it. >>> >>> Currently, there is no list of accepted JVMCI properties. 
Once Chris applies the changes below such that all system property access (apart from jvmci.InitTimer) goes through HotSpotJVMCIRuntime.getProperty(), then the javadoc of that method could contain the list (much like System.getProperties describes the supported standard properties). >> >> Good idea. >> >>> >>>> All JVMCI properties names should be consistent whatever you choose. >>> >>> I agree. >> >> Yes. They should feel like our other command line options so camel-case is what I had in mind. > > How about this: > > http://cr.openjdk.java.net/~twisti/8146820/webrev.01/index.html > > Now all options are in an enum so that we can have PrintFlags and ShowFlags options. I did not add any documentation but we could. > >> >>> >>> -Doug >>> >>>> >>>> 'inittimer' is also lowercased. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/11/16 11:15 AM, Christian Thalinger wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8146820 >>>>> >>>>> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? >>>>> >>>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java >>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 >>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 >>>>> @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i >>>>> } >>>>> >>>>> /** >>>>> + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} >>>>> + * at system initialization time. The property name is prefixed with "{@code jvmci.}". 
>>>>> + * >>>>> + * @param name the name of the system property >>>>> + * @param def the value to return if there is no system property corresponding to {@code name} >>>>> + */ >>>>> + public static String getProperty(String name, String def) { >>>>> + String value = VM.getSavedProperty("jvmci." + name); >>>>> + if (value == null) { >>>>> + return def; >>>>> + } >>>>> + return value; >>>>> + } >>>>> + >>>>> + /** >>>>> * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) >>>>> * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}". >>>>> * >>>>> @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i >>>>> * @param def the value to return if there is no system property corresponding to {@code name} >>>>> */ >>>>> public static boolean getBooleanProperty(String name, boolean def) { >>>>> - String value = VM.getSavedProperty("jvmci." + name); >>>>> + String value = getProperty(name, null); >>>>> if (value == null) { >>>>> return def; >>>>> } >>>>> @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i >>>>> } >>>>> metaAccessContext = context; >>>>> >>>>> - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { >>>>> + if (getBooleanProperty("printconfig", false)) { >>>>> printConfig(config, compilerToVm); >>>>> } >>>>> >>>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 >>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 >>>>> @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp >>>>> return false; >>>>> } >>>>> >>>>> - private static final String TraceMethodDataFilter = System.getProperty("jvmci.traceMethodDataFilter"); >>>>> + private static final 
String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); >>>>> >>>>> @Override >>>>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java >>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300 >>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000 >>>>> @@ -65,9 +65,11 @@ public final class InitTimer implements >>>>> } >>>>> >>>>> /** >>>>> - * Specifies if initialization timing is enabled. >>>>> + * Specifies if initialization timing is enabled. Note: this property cannot use >>>>> + * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this >>>>> + * package. >>>>> */ >>>>> - private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); >>>>> + private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer"); >>>>> >>>>> public static final AtomicInteger nesting = ENABLED ? 
new AtomicInteger() : null; >>>>> public static final String SPACES = " "; From vladimir.kozlov at oracle.com Tue Jan 12 20:33:08 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 12 Jan 2016 12:33:08 -0800 Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph In-Reply-To: References: <56944C45.6060307@oracle.com> <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> <78074704-CE75-4F50-9F53-22FEC75E836E@oracle.com> <56955696.2080501@oracle.com> Message-ID: <56956304.30100@oracle.com> On 1/12/16 11:56 AM, Roland Westrelin wrote: >> I am thinking that it is "always" safe to move pinned data node above original/dummy predicates (loop index variable is depending on limit check predicate, but we will never move index node from loop). We only need to be sure that we move it (after partial peel, for example) before any dependent checks and data nodes are moved from the loop. Those checks and data will be inserted below it. > > I get it now and I think you're right but it would need to be done for all data nodes which sounds like a mess. > >> Anyway, how rare is this case? If it is very rare I agree with your change since performance is not important. > > It's very rare. I've seen it only once running the old CTW with the castPP change from 8139771. Okay then. Go with your changes - they are good enough. Thanks, Vladimir > > Roland. > >> >> Thanks, >> Vladimir >> >>> >>> Actually, I can reproduce this scenario with the patch below: some changes to the test and making range check smearing a little bit more aggressive so a range check is replaced by a dominating predicate range check. >>> >>> Roland. >>> >>> diff --git a/src/share/vm/opto/ifnode.cpp b/src/share/vm/opto/ifnode.cpp >>> --- a/src/share/vm/opto/ifnode.cpp >>> +++ b/src/share/vm/opto/ifnode.cpp >>> @@ -514,7 +514,7 @@ >>> // along the OOB path. 
Otherwise, it's possible that the user wrote >>> // something which optimized to look like a range check but behaves >>> // in some other way. >>> - if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_range_check) == NULL) { >>> + if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_none) == NULL) { >>> return 0; >>> } >>> >>> diff --git a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >>> --- a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >>> +++ b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >>> @@ -30,6 +30,8 @@ >>> * >>> */ >>> >>> +import java.util.Objects; >>> + >>> public class BadPredicateAfterPartialPeel { >>> >>> static void not_inlined1() {} >>> @@ -45,13 +47,13 @@ >>> boolean flag; >>> int j; >>> >>> - static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) { >>> + static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) throws Exception { >>> int i1 = 1; >>> >>> // To delay partial peeling to the loop opts pass right before CCP >>> - int i2 = 0; >>> - for (; i2 < 10; i2 += i1); >>> - i2 = i2 / 10; >>> + int i2 = 1; >>> + // for (; i2 < 10; i2 += i1); >>> + // i2 = i2 / 10; >>> >>> // Simplified during CCP: >>> int i3 = 2; >>> @@ -63,11 +65,12 @@ >>> >>> not_inlined1(); >>> >>> - array[0] = -1; >>> do { >>> // peeled section starts here >>> o.flag = false; >>> o.j = 0; >>> + >>> + Objects.checkIndex(0, array.length, null); >>> >>> if (b) { >>> // The following store will be pinned between >>> @@ -300,7 +303,7 @@ >>> not_inlined4(); >>> } >>> >>> - static public void main(String[] args) { >>> + static public void main(String[] args) throws Exception { >>> BadPredicateAfterPartialPeel o1 = new BadPredicateAfterPartialPeel(); >>> BadPredicateAfterPartialPeel o2 = new BadPredicateAfterPartialPeel(); >>> for (int i = 0; i < 20000; 
i++) { >>> >>> > From christian.thalinger at oracle.com Tue Jan 12 21:39:35 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 12 Jan 2016 11:39:35 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> Message-ID: <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> > On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: > > If we're going with an enum, you could put accessors directly in the enum: > > private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); > > private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); > > You could then type the value of the options and check the right accessor is used: > > public enum Option { > ImplicitStableValues(boolean.class), > InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). > PrintConfig(boolean.class), > PrintFlags(boolean.class), > ShowFlags(boolean.class), > TraceMethodDataFilter(String.class), > TrustFinalDefaultFields(String.class); > > Even ignoring these suggestions, the discipline imposed by the enum is a good idea. Excellent idea! I was also thinking about adding the default value to the enum. > > -Doug > > >> On 12 Jan 2016, at 21:04, Christian Thalinger wrote: >> >>> >>> On Jan 11, 2016, at 12:51 PM, Christian Thalinger wrote: >>> >>> >>>> On Jan 11, 2016, at 10:14 AM, Doug Simon wrote: >>>> >>>> >>>>> On 11 Jan 2016, at 20:50, Vladimir Kozlov wrote: >>>>> >>>>> What is naming convention for properties? >>>>> Do we have somewhere list of all JVMCI properties we accept? Maybe we should add it. >>>> >>>> Currently, there is no list of accepted JVMCI properties. 
Once Chris applies the changes below such that all system property access (apart from jvmci.InitTimer) goes through HotSpotJVMCIRuntime.getProperty(), then the javadoc of that method could contain the list (much like System.getProperties describes the supported standard properties). >>> >>> Good idea. >>> >>>> >>>>> All JVMCI properties names should be consistent whatever you choose. >>>> >>>> I agree. >>> >>> Yes. They should feel like our other command line options so camel-case is what I had in mind. >> >> How about this: >> >> http://cr.openjdk.java.net/~twisti/8146820/webrev.01/index.html >> >> Now all options are in an enum so that we can have PrintFlags and ShowFlags options. I did not add any documentation but we could. >> >>> >>>> >>>> -Doug >>>> >>>>> >>>>> 'inittimer' is also lowercased. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/11/16 11:15 AM, Christian Thalinger wrote: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8146820 >>>>>> >>>>>> I've renamed traceMethodDataFilter to TraceMethodDataFilter. Should we rename printconfig to PrintConfig? >>>>>> >>>>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java >>>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Fri Dec 18 20:23:28 2015 +0300 >>>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotJVMCIRuntime.java Mon Jan 11 09:12:48 2016 -1000 >>>>>> @@ -85,6 +85,21 @@ public final class HotSpotJVMCIRuntime i >>>>>> } >>>>>> >>>>>> /** >>>>>> + * Gets a String value based on a system property {@linkplain VM#getSavedProperty(String) saved} >>>>>> + * at system initialization time. The property name is prefixed with "{@code jvmci.}". 
>>>>>> + * >>>>>> + * @param name the name of the system property >>>>>> + * @param def the value to return if there is no system property corresponding to {@code name} >>>>>> + */ >>>>>> + public static String getProperty(String name, String def) { >>>>>> + String value = VM.getSavedProperty("jvmci." + name); >>>>>> + if (value == null) { >>>>>> + return def; >>>>>> + } >>>>>> + return value; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> * Gets a boolean value based on a system property {@linkplain VM#getSavedProperty(String) >>>>>> * saved} at system initialization time. The property name is prefixed with "{@code jvmci.}". >>>>>> * >>>>>> @@ -93,7 +108,7 @@ public final class HotSpotJVMCIRuntime i >>>>>> * @param def the value to return if there is no system property corresponding to {@code name} >>>>>> */ >>>>>> public static boolean getBooleanProperty(String name, boolean def) { >>>>>> - String value = VM.getSavedProperty("jvmci." + name); >>>>>> + String value = getProperty(name, null); >>>>>> if (value == null) { >>>>>> return def; >>>>>> } >>>>>> @@ -164,7 +179,7 @@ public final class HotSpotJVMCIRuntime i >>>>>> } >>>>>> metaAccessContext = context; >>>>>> >>>>>> - if (Boolean.valueOf(System.getProperty("jvmci.printconfig"))) { >>>>>> + if (getBooleanProperty("printconfig", false)) { >>>>>> printConfig(config, compilerToVm); >>>>>> } >>>>>> >>>>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java >>>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Fri Dec 18 20:23:28 2015 +0300 >>>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java Mon Jan 11 09:12:48 2016 -1000 >>>>>> @@ -417,7 +417,7 @@ final class HotSpotResolvedJavaMethodImp >>>>>> return false; >>>>>> } >>>>>> >>>>>> - private static final String TraceMethodDataFilter = 
System.getProperty("jvmci.traceMethodDataFilter"); >>>>>> + private static final String TraceMethodDataFilter = HotSpotJVMCIRuntime.getProperty("TraceMethodDataFilter", null); >>>>>> >>>>>> @Override >>>>>> public ProfilingInfo getProfilingInfo(boolean includeNormal, boolean includeOSR) { >>>>>> diff -r c90679b0ea25 src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java >>>>>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Fri Dec 18 20:23:28 2015 +0300 >>>>>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.inittimer/src/jdk/vm/ci/inittimer/InitTimer.java Mon Jan 11 09:12:48 2016 -1000 >>>>>> @@ -65,9 +65,11 @@ public final class InitTimer implements >>>>>> } >>>>>> >>>>>> /** >>>>>> - * Specifies if initialization timing is enabled. >>>>>> + * Specifies if initialization timing is enabled. Note: this property cannot use >>>>>> + * {@code HotSpotJVMCIRuntime.getBooleanProperty} since that class is not visible from this >>>>>> + * package. >>>>>> */ >>>>>> - private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer") || Boolean.getBoolean("jvmci.runtime.TimeInit"); >>>>>> + private static final boolean ENABLED = Boolean.getBoolean("jvmci.inittimer"); >>>>>> >>>>>> public static final AtomicInteger nesting = ENABLED ? 
new AtomicInteger() : null; >>>>>> public static final String SPACES = " "; > From doug.simon at oracle.com Tue Jan 12 22:03:04 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 12 Jan 2016 23:03:04 +0100 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> Message-ID: > On 12 Jan 2016, at 22:39, Christian Thalinger wrote: > >> >> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >> >> If we're going with an enum, you could put accessors directly in the enum: >> >> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >> >> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >> >> You could then type the value of the options and check the right accessor is used: >> >> public enum Option { >> ImplicitStableValues(boolean.class), >> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). >> PrintConfig(boolean.class), >> PrintFlags(boolean.class), >> ShowFlags(boolean.class), >> TraceMethodDataFilter(String.class), >> TrustFinalDefaultFields(String.class); >> >> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. > > Excellent idea! I was also thinking about adding the default value to the enum. Can you do that without having to box the default value? 
-Doug From christian.thalinger at oracle.com Tue Jan 12 22:14:29 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 12 Jan 2016 12:14:29 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> Message-ID: <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> > On Jan 12, 2016, at 12:03 PM, Doug Simon wrote: > >> >> On 12 Jan 2016, at 22:39, Christian Thalinger wrote: >> >>> >>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >>> >>> If we're going with an enum, you could put accessors directly in the enum: >>> >>> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >>> >>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >>> >>> You could then type the value of the options and check the right accessor is used: >>> >>> public enum Option { >>> ImplicitStableValues(boolean.class), >>> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). >>> PrintConfig(boolean.class), >>> PrintFlags(boolean.class), >>> ShowFlags(boolean.class), >>> TraceMethodDataFilter(String.class), >>> TrustFinalDefaultFields(String.class); >>> >>> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. >> >> Excellent idea! I was also thinking about adding the default value to the enum. > > Can you do that without having to box the default value? No, we have to box but we can initialize all flags in the constructor: http://cr.openjdk.java.net/~twisti/8146820/webrev.02/ We will not have many flags so this should be alright. 
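A minimal sketch of the enum shape discussed in this thread, with a value type and a boxed default stored per constant. This is an illustration under stated assumptions, not the code from the webrev: the accessor names follow the getBoolean/getString suggestion quoted above, the default values and the checkType helper are hypothetical, InitTimer is left out, and System.getProperty stands in for the VM's saved-property lookup so that the example runs on its own.

```java
/**
 * Hypothetical sketch of a typed JVMCI option enum (not the webrev code).
 * Each constant records its value type and a boxed default value.
 */
enum Option {
    ImplicitStableValues(boolean.class, true),
    PrintConfig(boolean.class, false),
    PrintFlags(boolean.class, false),
    ShowFlags(boolean.class, false),
    TraceMethodDataFilter(String.class, null),
    // Typed as boolean here so that it matches the getBoolean accessor.
    TrustFinalDefaultFields(boolean.class, true);

    /** Prefix of every option's system property name. */
    private static final String PREFIX = "jvmci.";

    private final Class<?> type;
    private final Object defaultValue; // boxed once, when the constant is created

    Option(Class<?> type, Object defaultValue) {
        this.type = type;
        this.defaultValue = defaultValue;
    }

    /** Returns the boolean value of this option, or its default if unset. */
    boolean getBoolean() {
        checkType(boolean.class);
        // Assumption: System.getProperty replaces the saved-property lookup.
        String value = System.getProperty(PREFIX + name());
        return value == null ? (Boolean) defaultValue : Boolean.parseBoolean(value);
    }

    /** Returns the String value of this option, or its default if unset. */
    String getString() {
        checkType(String.class);
        String value = System.getProperty(PREFIX + name());
        return value == null ? (String) defaultValue : value;
    }

    /** Fails fast when an accessor does not match the declared option type. */
    private void checkType(Class<?> expected) {
        if (type != expected) {
            throw new IllegalStateException(name() + " has type " + type.getSimpleName()
                            + ", not " + expected.getSimpleName());
        }
    }
}
```

With this shape, passing -Djvmci.PrintConfig=true makes Option.PrintConfig.getBoolean() return true, while calling getString() on a boolean option throws instead of silently returning the wrong kind of value.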
The PrintFlags output looks like this: $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.PrintFlags=true InitGraal [List of JVMCI options] boolean ImplicitStableValues := true boolean InitTimer := false boolean PrintConfig := false boolean PrintFlags = true boolean ShowFlags := false String TraceMethodDataFilter := null String TrustFinalDefaultFields := true I'm almost tempted to move InitTimer to another package, like jdk.vm.ci.common ? > > -Doug From christian.thalinger at oracle.com Tue Jan 12 22:39:32 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 12 Jan 2016 12:39:32 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> Message-ID: > On Jan 12, 2016, at 12:14 PM, Christian Thalinger wrote: > >> >> On Jan 12, 2016, at 12:03 PM, Doug Simon wrote: >> >>> >>> On 12 Jan 2016, at 22:39, Christian Thalinger wrote: >>> >>>> >>>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >>>> >>>> If we're going with an enum, you could put accessors directly in the enum: >>>> >>>> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >>>> >>>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >>>> >>>> You could then type the value of the options and check the right accessor is used: >>>> >>>> public enum Option { >>>> ImplicitStableValues(boolean.class), >>>> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). 
>>>> PrintConfig(boolean.class), >>>> PrintFlags(boolean.class), >>>> ShowFlags(boolean.class), >>>> TraceMethodDataFilter(String.class), >>>> TrustFinalDefaultFields(String.class); >>>> >>>> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. >>> >>> Excellent idea! I was also thinking about adding the default value to the enum. >> >> Can you do that without having to box the default value? > > No, we have to box but we can initialize all flags in the constructor: > > http://cr.openjdk.java.net/~twisti/8146820/webrev.02/ > > We will not have many flags so this should be alright. The PrintFlags output looks like this: > > $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.PrintFlags=true InitGraal > [List of JVMCI options] > boolean ImplicitStableValues := true > boolean InitTimer := false > boolean PrintConfig := false > boolean PrintFlags = true > boolean ShowFlags := false > String TraceMethodDataFilter := null > String TrustFinalDefaultFields := true ...and this is a bug, of course :-) > > I'm almost tempted to move InitTimer to another package, like jdk.vm.ci.common ? > >> >> -Doug From roland.westrelin at oracle.com Wed Jan 13 09:06:34 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 13 Jan 2016 10:06:34 +0100 Subject: RFR(S): 8146792: Predicate moved after partial peel may lead to broken graph In-Reply-To: <56956304.30100@oracle.com> References: <56944C45.6060307@oracle.com> <37CD6E66-1ACA-4B40-A974-F1B6A3086C10@oracle.com> <78074704-CE75-4F50-9F53-22FEC75E836E@oracle.com> <56955696.2080501@oracle.com> <56956304.30100@oracle.com> Message-ID: >>> I am thinking that it is "always" safe to move pinned data node above original/dummy predicates (loop index variable is depending on limit check predicate, but we will never move index node from loop). 
We only need to be sure that we move it (after partial peel, for example) before any dependent checks and data nodes are moved from the loop. Those checks and data will be inserted below it. >> >> I get it now and I think you're right but it would need to be done for all data nodes which sounds like a mess. >> >>> Anyway, how rare is this case? If it is very rare I agree with your change since performance is not important. >> >> It's very rare. I've seen it only once running the old CTW with the castPP change from 8139771. > > Okay then. Go with your changes - they are good enough. Thanks for the review. Roland. > > Thanks, > Vladimir > >> >> Roland. >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Actually, I can reproduce this scenario with the patch below: some changes to the test and making range check smearing a little bit more aggressive so a range check is replaced by a dominating predicate range check. >>>> >>>> Roland. >>>> >>>> diff --git a/src/share/vm/opto/ifnode.cpp b/src/share/vm/opto/ifnode.cpp >>>> --- a/src/share/vm/opto/ifnode.cpp >>>> +++ b/src/share/vm/opto/ifnode.cpp >>>> @@ -514,7 +514,7 @@ >>>> // along the OOB path. Otherwise, it's possible that the user wrote >>>> // something which optimized to look like a range check but behaves >>>> // in some other way. 
>>>> - if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_range_check) == NULL) { >>>> + if (iftrap->is_uncommon_trap_proj(Deoptimization::Reason_none) == NULL) { >>>> return 0; >>>> } >>>> >>>> diff --git a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >>>> --- a/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >>>> +++ b/test/compiler/loopopts/BadPredicateAfterPartialPeel.java >>>> @@ -30,6 +30,8 @@ >>>> * >>>> */ >>>> >>>> +import java.util.Objects; >>>> + >>>> public class BadPredicateAfterPartialPeel { >>>> >>>> static void not_inlined1() {} >>>> @@ -45,13 +47,13 @@ >>>> boolean flag; >>>> int j; >>>> >>>> - static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) { >>>> + static void m(BadPredicateAfterPartialPeel o1, BadPredicateAfterPartialPeel o2, BadPredicateAfterPartialPeel o, int i4) throws Exception { >>>> int i1 = 1; >>>> >>>> // To delay partial peeling to the loop opts pass right before CCP >>>> - int i2 = 0; >>>> - for (; i2 < 10; i2 += i1); >>>> - i2 = i2 / 10; >>>> + int i2 = 1; >>>> + // for (; i2 < 10; i2 += i1); >>>> + // i2 = i2 / 10; >>>> >>>> // Simplified during CCP: >>>> int i3 = 2; >>>> @@ -63,11 +65,12 @@ >>>> >>>> not_inlined1(); >>>> >>>> - array[0] = -1; >>>> do { >>>> // peeled section starts here >>>> o.flag = false; >>>> o.j = 0; >>>> + >>>> + Objects.checkIndex(0, array.length, null); >>>> >>>> if (b) { >>>> // The following store will be pinned between >>>> @@ -300,7 +303,7 @@ >>>> not_inlined4(); >>>> } >>>> >>>> - static public void main(String[] args) { >>>> + static public void main(String[] args) throws Exception { >>>> BadPredicateAfterPartialPeel o1 = new BadPredicateAfterPartialPeel(); >>>> BadPredicateAfterPartialPeel o2 = new BadPredicateAfterPartialPeel(); >>>> for (int i = 0; i < 20000; i++) { >>>> >>>> >> From martin.doerr at sap.com Wed Jan 13 10:38:51 2016 From: 
martin.doerr at sap.com (Doerr, Martin) Date: Wed, 13 Jan 2016 10:38:51 +0000 Subject: RFR(S): 8146978: PPC64: Fix build after integration of C++ interpreter removal Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228B502@DEWDFEMB19C.global.corp.sap> Hi, the file register_ppc.hpp didn't merge correctly. Webrev to fix the build is here: http://cr.openjdk.java.net/~mdoerr/8146978_PPC64_fix_build/webrev.00/ Please review. Best regards, Martin From goetz.lindenmaier at sap.com Wed Jan 13 10:45:09 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 13 Jan 2016 10:45:09 +0000 Subject: RFR(S): 8146978: PPC64: Fix build after integration of C++ interpreter removal In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228B502@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB418116567228B502@DEWDFEMB19C.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41F13C62@DEWDFEMB12A.global.corp.sap> Hi Martin, thanks for doing this fix. Please push it soon, as all other repos pulling from hs will break, too. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Wednesday, 13 January 2016 11:39 > To: hotspot-compiler-dev at openjdk.java.net > Subject: RFR(S): 8146978: PPC64: Fix build after integration of C++ interpreter > removal > > Hi, > > > > the file register_ppc.hpp didn't merge correctly. > > > > Webrev to fix the build is here: > > http://cr.openjdk.java.net/~mdoerr/8146978_PPC64_fix_build/webrev.00/ > > > > Please review.
> > > > Best regards, > > Martin > > From edward.nevill at gmail.com Wed Jan 13 11:40:06 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 13 Jan 2016 11:40:06 +0000 Subject: RFR: 8146843: aarch64: add scheduling support for FP and vector instructions Message-ID: <1452685206.14278.16.camel@mylittlepony.linaroharston> Hi, Please review the following webrev http://cr.openjdk.java.net/~enevill/8146843/webrev.1 This adds support for OptoScheduling of FP & Vector (Neon) instructions on aarch64 (aarch64 already has support for scheduling of scalar instructions). The following table shows the performance difference of this change. http://cr.openjdk.java.net/~enevill/8146843/vectest.html Note that the pipeline scheduling used in this change is based on partner C hardware because that is the only hardware I have the micro architecture details for. Unsurprisingly the performance difference is most noticeable for the in order cores (B & D). In a few cases the performance is worse. This seems to be due to it mis-scheduling data processing instructions at the cost of load store instructions on out of order cores. However I think that the overall performance improvement makes this change worthwhile. It may be possible in a future change to predicate individual pipeline classes on the core it is being run on, however this could rapidly lead to explosion in the size of aarch64.ad. Alternatively we could do some coarser predication on In Order vs Out of Order. I have tested the change with jtreg hotspot and langtools with the following results. Before: Hotspot: Test results: passed: 1,066; failed: 15; error: 18 Langtools: Test results: passed: 3,358; failed: 1; error: 4 After: Hotspot: Test results: passed: 1,073; failed: 11; error: 15 Langtools: Test results: passed: 3,358; failed: 1; error: 4 Thanks for the review, Ed. 
From tobias.hartmann at oracle.com Wed Jan 13 12:00:58 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 13 Jan 2016 13:00:58 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <569552EE.8050809@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> <569506CA.8040001@oracle.com> <569552EE.8050809@oracle.com> Message-ID: <56963C7A.8040203@oracle.com> Thanks, Vladimir. On 12.01.2016 20:24, Vladimir Kozlov wrote: >> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. > > Yes, that is right solution here. I changed the implementation to only capture the byte[] and char[] memory: http://cr.openjdk.java.net/~thartmann/8144212/webrev.03/ The method GraphKit::capture_memory(src_type, dst_type) returns a new MergeMemNode if the src and dst types are different, merging the two. Best, Tobias > On 1/12/16 5:59 AM, Tobias Hartmann wrote: >> On 11.01.2016 21:00, Vladimir Kozlov wrote: >>> On 1/11/16 7:20 AM, Tobias Hartmann wrote: >>>> On 08.01.2016 20:41, Vladimir Kozlov wrote: >>>>> On 1/8/16 2:37 AM, Tobias Hartmann wrote: >>>>>> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>>>>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>>>>>> Hi Vladimir, >>>>>>>> >>>>>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>>>>>> Andrew is right. >>>>>>>> >>>>>>>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. 
This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>>>>>> >>>>>>> Right. It was the root of this bug, see below. >>>>>>> >>>>>>>> >>>>>>>> I fixed this for the inflate and compress intrinsics. >>>>>>>> >>>>>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>>>>>> StrInflatedCopyNode is not memory node. >>>>>>>> >>>>>>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >>>>>>> >>>>>>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >>>>>> >>>>>> I meant with webrev.01 but you answered my question below. >>>>>> >>>>>>>> // This class defines a projection of the memory state of a store conditional node. >>>>>>>> // These nodes return a value, but also update memory. >>>>>>>> >>>>>>>> But inflate does not return any value. >>>>>>> >>>>>>> Hmm, according to bottom type inflate produce memory: >>>>>>> >>>>>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>>>>>> >>>>>>> So it really does not need SCMemProjNode. Sorry about that. >>>>>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >>>>>> >>>>>> Exactly. >>>>>> >>>>>>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>>>>>> >>>>>>> set_memory(str, dst_type); >>>>>> >>>>>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >>>>>> >>>>>>> And you should rollback part of changes in escape.cpp and macro.cpp. >>>>>> >>>>>> Okay, I'll to that. 
>>>>>> >>>>>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>>>>>> >>>>>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >>>>>> >>>>>> Okay, should we then use BOTTOM for both the input and output type? >>>>> >>>>> Only input. Output type corresponds to dst array type which you set correctly now. >>>> >>>> It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case: >>>> StoreC >>>> inflate_string >>>> LoadC >>>> >>>> The memory graph (def->use) now looks like this: >>>> LoadC -> inflate_string -> ByteMem >>>> ... StoreC-> CharMem >>> >>> I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC. >> >> Note that the StoreC and inflate_string are *not* writing to the same char[] array. The test looks like this: >> >> char c1[] = new char[1]; >> char c2[] = new char[1]; >> >> c2[0] = 42; >> // Inflate String from byte[] to char[] >> s.getChars(0, 1, c1, 0); >> // Read char[] memory written before inflation >> return c2[0]; >> >> The result should be 42. The problem is that inflate_string does not point to StoreC because inflate_string uses a byte[] as input and in this case also writes to a different char[]. Even if we set the input to BOTTOM, inflate_string points to 7 Parm (BOTTOM) but not to the char[] memory produced by 96 StoreC: >> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >> >> 349 LoadUS then reads from the output char[] memory of inflate_string which does not include the result of StoreC. The test fails because the return value is != 42. >> >> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. 
>> >>> If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed. >> >> Yes, these cases work fine. >> >> Thanks, >> Tobias >> >>>> The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). >>>> >>>> Setting the input to BOTTOM, generates the following graph: >>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >>>> The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails. >>>> >>>> I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: >>>> LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) >>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png >>>> >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ >>>> Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? >>>> >>>> Best, >>>> Tobias >>>> >>>>>>>> Related question: >>>>>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>>>>>> char[int:>=0]:exact+any * >>>>>>>> >>>>>>>> which is equal to the type of the char load. >>>>>>> >>>>>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? >>>>>> >>>>>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. 
See comment in library_call.cpp: >>>>>> >>>>>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >>>>>> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >>>>>> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >>>>>> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>> >>>>>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >>>>>> >>>>>>> Should we also be more careful in inflate_string_slow()? Is it used? >>>>>> >>>>>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >>>>>> >>>>>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>>>>>> char[int:1]:NotNull:exact * >>>>>>>> >>>>>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>>>>>> >>>>>>> It is indeed strange. What memory type of LoadUS? It could be bug. >>>>>> >>>>>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >>>>>> >>>>>> I will look into this again and try to understand what happens. >>>>> >>>>> It could that aryptr is pointer to array and load type is pointer to array's element. 
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Thanks, >>>>>> Tobias >>>>>> >>>>>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>>>>>> >>>>>>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>>>>>> >>>>>>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>>>>>> >>>>>>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>>>>>> >>>>>>>>>>> Or did I misunderstand your question? >>>>>>>>>> >>>>>>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>>>>>> with some interesting gotchas over the years, that's all. I'm a bit >>>>>>>>>> surprised that C2 needs this barrier, given that there is a >>>>>>>>>> read-after-write dependency, but never mind. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Andrew. 
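The ordering scenario described in this thread (a store to one char[], an inflation from the String's byte[] into a different char[], then a load from the first array) can be written in plain Java. The following is a minimal sketch only, not the regression test from the webrev; the class and method names are hypothetical, and it assumes the String passed in is Latin-1 so that getChars() goes through the inflate path on a JVM with compact strings.

```java
// Hypothetical sketch of the StoreC / inflate / LoadC ordering scenario.
// Not the TestStringIntrinsicMemoryFlow test from the webrev.
class InflateOrderingSketch {

    // Stores to c2, inflates the String into the unrelated array c1,
    // then reads c2 back. The read must observe the earlier store.
    static char readAfterInflate(String s) {
        char[] c1 = new char[1];
        char[] c2 = new char[1];

        c2[0] = 42;               // char store written before the inflation
        s.getChars(0, 1, c1, 0);  // copies (inflates) one character into c1
        return c2[0];             // must not be reordered above the store
    }

    public static void main(String[] args) {
        // Repeat the call so the method has a chance to be JIT-compiled.
        for (int i = 0; i < 20_000; i++) {
            if (readAfterInflate("a") != 42) {
                throw new AssertionError("load did not see the earlier store");
            }
        }
        System.out.println("ok");
    }
}
```

On a correct JVM the method returns 42 every time; the bug discussed here would show up only in C2-compiled code, which is why the loop calls the method many times.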
From vladimir.x.ivanov at oracle.com Wed Jan 13 13:53:18 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Jan 2016 16:53:18 +0300 Subject: [9] RFR (XS): 8146983: C1: assert(appendix.not_null()) failed for invokehandle bytecode Message-ID: <569656CE.9050602@oracle.com> http://cr.openjdk.java.net/~vlivanov/8146983/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8146983 Assertion added in 8140659 is too strong: sometimes appendix patching for invokehandle happens when there's no appendix argument. Appendix is present only for MH::invoke (_invokeGeneric; see LinkResolver::lookup_polymorphic_method). MH::invokeExact doesn't have an appendix. But C1 uses patching for all unresolved invokedynamic & invokehandle call sites (see GraphBuilder::invoke and Bytecodes::has_optional_appendix). The fix is to remove the assertion. Also, fixed a leftover from 8140659: appendix resolution for invokehandle should be idempotent as well. Testing: JPRT Best regards, Vladimir Ivanov From pavel.punegov at oracle.com Wed Jan 13 14:21:16 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Wed, 13 Jan 2016 17:21:16 +0300 Subject: RFR (XXS): 8145025: compiler/compilercontrol/commandfile/CompileOnlyTest.java and compiler/compilercontrol/commands/CompileOnlyTest.java fail: java.lang.RuntimeException: FAILED: method ... compilable: false, but should: true In-Reply-To: <037269E6-9A07-4436-86E5-3E19D260D063@oracle.com> References: <567288EC.3020001@oracle.com> <037269E6-9A07-4436-86E5-3E19D260D063@oracle.com> Message-ID: <31DAC8F9-82C5-4C62-922B-B630B5BF6450@oracle.com> Anyone else to review, please? - Thanks, Pavel Punegov > On 17 Dec 2015, at 16:44, Pavel Punegov wrote: > > Thanks for review, Nils > > - Pavel. > >> On 17 Dec 2015, at 13:05, Nils Eliasson wrote: >> >> Hi Pavel, >> >> Looks good. >> >> //Nils >> >> On 2015-12-16 20:56, Pavel Punegov wrote: >>> Please review this small fix to a test bug.
>>> >>> Issue: when test builds a state for a method that doesn't match any compileonly command it should consider that this method wasn't set compiled/excluded with any other compileonly or exclude commands. This means that it should check that appropriate Optional is not present (isn't set). >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8145025 >>> webrev: http://cr.openjdk.java.net/~ppunegov/8145025/webrev.00/ >>> - Thanks, >>> Pavel Punegov >>> >> > From roland.westrelin at oracle.com Wed Jan 13 14:35:03 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 13 Jan 2016 15:35:03 +0100 Subject: [9] RFR (XS): 8146983: C1: assert(appendix.not_null()) failed for invokehandle bytecode In-Reply-To: <569656CE.9050602@oracle.com> References: <569656CE.9050602@oracle.com> Message-ID: <93A56FDC-A240-48BF-A8CF-CB1C0B2D5D0A@oracle.com> > http://cr.openjdk.java.net/~vlivanov/8146983/webrev.00 Looks good to me. Roland. From zoltan.majo at oracle.com Wed Jan 13 15:02:22 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Wed, 13 Jan 2016 16:02:22 +0100 Subject: [9] RFR (S): 8071864: compiler/c2/6772683/InterruptedTest.java failed in nightly Message-ID: <569666FE.1010007@oracle.com> Hi, please review the patch for 8071864. https://bugs.openjdk.java.net/browse/JDK-8071864 Problem: The test runs using two threads: The main thread and a worker thread. Before exiting, the main thread interrupts the worker thread. Then, the main thread waits a limited amount of time for the worker thread to exit. On highly loaded systems it can happen that the OS does not provide CPU time to the worker thread to exit in the limited amount of time available. In this case the test fails. Solution: Increase the amount of time the main thread waits for the worker thread.
Webrev: http://cr.openjdk.java.net/~zmajo/8071864/webrev.00/ Testing: - executed test on a highly loaded system: Without the fix, the test fails after 66 iterations; with the fix it was possible to execute the test 1000 iteration without a failure; - JPRT. Thank you and best regards, Zoltan From aph at redhat.com Wed Jan 13 15:09:51 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 13 Jan 2016 15:09:51 +0000 Subject: RFR: 8146843: aarch64: add scheduling support for FP and vector instructions In-Reply-To: <1452685206.14278.16.camel@mylittlepony.linaroharston> References: <1452685206.14278.16.camel@mylittlepony.linaroharston> Message-ID: <569668BF.5070000@redhat.com> On 01/13/2016 11:40 AM, Edward Nevill wrote: > The following table shows the performance difference of this change. > > http://cr.openjdk.java.net/~enevill/8146843/vectest.html OK; this is generally positive. Andrew. From roland.westrelin at oracle.com Wed Jan 13 15:10:35 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 13 Jan 2016 16:10:35 +0100 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure Message-ID: http://cr.openjdk.java.net/~roland/8146999/webrev.00/ 8139771 made CheckCastPP inherit from ConstraintCast but the is_ConstraintCast() fails for CheckCastPP and as a consequence so does uncast(). Roland. From tobias.hartmann at oracle.com Wed Jan 13 15:15:05 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 13 Jan 2016 16:15:05 +0100 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: References: Message-ID: <569669F9.2030607@oracle.com> Hi Roland, looks good to me (not a reviewer). Best, Tobias On 13.01.2016 16:10, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8146999/webrev.00/ > > 8139771 made CheckCastPP inherit from ConstraintCast but the is_ConstraintCast() fails for CheckCastPP and as a consequence so does uncast(). 
> > Roland. > From vladimir.x.ivanov at oracle.com Wed Jan 13 15:54:21 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 13 Jan 2016 18:54:21 +0300 Subject: [9] RFR (XS): 8146983: C1: assert(appendix.not_null()) failed for invokehandle bytecode In-Reply-To: <93A56FDC-A240-48BF-A8CF-CB1C0B2D5D0A@oracle.com> References: <569656CE.9050602@oracle.com> <93A56FDC-A240-48BF-A8CF-CB1C0B2D5D0A@oracle.com> Message-ID: <5696732D.8070408@oracle.com> Thanks, Roland! Best regards, Vladimir Ivanov On 1/13/16 5:35 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~vlivanov/8146983/webrev.00 > > Looks good to me. > > Roland. > From vladimir.kozlov at oracle.com Wed Jan 13 18:09:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Jan 2016 10:09:21 -0800 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: References: Message-ID: <569692D1.4040001@oracle.com> Right. Thanks, Vladimir On 1/13/16 7:10 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8146999/webrev.00/ > > 8139771 made CheckCastPP inherit from ConstraintCast but the is_ConstraintCast() fails for CheckCastPP and as a consequence so does uncast(). > > Roland. > From vladimir.kozlov at oracle.com Wed Jan 13 18:10:52 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Jan 2016 10:10:52 -0800 Subject: [9] RFR (S): 8071864: compiler/c2/6772683/InterruptedTest.java failed in nightly In-Reply-To: <569666FE.1010007@oracle.com> References: <569666FE.1010007@oracle.com> Message-ID: <5696932C.1080602@oracle.com> Good. Thanks, Vladimir On 1/13/16 7:02 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8071864. > > https://bugs.openjdk.java.net/browse/JDK-8071864 > > Problem: The test runs using two threads: The main thread and a worker thread. Before exiting, the main thread > interrupts the worker thread.
Then, the main thread waits a limited amount of time for the worker thread to exit. > > On highly loaded systems it can happen that the OS does not provide CPU time to the worker thread to exit in the limited > amount of time available. In this case the test fails. > > Solution: Increase the amount of time the main thread waits for the worker thread. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8071864/webrev.00/ > > Testing: > - executed test on a highly loaded system: Without the fix, the test fails after 66 iterations; with the fix it was > possible to execute the test 1000 iteration without a failure; > - JPRT. > > Thank you and best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Wed Jan 13 18:16:48 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 13 Jan 2016 10:16:48 -0800 Subject: RFR (XXS): 8145025: compiler/compilercontrol/commandfile/CompileOnlyTest.java and compiler/compilercontrol/commands/CompileOnlyTest.java fail: java.lang.RuntimeException: FAILED: method ... compilable: false, but should: true In-Reply-To: <31DAC8F9-82C5-4C62-922B-B630B5BF6450@oracle.com> References: <567288EC.3020001@oracle.com> <037269E6-9A07-4436-86E5-3E19D260D063@oracle.com> <31DAC8F9-82C5-4C62-922B-B630B5BF6450@oracle.com> Message-ID: <56969490.7020300@oracle.com> Good. Thanks, Vladimir On 1/13/16 6:21 AM, Pavel Punegov wrote: > Anyone else to review, please? > > - Thanks, > Pavel Punegov > >> On 17 Dec 2015, at 16:44, Pavel Punegov wrote: >> >> Thanks for review, Nils >> >> - Pavel. >> >>> On 17 Dec 2015, at 13:05, Nils Eliasson wrote: >>> >>> Hi Pavel, >>> >>> Looks good. >>> >>> //Nils >>> >>> On 2015-12-16 20:56, Pavel Punegov wrote: >>>> Please review this small fix to a test bug. >>>> >>>> Issue: when test builds a state for a method that doesn't match any compileonly command it should consider that this >>>> method wasn't set compiled/excluded with any other compileonly or exclude commands.
This means that it should check >>>> that appropriate Optional is not present (isn't set). >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8145025 >>>> webrev: http://cr.openjdk.java.net/~ppunegov/8145025/webrev.00/ >>>> >>>> >>>> - Thanks, >>>> Pavel Punegov >>>> >>> >> > From roland.westrelin at oracle.com Wed Jan 13 19:05:03 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 13 Jan 2016 20:05:03 +0100 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: <569692D1.4040001@oracle.com> References: <569692D1.4040001@oracle.com> Message-ID: Thanks Tobias, Vladimir for the review. Roland. From ahmed.khawaja at oracle.com Wed Jan 13 20:51:28 2016 From: ahmed.khawaja at oracle.com (Ahmed Khawaja) Date: Wed, 13 Jan 2016 14:51:28 -0600 Subject: Accessing Addresses of object members from interp/C1 Message-ID: <5696B8D0.4010408@oracle.com> I am working on adding/modifying some intrinsics that can get called from the interpreter and C1. I need to pass the address of a member variable (an array of ints) to each of the intrinsics. I know how to do this with C2 but not interp/C1. Can anyone point me in the right direction? For the interpreter: AbstractInterpreter::MethodKind kind -> need address from this For C1: Intrinsic* x -> need address from this Thank you, Ahmed From christian.thalinger at oracle.com Wed Jan 13 23:09:15 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 13 Jan 2016 13:09:15 -1000 Subject: RFR: 8146788: remove jvmci.jar from mx suite In-Reply-To: References: Message-ID: <6C96000A-BEFD-4CE8-9C2E-70C155283BEE@oracle.com> I'm not sure about this.
One reason to have the monolithic jvmci.jar is to not have to build the JVMCI on the command line when you make a change in your IDE: cthaling at macbook:~/ws/jdk9/hs-comp/hotspot$ mx -v vm -version /Users/cthaling/ws/jdk9/hs-comp/build/macosx-x86_64-normal-server-release/jdk/bin/java -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -d64 -Xbootclasspath/p:/Users/cthaling/ws/jdk9/hs-comp/build/mx/hotspot/dists/jvmci.jar -version java version "9-internal" Java(TM) SE Runtime Environment (build 9-internal+0-2016-01-11-180948.cthaling.hs-comp) Java HotSpot(TM) 64-Bit Server VM (build 9-internal+0-2016-01-11-180948.cthaling.hs-comp, mixed mode) > On Jan 11, 2016, at 4:05 AM, Doug Simon wrote: > > Please review this small change to remove generation of a jvmci.jar by the mx JVMCI build system. > > https://bugs.openjdk.java.net/browse/JDK-8146788 > http://cr.openjdk.java.net/~dnsimon/8146788/ > > -Doug From doug.simon at oracle.com Wed Jan 13 23:24:55 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 14 Jan 2016 00:24:55 +0100 Subject: RFR: 8146788: remove jvmci.jar from mx suite In-Reply-To: <6C96000A-BEFD-4CE8-9C2E-70C155283BEE@oracle.com> References: <6C96000A-BEFD-4CE8-9C2E-70C155283BEE@oracle.com> Message-ID: <600A5EE2-9B76-4914-A747-AE0FF1CD316B@oracle.com> Fair enough. This was mainly removed because it's not used by Graal any more. But you're right, it's still useful when hacking on JVMCI itself from within Eclipse. How do I withdraw this JBS issue? Close it with "Won't fix"? > On 14 Jan 2016, at 00:09, Christian Thalinger wrote: > > I'm not sure about this.
One reason to have the monolithic jvmci.jar is to not have to build the JVMCI on the command line when you make a change in your IDE: > > cthaling at macbook:~/ws/jdk9/hs-comp/hotspot$ mx -v vm -version > /Users/cthaling/ws/jdk9/hs-comp/build/macosx-x86_64-normal-server-release/jdk/bin/java -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -d64 -Xbootclasspath/p:/Users/cthaling/ws/jdk9/hs-comp/build/mx/hotspot/dists/jvmci.jar -version > java version "9-internal" > Java(TM) SE Runtime Environment (build 9-internal+0-2016-01-11-180948.cthaling.hs-comp) > Java HotSpot(TM) 64-Bit Server VM (build 9-internal+0-2016-01-11-180948.cthaling.hs-comp, mixed mode) > >> On Jan 11, 2016, at 4:05 AM, Doug Simon wrote: >> >> Please review this small change to remove generation of a jvmci.jar by the mx JVMCI build system. >> >> https://bugs.openjdk.java.net/browse/JDK-8146788 >> http://cr.openjdk.java.net/~dnsimon/8146788/ >> >> -Doug > From christian.thalinger at oracle.com Wed Jan 13 23:30:34 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 13 Jan 2016 13:30:34 -1000 Subject: RFR: 8146788: remove jvmci.jar from mx suite In-Reply-To: <600A5EE2-9B76-4914-A747-AE0FF1CD316B@oracle.com> References: <6C96000A-BEFD-4CE8-9C2E-70C155283BEE@oracle.com> <600A5EE2-9B76-4914-A747-AE0FF1CD316B@oracle.com> Message-ID: > On Jan 13, 2016, at 1:24 PM, Doug Simon wrote: > > Fair enough. This was mainly removed because it's not used by Graal any more. But you're right, it's still useful when hacking on JVMCI itself from within Eclipse. > > How do I withdraw this JBS issue? Close it with "Won't fix"? There is "Withdrawn" which is technically for JEPs. "Won't Fix" works. > >> On 14 Jan 2016, at 00:09, Christian Thalinger wrote: >> >> I'm not sure about this.
One reason to have the monolithic jvmci.jar is to not have to build the JVMCI on the command line when you make a change in your IDE: >> >> cthaling at macbook:~/ws/jdk9/hs-comp/hotspot$ mx -v vm -version >> /Users/cthaling/ws/jdk9/hs-comp/build/macosx-x86_64-normal-server-release/jdk/bin/java -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -d64 -Xbootclasspath/p:/Users/cthaling/ws/jdk9/hs-comp/build/mx/hotspot/dists/jvmci.jar -version >> java version "9-internal" >> Java(TM) SE Runtime Environment (build 9-internal+0-2016-01-11-180948.cthaling.hs-comp) >> Java HotSpot(TM) 64-Bit Server VM (build 9-internal+0-2016-01-11-180948.cthaling.hs-comp, mixed mode) >> >>> On Jan 11, 2016, at 4:05 AM, Doug Simon wrote: >>> >>> Please review this small change to remove generation of a jvmci.jar by the mx JVMCI build system. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8146788 >>> http://cr.openjdk.java.net/~dnsimon/8146788/ >>> >>> -Doug >> > From christian.thalinger at oracle.com Thu Jan 14 05:58:58 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 13 Jan 2016 19:58:58 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> Message-ID: <7C1CBFFE-9A7C-4195-A8EA-BD7B94092E4F@oracle.com> > On Jan 12, 2016, at 12:39 PM, Christian Thalinger wrote: > >> >> On Jan 12, 2016, at 12:14 PM, Christian Thalinger wrote: >> >>> >>> On Jan 12, 2016, at 12:03 PM, Doug Simon wrote: >>> >>>> >>>> On 12 Jan 2016, at 22:39, Christian Thalinger wrote: >>>> >>>>> >>>>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >>>>> >>>>> If we're going with an enum, you could put accessors directly in the enum: >>>>> >>>>> private static final boolean
TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >>>>> >>>>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >>>>> >>>>> You could then type the value of the options and check the right accessor is used: >>>>> >>>>> public enum Option { >>>>> ImplicitStableValues(boolean.class), >>>>> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). >>>>> PrintConfig(boolean.class), >>>>> PrintFlags(boolean.class), >>>>> ShowFlags(boolean.class), >>>>> TraceMethodDataFilter(String.class), >>>>> TrustFinalDefaultFields(String.class); >>>>> >>>>> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. >>>> >>>> Excellent idea! I was also thinking about adding the default value to the enum. >>> >>> Can you do that without having to box the default value? >> >> No, we have to box but we can initialize all flags in the constructor: >> >> http://cr.openjdk.java.net/~twisti/8146820/webrev.02/ Do we agree on the change? >> >> We will not have many flags so this should be alright. A PrintFlags looks like this: >> >> $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.PrintFlags=true InitGraal >> [List of JVMCI options] >> boolean ImplicitStableValues := true >> boolean InitTimer := false >> boolean PrintConfig := false >> boolean PrintFlags = true >> boolean ShowFlags := false >> String TraceMethodDataFilter := null >> String TrustFinalDefaultFields := true > > ...and this is a bug, of course :-) > >> >> I'm almost tempted to move InitTimer to another package, like jdk.vm.ci.common... >> >>> >>> -Doug
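The typed-option idea from this exchange (each enum constant records its value type, and the accessor checks that the matching getter is used) can be sketched as follows. This is a minimal illustration, not the code in the webrev: the class name OptionSketch, the check() helper, and the reduced set of constants are hypothetical, and the "jvmci." property prefix is assumed from the -Djvmci.PrintFlags=true command line quoted above. TrustFinalDefaultFields is declared boolean here, since the String typing was called out as a bug.

```java
// Hypothetical sketch of a typed option enum; not the JVMCI implementation.
class OptionSketch {

    enum Option {
        PrintFlags(boolean.class),
        TrustFinalDefaultFields(boolean.class),
        TraceMethodDataFilter(String.class);

        // Assumed system property prefix, e.g. -Djvmci.PrintFlags=true
        private static final String PREFIX = "jvmci.";

        private final Class<?> type;

        Option(Class<?> type) {
            this.type = type;
        }

        // Reads a boolean option, falling back to the given default.
        boolean getBoolean(boolean defaultValue) {
            check(boolean.class);
            String value = System.getProperty(PREFIX + name());
            return value == null ? defaultValue : Boolean.parseBoolean(value);
        }

        // Reads a String option, falling back to the given default.
        String getString(String defaultValue) {
            check(String.class);
            return System.getProperty(PREFIX + name(), defaultValue);
        }

        // Rejects use of an accessor that does not match the declared type.
        private void check(Class<?> expected) {
            if (type != expected) {
                throw new IllegalStateException(
                    name() + " is of type " + type.getSimpleName()
                    + ", not " + expected.getSimpleName());
            }
        }
    }
}
```

With this shape, a call such as Option.PrintFlags.getString(null) fails at the first use instead of silently reading a boolean flag as a String.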
URL: From zoltan.majo at oracle.com Thu Jan 14 08:21:47 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Thu, 14 Jan 2016 09:21:47 +0100 Subject: [9] RFR (S): 8071864: compiler/c2/6772683/InterruptedTest.java failed in nightly In-Reply-To: <5696932C.1080602@oracle.com> References: <569666FE.1010007@oracle.com> <5696932C.1080602@oracle.com> Message-ID: <56975A9B.9010005@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 01/13/2016 07:10 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 1/13/16 7:02 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8071864. >> >> https://bugs.openjdk.java.net/browse/JDK-8071864 >> >> Problem: The test runs using two threads: The main thread and a >> worker thread. Before exiting, the main thread >> interrupts the worker thread. Then, the main thread waits a limited >> amount of time for the worker thread to exit. >> >> On highly loaded systems it can happen that the OS does not provide >> CPU time to the worker thread to exit in the limited >> amount of time available. In this case the test fails. >> >> Solution: Increase the amount of time the main thread waits for the >> worker thread. >> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8071864/webrev.00/ >> >> Testing: >> - executed test on a highly loaded system: Without the fix, the test >> fails after 66 iterations; with the fix it was >> possible to execute the test 1000 iterations without a failure; >> - JPRT. 
>> >> Thank you and best regards, >> >> >> Zoltan >> From doug.simon at oracle.com Thu Jan 14 12:44:42 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 14 Jan 2016 13:44:42 +0100 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <7C1CBFFE-9A7C-4195-A8EA-BD7B94092E4F@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> <7C1CBFFE-9A7C-4195-A8EA-BD7B94092E4F@oracle.com> Message-ID: > On 14 Jan 2016, at 06:58, Christian Thalinger wrote: > >> >> On Jan 12, 2016, at 12:39 PM, Christian Thalinger wrote: >> >>> >>> On Jan 12, 2016, at 12:14 PM, Christian Thalinger wrote: >>> >>>> >>>> On Jan 12, 2016, at 12:03 PM, Doug Simon wrote: >>>> >>>>> >>>>> On 12 Jan 2016, at 22:39, Christian Thalinger wrote: >>>>> >>>>>> >>>>>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >>>>>> >>>>>> If we're going with an enum, you could put accessors directly in the enum: >>>>>> >>>>>> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >>>>>> >>>>>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >>>>>> >>>>>> You could then type the value of the options and check the right accessor is used: >>>>>> >>>>>> public enum Option { >>>>>> ImplicitStableValues(boolean.class), >>>>>> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). >>>>>> PrintConfig(boolean.class), >>>>>> PrintFlags(boolean.class), >>>>>> ShowFlags(boolean.class), >>>>>> TraceMethodDataFilter(String.class), >>>>>> TrustFinalDefaultFields(String.class); >>>>>> >>>>>> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. >>>>> >>>>> Excellent idea! 
I was also thinking about adding the default value to the enum. >>>> >>>> Can you do that without having to box the default value? >>> >>> No, we have to box but we can initialize all flags in the constructor: >>> >>> http://cr.openjdk.java.net/~twisti/8146820/webrev.02/ > > Do we agree on the change? I would prefer it if the value was lazy initialized (for non-AOT runtimes): /** * Supported JVMCI options. */ public enum Option { ImplicitStableValues(boolean.class, true), InitTimer(boolean.class, false), // Note: Not used (see InitTimer.ENABLED). PrintConfig(boolean.class, false), PrintFlags(boolean.class, false), ShowFlags(boolean.class, false), TraceMethodDataFilter(String.class, null), TrustFinalDefaultFields(String.class, true); /** * The prefix for system properties that are JVMCI options. */ private static final String JVMCI_OPTION_PROPERTY_PREFIX = "jvmci."; private final Class type; private Object value; private final Object defaultValue; private boolean isDefault; private Option(Class type, Object defaultValue) { assert Character.isUpperCase(name().charAt(0)) : "Option name must start with upper-case letter: " + name(); this.type = type; this.value = "UNINITIALIZED"; this.defaultValue = defaultValue; } private Object getValue() { if (value == "UNINITIALIZED") { String propertyValue = VM.getSavedProperty(JVMCI_OPTION_PROPERTY_PREFIX + name()); if (propertyValue == null) { this.value = defaultValue; this.isDefault = true; } else { if (type == boolean.class) { this.value = Boolean.parseBoolean(propertyValue); } else if (type == String.class) { this.value = propertyValue; } else { throw new JVMCIError("Unexpected option type " + type); } this.isDefault = false; } // Saved properties should not be interned - let's be sure assert value != "UNINITIALIZED"; } return value; } /** * Returns the option's value as boolean. * * @return option's value */ public boolean getBoolean() { return (boolean) getValue(); } /** * Returns the option's value as String. 
* * @return option's value */ public String getString() { return (String) getValue(); } /** * Prints all option flags to {@code out}. * * @param out stream to print to */ public static void printFlags(PrintStream out) { out.println("[List of JVMCI options]"); for (Option option : values()) { Object value = option.getValue(); String assign = option.isDefault ? ":=" : " ="; out.printf("%9s %-40s %s %-14s%n", option.type.getSimpleName(), option, assign, value); } } } Also, you can remove all the static fields that just cache a (possibly unboxed) option value and use the option directly. For example: diff -r 1034ff44c5d0 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Tue Jan 12 15:04:27 2016 +0100 +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Thu Jan 14 13:40:28 2016 +0100 @@ -29,6 +29,7 @@ import java.lang.reflect.Field; import jdk.vm.ci.common.JVMCIError; +import jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.Option; import jdk.vm.ci.meta.JavaType; import jdk.vm.ci.meta.LocationIdentity; import jdk.vm.ci.meta.MetaAccessProvider; @@ -41,11 +42,6 @@ */ class HotSpotResolvedJavaFieldImpl implements HotSpotResolvedJavaField, HotSpotProxified { - /** - * Mark well-known stable fields as such. 
- */ - private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); - private final HotSpotResolvedObjectTypeImpl holder; private final String name; private JavaType type; @@ -198,7 +194,7 @@ return true; } assert getAnnotation(Stable.class) == null; - if (ImplicitStableValues && isImplicitStableField()) { + if (Option.ImplicitStableValues.getBoolean() && isImplicitStableField()) { return true; } return false; None of the current options are used in tight loops where the cost of the unboxing (if any) would matter. Lastly, since you've added PrintFlags and ShowFlags, why not add a help message to each option. For example: ImplicitStableValues(boolean.class, true, "Mark well-known stable fields as such."), -Doug > >>> >>> We will not have many flags so this should be alright. A PrintFlags looks like this: >>> >>> $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.PrintFlags=true InitGraal >>> [List of JVMCI options] >>> boolean ImplicitStableValues := true >>> boolean InitTimer := false >>> boolean PrintConfig := false >>> boolean PrintFlags = true >>> boolean ShowFlags := false >>> String TraceMethodDataFilter := null >>> String TrustFinalDefaultFields := true >> >> ...and this is a bug, of course :-) >> >>> >>> I'm almost tempted to move InitTimer to another package, like jdk.vm.ci.common ? >>> >>>> >>>> -Doug From nils.eliasson at oracle.com Thu Jan 14 12:44:47 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Jan 2016 13:44:47 +0100 Subject: RFR(S): 8145331: SEGV in DirectivesStack::release(DirectiveSet*) Message-ID: <5697983F.3080509@oracle.com> Hi, Please review this patch: Description: In the fix for JDK-8144873 I updated only one of the two use cases of CompilerDirectives::get_for(AbstractCompiler..) Summary: I simplify CompilerDirectives::get_for(..) to always return the c1_store for all unsupported cases. 
Makes getMatchingDirective and getDefaultDirective simpler too. Moved refcount out of get_for(...) since it is not guaranteed to be used if updated here. Testing: All intrinsic tests and all compilercontrol tests in addition to testset hotspot. IntrinsicAvailableTest is updated to not check JVMCI compiler for intrinsics. IntrinsicDisabledTest.jtr doesn't work with JVMCI - no action taken NullCheckDroppingsTest.jtr doesn't work - since JVMCI doesn't support BackgroundCompilation - no action taken Bug: https://bugs.openjdk.java.net/browse/JDK-8145331 Webrev: http://cr.openjdk.java.net/~neliasso/8145331/webrev.01/ Regards, Nils Eliasson From pavel.punegov at oracle.com Thu Jan 14 14:55:52 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Thu, 14 Jan 2016 17:55:52 +0300 Subject: RFR (XXS): 8145025: compiler/compilercontrol/commandfile/CompileOnlyTest.java and compiler/compilercontrol/commands/CompileOnlyTest.java fail: java.lang.RuntimeException: FAILED: method ... compilable: false, but should: true In-Reply-To: <56969490.7020300@oracle.com> References: <567288EC.3020001@oracle.com> <037269E6-9A07-4436-86E5-3E19D260D063@oracle.com> <31DAC8F9-82C5-4C62-922B-B630B5BF6450@oracle.com> <56969490.7020300@oracle.com> Message-ID: Thanks for review, Vladimir. - Pavel. > On 13 Jan 2016, at 21:16, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 1/13/16 6:21 AM, Pavel Punegov wrote: >> Anyone else to review, please? >> >> - Thanks, >> Pavel Punegov >> >>> On 17 Dec 2015, at 16:44, Pavel Punegov wrote: >>> >>> Thanks for review, Nils >>> >>> - Pavel. >>> >>>> On 17 Dec 2015, at 13:05, Nils Eliasson wrote: >>>> >>>> Hi Pavel, >>>> >>>> Looks good. >>>> >>>> //Nils >>>> >>>> On 2015-12-16 20:56, Pavel Punegov wrote: >>>>> Please review this small fix to a test bug. 
>>>>> >>>>> Issue: when test builds a state for a method that doesn't match any compileonly command it should consider that this >>>>> method wasn't set compiled/excluded with any other compileonly or exclude commands. This means that it should check >>>>> that appropriate Optional is not present (isn't set). >>>>> >>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8145025 >>>>> webrev: http://cr.openjdk.java.net/~ppunegov/8145025/webrev.00/ >>>>> >>>>> >>>>> - Thanks, >>>>> Pavel Punegov >>>>> >>>> >>> >> From vladimir.x.ivanov at oracle.com Thu Jan 14 15:05:25 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 14 Jan 2016 18:05:25 +0300 Subject: [9] RFR (S): 8140001: _allocateInstance intrinsic does not throw InstantiationException for abstract classes and interfaces In-Reply-To: <56951A3E.7070805@oracle.com> References: <56951A3E.7070805@oracle.com> Message-ID: <5697B935.9020209@oracle.com> Any feedback, please? Best regards, Vladimir Ivanov On 1/12/16 6:22 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8140001/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8140001 > > EA can eliminate allocations of abstract classes or interfaces, thus > changing observable behavior of a program as the test case demonstrates. > > The fix is to always mark such allocations as escaping. > > Testing: failing test, JPRT. > > Thanks! 
> > Best regards, > Vladimir Ivanov From aleksey.shipilev at oracle.com Thu Jan 14 15:15:52 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 14 Jan 2016 18:15:52 +0300 Subject: [9] RFR (S): 8140001: _allocateInstance intrinsic does not throw InstantiationException for abstract classes and interfaces In-Reply-To: <5697B935.9020209@oracle.com> References: <56951A3E.7070805@oracle.com> <5697B935.9020209@oracle.com> Message-ID: <5697BBA8.8030901@oracle.com> Looks okay to me, but I think the property name should reflect Java terminology, e.g. "can_be_instantiated", "not is_allocatable"? $ javac AbstractSample.java [ERROR] AbstractSample.java:[36,9] AbstractSample.M is abstract; cannot be instantiated Thanks, -Aleksey On 01/14/2016 06:05 PM, Vladimir Ivanov wrote: > Any feedback, please? > > Best regards, > Vladimir Ivanov > > On 1/12/16 6:22 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8140001/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8140001 >> >> EA can eliminate allocations of abstract classes or interfaces, thus >> changing observable behavior of a program as the test case demonstrates. >> >> The fix is to always mark such allocations as escaping. >> >> Testing: failing test, JPRT. >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From tobias.hartmann at oracle.com Thu Jan 14 16:00:36 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 14 Jan 2016 17:00:36 +0100 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type Message-ID: <5697C624.7040201@oracle.com> Hi, please review the following patch. 
https://bugs.openjdk.java.net/browse/JDK-6675699 http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ *Problem* The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated. However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available. For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. The Phi is replaced by its only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled iteration (of course, the range check would catch this at runtime). This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. 
For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such a workaround fix is not feasible to fix all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9. *Solution* To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check. To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node. With the fix, we could now remove the following old "workaround-fixes": https://bugs.openjdk.java.net/browse/JDK-4781451 https://bugs.openjdk.java.net/browse/JDK-4799512 https://bugs.openjdk.java.net/browse/JDK-6659207 https://bugs.openjdk.java.net/browse/JDK-6663854 For reference, the individual patches can be found here: http://cr.openjdk.java.net/~thartmann/6675699/backouts/ However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7. 
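To make the loop shape from the *Problem* description concrete, here is a minimal Java sketch. It is an illustrative reconstruction written from the prose above, not the actual TestLoopPeeling source: the class name LoopPeelingSketch, the method store() and its stop parameter are assumptions. Whether C2 really peels this loop depends on the VM build and flags, so the sketch only shows the source-level shape and the runtime behavior that the range check guarantees.

```java
// Illustrative sketch only: names, parameters and sizes are assumptions,
// not taken from the real TestLoopPeeling.java.
class LoopPeelingSketch {

    static void store(int[] array, int index, int inc, int stop) {
        int storeIndex = -1;
        for (; index < array.length; index += inc) {
            // Loop-invariant check: the kind of test that can trigger loop peeling.
            if (inc == 42) {
                return;
            }
            // Loop-variant use of storeIndex keeps the index computation in the loop.
            if (storeIndex > 0 && array[storeIndex] == 42) {
                return;
            }
            if (index == stop) {
                // In the first (peeled) iteration storeIndex is still -1, so only
                // the range check protects this store.
                array[storeIndex] = 1;
                return;
            }
            storeIndex++;
        }
    }

    public static void main(String[] args) {
        int[] array = new int[10];
        // Warm up so the JIT has a chance to compile store(); peeling itself
        // is not guaranteed and is not observable from here.
        for (int i = 0; i < 20_000; i++) {
            store(array, 0, 1, 3);
        }
        System.out.println("array[2] = " + array[2]);
        try {
            store(array, 0, 1, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("range check fired for storeIndex == -1");
        }
    }
}
```

With stop == 0 the store is reached in the very first iteration, which is the case where the index is outside the [0, MAX_INT] range of the narrow ConvI2L; a correct VM must still throw ArrayIndexOutOfBoundsException there.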
Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this. ConvI2L nodes with a narrow type are also emitted by intrinsics: - GraphKit::array_element_address() - PhaseMacroExpand::array_element_address() - ArrayCopyNode::prepare_array_copy() I was not able to reproduce the problem with intrinsics. It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic. *Testing* I did extensive testing to make sure the fix does not introduce correctness or performance issues. - Different RBT test suites [1] with and without -Xcomp. - Full run of multiple CTW suites. - Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of [2] with -XX:+TraceOptimizeFill. - Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of [3] with -XX:+PrintOptoAssembly to make sure all range checks are folded. - Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord. 
- Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression Thanks, Tobias [1] RBT test suites: - hotspot/test/:hotspot_all - noncolo.testlist - vm.compiler.testlist - vm.regression.testlist - nsk.regression.testlist - nsk.split_verifier.testlist - nsk.stress.testlist - nsk.stress.jck.testlist - jdk/test/:jdk_jfr - jdk/test/:svc_tools - jdk/test/:jdk_instrument - jdk/test/:jdk_lang - jdk/test/:jdk_svc - nashorn/test/:tier1 - nashorn/test/:tier2 - nashorn/test/:tier3 Only without -Xcomp: - Kitchensink - runThese - Weblogic12medrec [2] test/compiler/intrinsics/6982370/Test6982370.java [3] test/compiler/rangechecks/TestExplicitRangeChecks.java From vladimir.x.ivanov at oracle.com Thu Jan 14 17:59:10 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 14 Jan 2016 20:59:10 +0300 Subject: [9] RFR (XS): 6985422: flush the output streams before OnError commands In-Reply-To: <5695505F.7050005@oracle.com> References: <569548D4.2070707@oracle.com> <5695505F.7050005@oracle.com> Message-ID: <5697E1EE.4010704@oracle.com> Thank you, Vladimir. Best regards, Vladimir Ivanov On 1/12/16 10:13 PM, Vladimir Kozlov wrote: > Looks good. > > Vladimir K > > On 1/12/16 10:41 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/6985422/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-6985422 >> >> OnError commands are executed before hotspot log is finished. >> >> The fix is to finish the log before executing OnError commands. >> >> Also, I moved compilation replay data dumping logic before OnError >> processing, so compilation replay file is accessible >> from OnError commands as well. >> >> I verified the fix by triggering VM crash w/ -XX:+LogCompilation >> -XX:LogFile=hotspot.log -XX:OnError='cp hotspot.log >> hs.log' flags and checking that hs.log is complete. Without the fix >> the log is corrupted. >> >> Testing: manual, JPRT. >> >> Thanks! 
>> >> Best regards, >> Vladimir Ivanov From vladimir.kozlov at oracle.com Thu Jan 14 18:29:55 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Jan 2016 10:29:55 -0800 Subject: Request for Reviews (S): JDK-8003585 strength reduce or eliminate range checks for power-of-two sized arrays In-Reply-To: <70FBA4CF-CF05-4232-AFEC-202E93BFA930@oracle.com> References: <440F2280-4B25-4AE6-A4F6-DDD4EB529636@oracle.com> <52FC129D.7040409@oracle.com> <52FE6A08.20400@oracle.com> <52FE7313.3060404@oracle.com> <530209A8.1020501@oracle.com> <38EE6922-0B9C-49A6-B54D-E78BA0EFECB1@oracle.com> <8232A81B-6B78-4F61-A8EC-1A3DF3938648@oracle.com> <70FBA4CF-CF05-4232-AFEC-202E93BFA930@oracle.com> Message-ID: <5697E923.6000908@oracle.com> I know it is duplication but CmpU creation should be under conditions otherwise you are creating and transforming dead node. + Node* ncmp = phase->transform(new CmpUNode(cmp1, cmp2)); + if (_test._test == BoolTest::le || _test._test == BoolTest::eq) { The test does not cover next conversions: + // Change (arraylength <= 0) or (arraylength == 0) + // into (arraylength u<= 0) + // Also change (arraylength != 0) into (arraylength u> 0) Thanks, Vladimir On 1/7/16 1:29 AM, Roland Westrelin wrote: > Can I get a review for this? > > Roland. > >> On Oct 5, 2015, at 12:51 PM, Roland Westrelin wrote: >> >> Here is a new webrev: >> >> http://cr.openjdk.java.net/~roland/8003585/webrev.01/ >> >> Roland. >> >>> On Oct 2, 2015, at 3:30 PM, Roland Westrelin wrote: >>> >>> Hi Chris, >>> >>>> Thanks for picking it up! It mostly looks good to me. (Not a Reviewer) >>> >>> Thanks for looking at this again. >>> >>>> What I really needed with my earlier webrev was some instructions as to what test to write -- since the Java corelibs can come across this optimization a lot (e.g. HashMap), I didn't have a good idea of what kind of test really needs to be written. >>>> >>>> A couple of issues with this webrev: >>>> >>>> 1. 
In subnode.cpp, line 1346: >>>> >>>> 1344 } else if (_test._test == BoolTest::lt && >>>> 1345 cmp2->Opcode() == Op_AddI && >>>> 1346 cmp2->in(2)->find_int_con(1)) { >>>> 1347 bound = cmp2->in(1); >>>> 1348 } >>>> >>>> I think it should be >>>> cmp2->in(2)->find_int_con(0) == 1 >>>> instead, because the value passed into this function is actually for a "fallback when no int constant is found". Passing the expected value (1) to it defeats the purpose. >>> >>> You're right. Thanks for spotting that. >>> >>>> jint find_int_con(jint value_if_unknown) const { >>>> const TypeInt* t = find_int_type(); >>>> return (t != NULL && t->is_con()) ? t->get_con() : value_if_unknown; >>>> } >>>> >>>> 2. Formatting nitpick: could you please trim the spaces before the new's on lines 1368, 1369 and 1387 >>> >>> Sure. >>> >>> I'll send an updated webrev. >>> >>> Roland. >>> >>>> >>>> Thanks, >>>> Kris (OpenJDK username: krismo) >>>> >>>> On Wed, Sep 30, 2015 at 1:34 AM, Roland Westrelin wrote: >>>> I'm picking that one up. Here is a new webrev: >>>> >>>> http://cr.openjdk.java.net/~roland/8003585/webrev.00/ >>>> >>>> The only change to c2 compared to the previous webrev is that ((x & m) u< m+1) is optimized the same way ((x & m) u<= m) is. Actually, I don't think that C2 currently produces the ((x & m) u<= m) shape. The IfNode::fold_compares() logic produces the ((x & m) u< m+1) variant. I also added a test case to check the validity of the transformations and ran usual testing on the change. >>>> >>>> Roland. >> > From vladimir.kozlov at oracle.com Thu Jan 14 18:37:00 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Jan 2016 10:37:00 -0800 Subject: RFR(S): 8145331: SEGV in DirectivesStack::release(DirectiveSet*) In-Reply-To: <5697983F.3080509@oracle.com> References: <5697983F.3080509@oracle.com> Message-ID: <5697EACC.1020907@oracle.com> Good. 
Thanks, Vladimir On 1/14/16 4:44 AM, Nils Eliasson wrote: > Hi, > > Please review this patch: > > Description: > In the fix for JDK-8144873 I updated only one of the two use cases of CompilerDirectives::get_for(AbstractCompiler..) > > Summary: > I simplify CompilerDirectives::get_for(..) to always return the c1_store for all unsupported cases. Makes > getMatchingDirective and getDefaultDirective simpler too. Moved refcount out of get_for(...) since it is not guaranteed > to be used if updated here. > > Testing: > All intrinsic tests and all compilercontrol tests in addition to testset hotspot. > IntrinsicAvailableTest is updated to not check JVMCI compiler for intrinsics. > IntrinsicDisabledTest.jtr doesn't work with JVMCI - no action taken > NullCheckDroppingsTest.jtr doesn't work - since JVMCI doesn't support BackgroundCompilation - no action taken > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145331 > Webrev: http://cr.openjdk.java.net/~neliasso/8145331/webrev.01/ > > Regards, > Nils Eliasson From vladimir.kozlov at oracle.com Thu Jan 14 18:43:33 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Jan 2016 10:43:33 -0800 Subject: [9] RFR (S): 8140001: _allocateInstance intrinsic does not throw InstantiationException for abstract classes and interfaces In-Reply-To: <5697BBA8.8030901@oracle.com> References: <56951A3E.7070805@oracle.com> <5697B935.9020209@oracle.com> <5697BBA8.8030901@oracle.com> Message-ID: <5697EC55.2010507@oracle.com> The fix in EA (mark as escaping) is good. Thanks, Vladimir On 1/14/16 7:15 AM, Aleksey Shipilev wrote: > Looks okay to me, but I think the property name should reflect Java > terminology, e.g. "can_be_instantiated", "not is_allocatable"? > > $ javac AbstractSample.java > [ERROR] AbstractSample.java:[36,9] AbstractSample.M is abstract; cannot > be instantiated > > Thanks, > -Aleksey > > On 01/14/2016 06:05 PM, Vladimir Ivanov wrote: >> Any feedback, please? 
>> >> Best regards, >> Vladimir Ivanov >> >> On 1/12/16 6:22 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8140001/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8140001 >>> >>> EA can eliminate allocations of abstract classes or interfaces, thus >>> changing observable behavior of a program as the test case demonstrates. >>> >>> The fix is to always mark such allocations as escaping. >>> >>> Testing: failing test, JPRT. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov > > From christian.thalinger at oracle.com Thu Jan 14 18:55:16 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Jan 2016 08:55:16 -1000 Subject: RFR(S): 8145331: SEGV in DirectivesStack::release(DirectiveSet*) In-Reply-To: <5697983F.3080509@oracle.com> References: <5697983F.3080509@oracle.com> Message-ID: It would be nice if we somehow could detect if we are using C2 or not. I mean this is sufficient: + // Dont bother check JVMCI compiler - returns false on all intrinsics. + if (!Boolean.valueOf(getVMOption("UseJVMCICompiler"))) { but we are doing the reverse test: we should be testing for isC2() not !isJVMCI(). Anyway, that is a different issue. This change looks good. > On Jan 14, 2016, at 2:44 AM, Nils Eliasson wrote: > > Hi, > > Please review this patch: > > Description: > In the fix for JDK-8144873 I updated only one of the two use cases of CompilerDirectives::get_for(AbstractCompiler..) > > Summary: > I simplify CompilerDirectives::get_for(..) to always return the c1_store for all unsupported cases. Makes getMatchingDirective and getDefaultDirective simpler too. Moved refcount out of get_for(...) since it is not guaranteed to be used if updated here. > > Testing: > All intrinsic tests and all compilercontrol tests in addition to testset hotspot. > IntrinsicAvailableTest is updated to not check JVMCI compiler for intrinsics. 
> IntrinsicDisabledTest.jtr doesn't work with JVMCI - no action taken > NullCheckDroppingsTest.jtr doesn't work - since JVMCI doesn't support BackgroundCompilation - no action taken > > Bug: https://bugs.openjdk.java.net/browse/JDK-8145331 > Webrev: http://cr.openjdk.java.net/~neliasso/8145331/webrev.01/ > > Regards, > Nils Eliasson From vladimir.kozlov at oracle.com Thu Jan 14 18:59:11 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 14 Jan 2016 10:59:11 -0800 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type In-Reply-To: <5697C624.7040201@oracle.com> References: <5697C624.7040201@oracle.com> Message-ID: <5697EFFF.90305@oracle.com> You have to update the code for the 8146999 changes when Roland pushes it. The only thing I don't like about the changes is using #ifdef _LP64 for part of them. I know where it is coming from (ConvI2L for loop indexing) but as you said ConvI2L could be generated in other cases too. Should the test cast->has_range_check() return 'false' in 32-bit? Thanks, Vladimir On 1/14/16 8:00 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-6675699 > http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ > > *Problem* > The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated. However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available. 
> > For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. The Phi is replaced by its only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled iteration (of course, the range check would catch this at runtime). > > This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such a workaround fix is not feasible to fix all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9. > > *Solution* > To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check. 
> > To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node. > > With the fix, we could now remove the following old "workaround-fixes": > https://bugs.openjdk.java.net/browse/JDK-4781451 > https://bugs.openjdk.java.net/browse/JDK-4799512 > https://bugs.openjdk.java.net/browse/JDK-6659207 > https://bugs.openjdk.java.net/browse/JDK-6663854 > For reference, the individual patches can be found here: > http://cr.openjdk.java.net/~thartmann/6675699/backouts/ > > However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7. > > Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this. > > ConvI2L nodes with a narrow type are also emitted by intrinsics: > - GraphKit::array_element_address() > - PhaseMacroExpand::array_element_address() > - ArrayCopyNode::prepare_array_copy() > I was not able to reproduce the problem with intrinsics. 
It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic. > > *Testing* > I did extensive testing to make sure the fix does not introduce correctness or performance issues. > - Different RBT test suites [1] with and without -Xcomp. > - Full run of multiple CTW suites. > - Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of [2] with -XX:+TraceOptimizeFill. > - Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of [3] with -XX:+PrintOptoAssembly to make sure all range checks are folded. > - Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord. > - Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression > > Thanks, > Tobias > > [1] RBT test suites: > - hotspot/test/:hotspot_all > - noncolo.testlist > - vm.compiler.testlist > - vm.regression.testlist > - nsk.regression.testlist > - nsk.split_verifier.testlist > - nsk.stress.testlist > - nsk.stress.jck.testlist > - jdk/test/:jdk_jfr > - jdk/test/:svc_tools > - jdk/test/:jdk_instrument > - jdk/test/:jdk_lang > - jdk/test/:jdk_svc > - nashorn/test/:tier1 > - nashorn/test/:tier2 > - nashorn/test/:tier3 > Only without -Xcomp: > - Kitchensink > - runThese > - Weblogic12medrec > [2] test/compiler/intrinsics/6982370/Test6982370.java > [3] test/compiler/rangechecks/TestExplicitRangeChecks.java > From nils.eliasson at oracle.com Thu Jan 14 19:21:10 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Jan 2016 20:21:10 +0100 Subject: RFR(S): 8145331: SEGV in DirectivesStack::release(DirectiveSet*) In-Reply-To: References: <5697983F.3080509@oracle.com> Message-ID: <5697F526.5080804@oracle.com> On 2016-01-14 19:55, Christian Thalinger wrote: > It would be nice if we somehow could detect if we are using C2 or not. 
I mean this is sufficient: > > + // Dont bother check JVMCI compiler - returns false on all intrinsics. > + if (!Boolean.valueOf(getVMOption("UseJVMCICompiler"))) { > > but we are doing the reverse test: we should be testing for isC2() not !isJVMCI(). Yes, we should really get away from using the messy complevel-to-compiler-translation in these test and iterate over the available compilers instead. That would allow for having different compilers behind the JVMCI interface too. Thanks for having a look, Nils > > Anyway, that is a different issue. This change looks good. > >> On Jan 14, 2016, at 2:44 AM, Nils Eliasson wrote: >> >> Hi, >> >> Please review this patch: >> >> Description: >> In the fix for JDK-8144873 I updated only one of the two use cases of CompilerDirectives::get_for(AbstractCompiler..) >> >> Summary: >> I simplify CompilerDirectives::get_for(..) to always return the c1_store for all unsupported cases. Makes getMatchingDirective and getDefaultDirective simpler too. Moved refcount out of get_for(...) since it is not guaranteed to be used if updated here. >> >> Testing: >> All intrinsic tests and all compilercontrol tests in addition to testset hotspot. >> IntrinsicAvailableTest is updated to not check JVMCI compiler for intrinsics. 
>> IntrinsicDisabledTest.jtr doesn't work with JVMCI - no action taken >> NullCheckDroppingsTest.jtr doesn't work - since JVMCI doesn't support BackgroundCompilation - no action taken >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8145331 >> Webrev: http://cr.openjdk.java.net/~neliasso/8145331/webrev.01/ >> >> Regards, >> Nils Eliasson From nils.eliasson at oracle.com Thu Jan 14 19:21:33 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 14 Jan 2016 20:21:33 +0100 Subject: RFR(S): 8145331: SEGV in DirectivesStack::release(DirectiveSet*) In-Reply-To: <5697EACC.1020907@oracle.com> References: <5697983F.3080509@oracle.com> <5697EACC.1020907@oracle.com> Message-ID: <5697F53D.8030205@oracle.com> Thank you Vladimir! //Nils On 2016-01-14 19:37, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 1/14/16 4:44 AM, Nils Eliasson wrote: >> Hi, >> >> Please review this patch: >> >> Description: >> In the fix for JDK-8144873 I updated only one of the two use cases of >> CompilerDirectives::get_for(AbstractCompiler..) >> >> Summary: >> I simplify CompilerDirectives::get_for(..) to always return the >> c1_store for all unsupported cases. Makes >> getMatchingDirective and getDefaultDirective simpler too. Moved >> refcount out of get_for(...) since it is not guaranteed >> to be used if updated here. >> >> Testing: >> All intrinsic tests and all compilercontrol tests in addition to >> testset hotspot. >> IntrinsicAvailableTest is updated to not check JVMCI compiler for >> intrinsics.
>> IntrinsicDisabledTest.jtr doesn't work with JVMCI - no action taken >> NullCheckDroppingsTest.jtr doesn't work - since JVMCI doesn't support >> BackgroundCompilation - no action taken >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8145331 >> Webrev: http://cr.openjdk.java.net/~neliasso/8145331/webrev.01/ >> >> Regards, >> Nils Eliasson From christian.thalinger at oracle.com Thu Jan 14 21:50:15 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 14 Jan 2016 11:50:15 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> <7C1CBFFE-9A7C-4195-A8EA-BD7B94092E4F@oracle.com> Message-ID: > On Jan 14, 2016, at 2:44 AM, Doug Simon wrote: > >> >> On 14 Jan 2016, at 06:58, Christian Thalinger wrote: >> >>> >>> On Jan 12, 2016, at 12:39 PM, Christian Thalinger wrote: >>> >>>> >>>> On Jan 12, 2016, at 12:14 PM, Christian Thalinger wrote: >>>> >>>>> >>>>> On Jan 12, 2016, at 12:03 PM, Doug Simon wrote: >>>>> >>>>>> >>>>>> On 12 Jan 2016, at 22:39, Christian Thalinger wrote: >>>>>> >>>>>>> >>>>>>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >>>>>>> >>>>>>> If we're going with an enum, you could put accessors directly in the enum: >>>>>>> >>>>>>> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >>>>>>> >>>>>>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >>>>>>> >>>>>>> You could then type the value of the options and check the right accessor is used: >>>>>>> >>>>>>> public enum Option { >>>>>>> ImplicitStableValues(boolean.class), >>>>>>> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED).
>>>>>>> PrintConfig(boolean.class), >>>>>>> PrintFlags(boolean.class), >>>>>>> ShowFlags(boolean.class), >>>>>>> TraceMethodDataFilter(String.class), >>>>>>> TrustFinalDefaultFields(String.class); >>>>>>> >>>>>>> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. >>>>>> >>>>>> Excellent idea! I was also thinking about adding the default value to the enum. >>>>> >>>>> Can you do that without having to box the default value? >>>> >>>> No, we have to box but we can initialize all flags in the constructor: >>>> >>>> http://cr.openjdk.java.net/~twisti/8146820/webrev.02/ >> >> Do we agree on the change? > > I would prefer it if the value was lazily initialized (for non-AOT runtimes): It's not different in AOT-land because these cannot be constants. > > Also, you can remove all the static fields that just cache a (possibly unboxed) option value and use the option directly. For example: > > diff -r 1034ff44c5d0 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java > --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Tue Jan 12 15:04:27 2016 +0100 > +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Thu Jan 14 13:40:28 2016 +0100 > @@ -29,6 +29,7 @@ > import java.lang.reflect.Field; > > import jdk.vm.ci.common.JVMCIError; > +import jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.Option; > import jdk.vm.ci.meta.JavaType; > import jdk.vm.ci.meta.LocationIdentity; > import jdk.vm.ci.meta.MetaAccessProvider; > @@ -41,11 +42,6 @@ > */ > class HotSpotResolvedJavaFieldImpl implements HotSpotResolvedJavaField, HotSpotProxified { > > - /** > - * Mark well-known stable fields as such.
> - */ > - private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); > - > private final HotSpotResolvedObjectTypeImpl holder; > private final String name; > private JavaType type; > @@ -198,7 +194,7 @@ > return true; > } > assert getAnnotation(Stable.class) == null; > - if (ImplicitStableValues && isImplicitStableField()) { > + if (Option.ImplicitStableValues.getBoolean() && isImplicitStableField()) { > return true; > } > return false; > > None of the current options are used in tight loops where the cost of the unboxing (if any) would matter. Right. > > Lastly, since you've added PrintFlags and ShowFlags, why not add a help message to each option. For example: > > ImplicitStableValues(boolean.class, true, "Mark well-known stable fields as such."), We should. http://cr.openjdk.java.net/~twisti/8146820/webrev.03/ $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.ShowFlags=true InitGraal [List of JVMCI options] boolean ImplicitStableValues := true Mark well-known stable fields as such. boolean InitTimer := false Specifies if initialization timing is enabled. boolean PrintConfig := false Prints all HotSpotVMConfig fields. boolean PrintFlags := false Prints all JVMCI flags and exits. boolean ShowFlags = true Prints all JVMCI flags and continues. String TraceMethodDataFilter := null boolean TrustFinalDefaultFields := true Determines whether to treat final fields with default values as constant. From zoltan.majo at oracle.com Fri Jan 15 08:43:00 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Fri, 15 Jan 2016 09:43:00 +0100 Subject: [9] RFR (XS): 8147441: unchecked pending exceptions in the WhiteBox API's implementation Message-ID: <5698B114.4020106@oracle.com> Hi, please review the patch for 8147441.
https://bugs.openjdk.java.net/browse/JDK-8147441 Problem: The method codeBlob2objectArray is used by the implementation of the WB API to fill in an object array with information about a code blob. Although the codeBlob2objectArray method can cause various JNI exceptions, there are two code locations where the VM does not check for exceptions after codeBlob2objectArray returns. Solution: Add exception check to the above mentioned code locations. Webrev: http://cr.openjdk.java.net/~zmajo/8147441/webrev.00/ Testing: - JPRT; - all hotspot tests executed locally; all tests that pass with the default version pass with the fixed version as well. Thank you and best regards, Zoltan From tobias.hartmann at oracle.com Fri Jan 15 11:07:45 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 12:07:45 +0100 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller Message-ID: <5698D301.4070309@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8147444 http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive. Thanks, Tobias From zoltan.majo at oracle.com Fri Jan 15 11:14:58 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Fri, 15 Jan 2016 12:14:58 +0100 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller In-Reply-To: <5698D301.4070309@oracle.com> References: <5698D301.4070309@oracle.com> Message-ID: <5698D4B2.6030300@oracle.com> Hi Tobias, this looks good to me! 
Thank you and best regards, Zoltan On 01/15/2016 12:07 PM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8147444 > http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ > > The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive. > > Thanks, > Tobias From tobias.hartmann at oracle.com Fri Jan 15 11:15:54 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 12:15:54 +0100 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller In-Reply-To: <5698D4B2.6030300@oracle.com> References: <5698D301.4070309@oracle.com> <5698D4B2.6030300@oracle.com> Message-ID: <5698D4EA.7000909@oracle.com> Thanks, Zoltan! Best, Tobias On 15.01.2016 12:14, Zoltán Majó wrote: > Hi Tobias, > > > this looks good to me! > > Thank you and best regards, > > > Zoltan > > On 01/15/2016 12:07 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8147444 >> http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ >> >> The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive.
>> >> Thanks, >> Tobias > From vladimir.x.ivanov at oracle.com Fri Jan 15 12:40:02 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Jan 2016 15:40:02 +0300 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller In-Reply-To: <5698D301.4070309@oracle.com> References: <5698D301.4070309@oracle.com> Message-ID: <5698E8A2.7020205@oracle.com> Looks good. Thanks for fixing it! BTW can ClassFileInstaller be improved to check the input and report the problem in a meaningful way? NPE is useless when diagnosing such problems. Best regards, Vladimir Ivanov On 1/15/16 2:07 PM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8147444 > http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ > > The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive. > > Thanks, > Tobias > From tobias.hartmann at oracle.com Fri Jan 15 13:07:57 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 14:07:57 +0100 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller In-Reply-To: <5698E8A2.7020205@oracle.com> References: <5698D301.4070309@oracle.com> <5698E8A2.7020205@oracle.com> Message-ID: <5698EF2D.1070306@oracle.com> Thanks, Vladimir. On 15.01.2016 13:40, Vladimir Ivanov wrote: > Looks good. Thanks for fixing it! > > BTW can ClassFileInstaller be improved to check the input and report the problem in a meaningful way? NPE is useless when diagnosing such problems. 
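The kind of input check asked about here can be sketched as follows. This is only a minimal illustration, not the ClassFileInstaller code from the webrev: the class name ResourceCopier and the method copyClassFile are hypothetical. It relies on ClassLoader.getResourceAsStream returning null for a missing resource, and turns that into a FileNotFoundException that names the resource instead of letting the null stream cause a NullPointerException later.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not the real ClassFileInstaller.
class ResourceCopier {
    // Copies the class file of className (e.g. "java.lang.Object") below destDir.
    static Path copyClassFile(String className, Path destDir) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        // getResourceAsStream returns null (it does not throw) if the resource is missing.
        try (InputStream is = cl.getResourceAsStream(resource)) {
            if (is == null) {
                // Report which resource is missing instead of failing later with an NPE.
                throw new FileNotFoundException(resource);
            }
            Path target = destDir.resolve(resource);
            Files.createDirectories(target.getParent());
            Files.copy(is, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }
}
```

With a check of this shape, a class that was never compiled shows up as java.io.FileNotFoundException followed by the resource path, which points directly at the missing class.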
Sure, I changed the implementation to throw an exception: Execution failed: `main' threw exception: java.io.FileNotFoundException: jdk/test/lib/Asserts.class http://cr.openjdk.java.net/~thartmann/8147444/webrev.01/ Best, Tobias > > Best regards, > Vladimir Ivanov > > On 1/15/16 2:07 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8147444 >> http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ >> >> The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive. >> >> Thanks, >> Tobias >> From vladimir.x.ivanov at oracle.com Fri Jan 15 13:19:23 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Jan 2016 16:19:23 +0300 Subject: [9] RFR (S): 8140001: _allocateInstance intrinsic does not throw InstantiationException for abstract classes and interfaces In-Reply-To: <5697EC55.2010507@oracle.com> References: <56951A3E.7070805@oracle.com> <5697B935.9020209@oracle.com> <5697BBA8.8030901@oracle.com> <5697EC55.2010507@oracle.com> Message-ID: <5698F1DB.8070103@oracle.com> Vladimir, Aleksey, thanks for the review. I don't have a strong opinion about naming. can_be_instantiated looks fine. Will do renaming before the push. Best regards, Vladimir Ivanov On 1/14/16 9:43 PM, Vladimir Kozlov wrote: > The fix in EA (mark as escaping) is good. > > Thanks, > Vladimir > > On 1/14/16 7:15 AM, Aleksey Shipilev wrote: >> Looks okay to me, but I think the property name should reflect Java >> terminology, e.g. "can_be_instantiated", "not is_allocatable"? >> >> $ javac AbstractSample.java >> [ERROR] AbstractSample.java:[36,9] AbstractSample.M is abstract; cannot >> be instantiated >> >> Thanks, >> -Aleksey >> >> On 01/14/2016 06:05 PM, Vladimir Ivanov wrote: >>> Any feedback, please? 
>>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 1/12/16 6:22 PM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8140001/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8140001 >>>> >>>> EA can eliminate allocations of abstract classes or interfaces, thus >>>> changing observable behavior of a program as the test case >>>> demonstrates. >>>> >>>> The fix is to always mark such allocations as escaping. >>>> >>>> Testing: failing test, JPRT. >>>> >>>> Thanks! >>>> >>>> Best regards, >>>> Vladimir Ivanov >> >> From vladimir.x.ivanov at oracle.com Fri Jan 15 13:20:30 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 15 Jan 2016 16:20:30 +0300 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller In-Reply-To: <5698EF2D.1070306@oracle.com> References: <5698D301.4070309@oracle.com> <5698E8A2.7020205@oracle.com> <5698EF2D.1070306@oracle.com> Message-ID: <5698F21E.2090403@oracle.com> Reviewed! Best regards, Vladimir Ivanov On 1/15/16 4:07 PM, Tobias Hartmann wrote: > Thanks, Vladimir. > > On 15.01.2016 13:40, Vladimir Ivanov wrote: >> Looks good. Thanks for fixing it! >> >> BTW can ClassFileInstaller be improved to check the input and report the problem in a meaningful way? NPE is useless when diagnosing such problems. 
> > Sure, I changed the implementation to throw an exception: > Execution failed: `main' threw exception: java.io.FileNotFoundException: jdk/test/lib/Asserts.class > > http://cr.openjdk.java.net/~thartmann/8147444/webrev.01/ > > Best, > Tobias > >> >> Best regards, >> Vladimir Ivanov >> >> On 1/15/16 2:07 PM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8147444 >>> http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ >>> >>> The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive. >>> >>> Thanks, >>> Tobias >>> From tobias.hartmann at oracle.com Fri Jan 15 13:21:12 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 14:21:12 +0100 Subject: [9] RFR(XS): 8147444: compiler/jsr292/NonInlinedCall/RedefineTest.java fails with NullPointerException in ClassFileInstaller In-Reply-To: <5698F21E.2090403@oracle.com> References: <5698D301.4070309@oracle.com> <5698E8A2.7020205@oracle.com> <5698EF2D.1070306@oracle.com> <5698F21E.2090403@oracle.com> Message-ID: <5698F248.9050600@oracle.com> Thanks, Vladimir. Best, Tobias On 15.01.2016 14:20, Vladimir Ivanov wrote: > Reviewed! > > Best regards, > Vladimir Ivanov > > On 1/15/16 4:07 PM, Tobias Hartmann wrote: >> Thanks, Vladimir. >> >> On 15.01.2016 13:40, Vladimir Ivanov wrote: >>> Looks good. Thanks for fixing it! >>> >>> BTW can ClassFileInstaller be improved to check the input and report the problem in a meaningful way? NPE is useless when diagnosing such problems. 
>> >> Sure, I changed the implementation to throw an exception: >> Execution failed: `main' threw exception: java.io.FileNotFoundException: jdk/test/lib/Asserts.class >> >> http://cr.openjdk.java.net/~thartmann/8147444/webrev.01/ >> >> Best, >> Tobias >> >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> On 1/15/16 2:07 PM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8147444 >>>> http://cr.openjdk.java.net/~thartmann/8147444/webrev.00/ >>>> >>>> The test compiler/jsr292/NonInlinedCall/RedefineTest.java fails in the ClassFileInstaller while trying to install jdk.test.lib.Asserts because the class is not imported and therefore not compiled. Because asserts are not used in this test, I removed the directive. >>>> >>>> Thanks, >>>> Tobias >>>> From roland.westrelin at oracle.com Fri Jan 15 14:01:49 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 15 Jan 2016 15:01:49 +0100 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type In-Reply-To: <5697C624.7040201@oracle.com> References: <5697C624.7040201@oracle.com> Message-ID: > http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ I agree with Vladimir on the #ifdef _LP64 but otherwise it looks good to me. Roland. 
From roland.westrelin at oracle.com Fri Jan 15 14:06:21 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 15 Jan 2016 15:06:21 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <56963C7A.8040203@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> <569506CA.8040001@oracle.com> <569552EE.8050809@oracle.com> <56963C7A.8040203@oracle.com> Message-ID: <0BEFA2BA-5115-4EE6-A9B4-CFFB8B6485DF@oracle.com> > I changed the implementation to only capture the byte[] and char[] memory: > http://cr.openjdk.java.net/~thartmann/8144212/webrev.03/ That looks good to me. Roland. From tobias.hartmann at oracle.com Fri Jan 15 14:13:45 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 15:13:45 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <0BEFA2BA-5115-4EE6-A9B4-CFFB8B6485DF@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> <569506CA.8040001@oracle.com> <569552EE.8050809@oracle.com> <56963C7A.8040203@oracle.com> <0BEFA2BA-5115-4EE6-A9B4-CFFB8B6485DF@oracle.com> Message-ID: <5698FE99.6080601@oracle.com> Thanks, Roland! Best, Tobias On 15.01.2016 15:06, Roland Westrelin wrote: > >> I changed the implementation to only capture the byte[] and char[] memory: >> http://cr.openjdk.java.net/~thartmann/8144212/webrev.03/ > > That looks good to me. 
> > Roland. > From tobias.hartmann at oracle.com Fri Jan 15 14:28:47 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 15:28:47 +0100 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type In-Reply-To: <5697EFFF.90305@oracle.com> References: <5697C624.7040201@oracle.com> <5697EFFF.90305@oracle.com> Message-ID: <5699021F.90500@oracle.com> Thanks, Vladimir. On 14.01.2016 19:59, Vladimir Kozlov wrote: > You have to update the code for the 8146999 changes when Roland pushes it. Yes, I'll do so but Roland mentioned that he still has problems with his 8146999 fix. > The only thing I don't like about the changes is using #ifdef _LP64 for part of the changes. > I know where it is coming from (ConvI2L for loop indexing) but as you said ConvI2L could be generated in other cases too. Should the test cast->has_range_check() return 'false' in 32-bit? I added the _LP64 ifdefs because we only emit a narrowed ConvI2L on 64 bit. But I agree - it's cleaner without those. As you suggested, I removed the ifdefs and changed has_range_check() to return false on 32 bit. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/6675699/webrev.02/ Thanks, Tobias > On 1/14/16 8:00 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-6675699 >> http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ >> >> *Problem* >> The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated.
However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available. >> >> For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. The Phi is replaced by its only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled iteration (of course, the range check would catch this at runtime). >> >> This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such workaround fixes are not a feasible way to address all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9.
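For readers without the test at hand, the loop shape described in the problem statement might look roughly like the sketch below. It is a hypothetical reconstruction from the description only, not the source of TestLoopPeeling.java: the class name LoopPeelingShape, the array size and the constants are assumptions, and the sketch is not claimed to trigger the crash. It only shows the three ingredients named above: a loop-invariant check, a loop-variant use of the index, and a store whose index is still -1 in the first iteration.

```java
// Hypothetical reconstruction for illustration; not the source of TestLoopPeeling.java.
class LoopPeelingShape {
    final int[] array = new int[100];

    void arrayAccess(int index, int inc) {
        int storeIndex = -1;
        for (; index < 10; index += inc) {
            // Loop-invariant check: the kind of test that makes the loop a peeling candidate.
            if (inc == 42) {
                return;
            }
            // Loop-variant use of storeIndex as an array index inside the loop.
            if (storeIndex > 0 && array[storeIndex] == 42) {
                return;
            }
            if (index == 7) {
                // Array store guarded by a range check on storeIndex. In the first
                // iteration storeIndex is still -1, so the check fails at runtime.
                array[storeIndex] = 1;
                return;
            }
            storeIndex++;
        }
    }
}
```

Calling arrayAccess(0, 1) reaches the store with storeIndex == 6, while arrayAccess(7, 1) reaches it in the very first iteration with storeIndex == -1 and is stopped by the range check with an ArrayIndexOutOfBoundsException.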
>> >> *Solution* >> To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check. >> >> To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node. >> >> With the fix, we could now remove the following old "workaround-fixes": >> https://bugs.openjdk.java.net/browse/JDK-4781451 >> https://bugs.openjdk.java.net/browse/JDK-4799512 >> https://bugs.openjdk.java.net/browse/JDK-6659207 >> https://bugs.openjdk.java.net/browse/JDK-6663854 >> For reference, the individual patches can be found here: >> http://cr.openjdk.java.net/~thartmann/6675699/backouts/ >> >> However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7. >> >> Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this. 
>> >> ConvI2L nodes with a narrow type are also emitted by intrinsics: >> - GraphKit::array_element_address() >> - PhaseMacroExpand::array_element_address() >> - ArrayCopyNode::prepare_array_copy() >> I was not able to reproduce the problem with intrinsics. It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic. >> >> *Testing* >> I did extensive testing to make sure the fix does not introduce correctness or performance issues. >> - Different RBT test suites [1] with and without -Xcomp. >> - Full run of multiple CTW suites. >> - Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of [2] with -XX:+TraceOptimizeFill. >> - Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of [3] with -XX:+PrintOptoAssembly to make sure all range checks are folded. >> - Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord. 
>> - Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression >> >> Thanks, >> Tobias >> >> [1] RBT test suites: >> - hotspot/test/:hotspot_all >> - noncolo.testlist >> - vm.compiler.testlist >> - vm.regression.testlist >> - nsk.regression.testlist >> - nsk.split_verifier.testlist >> - nsk.stress.testlist >> - nsk.stress.jck.testlist >> - jdk/test/:jdk_jfr >> - jdk/test/:svc_tools >> - jdk/test/:jdk_instrument >> - jdk/test/:jdk_lang >> - jdk/test/:jdk_svc >> - nashorn/test/:tier1 >> - nashorn/test/:tier2 >> - nashorn/test/:tier3 >> Only without -Xcomp: >> - Kitchensink >> - runThese >> - Weblogic12medrec >> [2] test/compiler/intrinsics/6982370/Test6982370.java >> [3] test/compiler/rangechecks/TestExplicitRangeChecks.java >> From tobias.hartmann at oracle.com Fri Jan 15 14:33:19 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 15 Jan 2016 15:33:19 +0100 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type In-Reply-To: References: <5697C624.7040201@oracle.com> Message-ID: <5699032F.2070105@oracle.com> Thanks, Roland! Best, Tobias On 15.01.2016 15:01, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ > > I agree with Vladimir on the #ifdef _LP64 but otherwise it looks good to me. > > Roland. > From tom.rodriguez at oracle.com Fri Jan 15 16:43:41 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 15 Jan 2016 08:43:41 -0800 Subject: RFR(S): 8147433: PrintNMethods no longer works with JVMCI Message-ID: http://cr.openjdk.java.net/~never/8147433/webrev/index.html https://bugs.openjdk.java.net/browse/JDK-8137167 moved the PrintNMethods related code into ciEnv but since JVMCI doesn't use ciEnv PrintNMethods no longer works for it. This moves into CompileBroker with the other compilation related printing code. Tested with fastdebug -XX:+PrintNMethods running specjvm2008. 
tom From vladimir.kozlov at oracle.com Fri Jan 15 18:10:48 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Jan 2016 10:10:48 -0800 Subject: [9] RFR (XS): 8147441: unchecked pending exceptions in the WhiteBox API's implementation In-Reply-To: <5698B114.4020106@oracle.com> References: <5698B114.4020106@oracle.com> Message-ID: <56993628.1060702@oracle.com> Seems fine. Thanks, Vladimir On 1/15/16 12:43 AM, Zoltán Majó wrote: > Hi, > > > please review the patch for 8147441. > > https://bugs.openjdk.java.net/browse/JDK-8147441 > > Problem: The method codeBlob2objectArray is used by the implementation > of the WB API to fill in an object array with information about a code > blob. Although the codeBlob2objectArray method can cause various JNI > exceptions, there are two code locations where the VM does not check for > exceptions after codeBlob2objectArray returns. > > Solution: Add exception check to the above mentioned code locations. > > Webrev: > http://cr.openjdk.java.net/~zmajo/8147441/webrev.00/ > > Testing: > - JPRT; > - all hotspot tests executed locally; all tests that pass with the > default version pass with the fixed version as well. 
> > Thank you and best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Fri Jan 15 18:14:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Jan 2016 10:14:28 -0800 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <56963C7A.8040203@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> <569506CA.8040001@oracle.com> <569552EE.8050809@oracle.com> <56963C7A.8040203@oracle.com> Message-ID: <56993704.7000503@oracle.com> Very good. Thanks, Vladimir On 1/13/16 4:00 AM, Tobias Hartmann wrote: > Thanks, Vladimir. > > On 12.01.2016 20:24, Vladimir Kozlov wrote: >>> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. >> >> Yes, that is right solution here. > > I changed the implementation to only capture the byte[] and char[] memory: > http://cr.openjdk.java.net/~thartmann/8144212/webrev.03/ > > The method GraphKit::capture_memory(src_type, dst_type) returns a new MergeMemNode if the src and dst types are different, merging the two. > > Best, > Tobias > >> On 1/12/16 5:59 AM, Tobias Hartmann wrote: >>> On 11.01.2016 21:00, Vladimir Kozlov wrote: >>>> On 1/11/16 7:20 AM, Tobias Hartmann wrote: >>>>> On 08.01.2016 20:41, Vladimir Kozlov wrote: >>>>>> On 1/8/16 2:37 AM, Tobias Hartmann wrote: >>>>>>> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>>>>>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>>>>>>> Hi Vladimir, >>>>>>>>> >>>>>>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>>>>>>> Andrew is right. >>>>>>>>> >>>>>>>>> Yes, he's right that the membar is not needed in this case. 
I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>>>>>>> >>>>>>>> Right. It was the root of this bug, see below. >>>>>>>> >>>>>>>>> >>>>>>>>> I fixed this for the inflate and compress intrinsics. >>>>>>>>> >>>>>>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>>>>>>> StrInflatedCopyNode is not memory node. >>>>>>>>> >>>>>>>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >>>>>>>> >>>>>>>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >>>>>>> >>>>>>> I meant with webrev.01 but you answered my question below. >>>>>>> >>>>>>>>> // This class defines a projection of the memory state of a store conditional node. >>>>>>>>> // These nodes return a value, but also update memory. >>>>>>>>> >>>>>>>>> But inflate does not return any value. >>>>>>>> >>>>>>>> Hmm, according to bottom type inflate produce memory: >>>>>>>> >>>>>>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>>>>>>> >>>>>>>> So it really does not need SCMemProjNode. Sorry about that. >>>>>>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >>>>>>> >>>>>>> Exactly. >>>>>>> >>>>>>>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>>>>>>> >>>>>>>> set_memory(str, dst_type); >>>>>>> >>>>>>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >>>>>>> >>>>>>>> And you should rollback part of changes in escape.cpp and macro.cpp. >>>>>>> >>>>>>> Okay, I'll to that. 
>>>>>>> >>>>>>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>>>>>>> >>>>>>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >>>>>>> >>>>>>> Okay, should we then use BOTTOM for both the input and output type? >>>>>> >>>>>> Only input. Output type corresponds to dst array type which you set correctly now. >>>>> >>>>> It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case: >>>>> StoreC >>>>> inflate_string >>>>> LoadC >>>>> >>>>> The memory graph (def->use) now looks like this: >>>>> LoadC -> inflate_string -> ByteMem >>>>> ... StoreC-> CharMem >>>> >>>> I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC. >>> >>> Note that the StoreC and inflate_string are *not* writing to the same char[] array. The test looks like this: >>> >>> char c1[] = new char[1]; >>> char c2[] = new char[1]; >>> >>> c2[0] = 42; >>> // Inflate String from byte[] to char[] >>> s.getChars(0, 1, c1, 0); >>> // Read char[] memory written before inflation >>> return c2[0]; >>> >>> The result should be 42. The problem is that inflate_string does not point to StoreC because inflate_string uses a byte[] as input and in this case also writes to a different char[]. Even if we set the input to BOTTOM, inflate_string points to 7 Parm (BOTTOM) but not to the char[] memory produced by 96 StoreC: >>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >>> >>> 349 LoadUS then reads from the output char[] memory of inflate_string which does not include the result of StoreC. The test fails because the return value is != 42. 
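For illustration, the scenario described above can be written as a small stand-alone Java program. This is only a sketch: the class and method names are hypothetical, and it is not the actual TestStringIntrinsicMemoryFlow test from the webrev. It assumes that String.getChars on a Latin-1 string is what reaches the inflate intrinsic on a JDK 9 build with compact strings; on any other VM it simply exercises the same Java-level semantics.

```java
// Hypothetical sketch of the store / inflate / load pattern discussed above.
// Names are illustrative only and do not come from the webrev.
final class InflateMemoryFlowSketch {

    // Stores to c2, inflates the string into a *different* array c1,
    // then reads c2 back. The read must still observe the earlier store.
    static char readAfterInflate(String s) {
        char[] c1 = new char[1];
        char[] c2 = new char[1];

        c2[0] = 42;                // StoreC into c2
        s.getChars(0, 1, c1, 0);   // copy (inflate) from the string into c1
        return c2[0];              // LoadC from c2, expected to see 42
    }

    public static void main(String[] args) {
        // Repeat the call so that a JIT compiler has a chance to compile the
        // method; the iteration count is an arbitrary assumption.
        for (int i = 0; i < 100_000; i++) {
            char result = readAfterInflate("a");
            if (result != 42) {
                throw new AssertionError("iteration " + i + ": expected 42 but got " + (int) result);
            }
        }
        System.out.println("ok");
    }
}
```

Because c1 and c2 are distinct arrays, a correct compiler has to keep the store to c2 visible to the later load even though the intrinsic in between only writes to c1.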
>>> >>> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. >>> >>>> If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed. >>> >>> Yes, these cases work fine. >>> >>> Thanks, >>> Tobias >>> >>>>> The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). >>>>> >>>>> Setting the input to BOTTOM, generates the following graph: >>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >>>>> The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails. >>>>> >>>>> I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: >>>>> LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) >>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png >>>>> >>>>> Here is the new webrev: >>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ >>>>> Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? >>>>> >>>>> Best, >>>>> Tobias >>>>> >>>>>>>>> Related question: >>>>>>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>>>>>>> char[int:>=0]:exact+any * >>>>>>>>> >>>>>>>>> which is equal to the type of the char load. >>>>>>>> >>>>>>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? 
>>>>>>> >>>>>>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. See comment in library_call.cpp: >>>>>>> >>>>>>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >>>>>>> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>>> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >>>>>>> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >>>>>>> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>>> >>>>>>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >>>>>>> >>>>>>>> Should we also be more careful in inflate_string_slow()? Is it used? >>>>>>> >>>>>>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >>>>>>> >>>>>>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>>>>>>> char[int:1]:NotNull:exact * >>>>>>>>> >>>>>>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>>>>>>> >>>>>>>> It is indeed strange. What memory type of LoadUS? It could be bug. >>>>>>> >>>>>>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >>>>>>> >>>>>>> I will look into this again and try to understand what happens. >>>>>> >>>>>> It could that aryptr is pointer to array and load type is pointer to array's element. 
>>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Tobias >>>>>>> >>>>>>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>>>>>>> >>>>>>>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>>>>>>> >>>>>>>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>>>>>>> >>>>>>>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>>>>>>> >>>>>>>>>>>> Or did I misunderstand your question? >>>>>>>>>>> >>>>>>>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>>>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>>>>>>> with some interesting gotchas over the years, that's all. I'm a bit >>>>>>>>>>> surprised that C2 needs this barrier, given that there is a >>>>>>>>>>> read-after-write dependency, but never mind. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Andrew. >>>>>>>>>>> From vladimir.kozlov at oracle.com Fri Jan 15 18:18:40 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Jan 2016 10:18:40 -0800 Subject: RFR(S): 8147433: PrintNMethods no longer works with JVMCI In-Reply-To: References: Message-ID: <56993800.3080301@oracle.com> Good. Thank you for fixing this. 
Vladimir On 1/15/16 8:43 AM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8147433/webrev/index.html > > https://bugs.openjdk.java.net/browse/JDK-8137167 moved the PrintNMethods > related code into ciEnv but since JVMCI doesn't use ciEnv PrintNMethods > no longer works for it. This moves into CompileBroker with the other > compilation related printing code. Tested with fastdebug > -XX:+PrintNMethods running specjvm2008. > > tom From christian.thalinger at oracle.com Fri Jan 15 18:20:46 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Jan 2016 08:20:46 -1000 Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism In-Reply-To: <32315674-1303-4A27-8FFD-AE40E8868F27@oracle.com> References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> <7C1CBFFE-9A7C-4195-A8EA-BD7B94092E4F@oracle.com> <32315674-1303-4A27-8FFD-AE40E8868F27@oracle.com> Message-ID: Thanks, Doug. > On Jan 14, 2016, at 11:55 AM, Doug Simon wrote: > > Looks good. 
> >> On 14 Jan 2016, at 22:50, Christian Thalinger wrote: >> >>> >>> On Jan 14, 2016, at 2:44 AM, Doug Simon wrote: >>> >>>> >>>> On 14 Jan 2016, at 06:58, Christian Thalinger wrote: >>>> >>>>> >>>>> On Jan 12, 2016, at 12:39 PM, Christian Thalinger wrote: >>>>> >>>>>> >>>>>> On Jan 12, 2016, at 12:14 PM, Christian Thalinger wrote: >>>>>> >>>>>>> >>>>>>> On Jan 12, 2016, at 12:03 PM, Doug Simon wrote: >>>>>>> >>>>>>>> >>>>>>>> On 12 Jan 2016, at 22:39, Christian Thalinger wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote: >>>>>>>>> >>>>>>>>> If we're going with an enum, you could put accessors directly in the enum: >>>>>>>>> >>>>>>>>> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true); >>>>>>>>> >>>>>>>>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null); >>>>>>>>> >>>>>>>>> You could then type the value of the options and check the right accessor is used: >>>>>>>>> >>>>>>>>> public enum Option { >>>>>>>>> ImplicitStableValues(boolean.class), >>>>>>>>> InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED). >>>>>>>>> PrintConfig(boolean.class), >>>>>>>>> PrintFlags(boolean.class), >>>>>>>>> ShowFlags(boolean.class), >>>>>>>>> TraceMethodDataFilter(String.class), >>>>>>>>> TrustFinalDefaultFields(boolean.class); >>>>>>>>> >>>>>>>>> Even ignoring these suggestions, the discipline imposed by the enum is a good idea. >>>>>>>> >>>>>>>> Excellent idea! I was also thinking about adding the default value to the enum. >>>>>>> >>>>>>> Can you do that without having to box the default value? >>>>>> >>>>>> No, we have to box but we can initialize all flags in the constructor: >>>>>> >>>>>> http://cr.openjdk.java.net/~twisti/8146820/webrev.02/ >>>> >>>> Do we agree on the change? 
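As a rough illustration of the typed-option idea discussed above, here is a minimal, self-contained Java sketch. It is not the code from the webrev: the enum below is a stand-in with only two entries, the "jvmci." property prefix is an assumption, and the lazy lookup, the boxed default value and the accessor checks are simply one possible way to combine the suggestions made in this thread.

```java
// Hypothetical sketch of a typed option enum; NOT the HotSpotJVMCIRuntime.Option
// implementation. Entries, prefix and lookup strategy are assumptions.
enum Option {
    ImplicitStableValues(boolean.class, true),
    TraceMethodDataFilter(String.class, null);

    // Assumed system property prefix, e.g. -Djvmci.ImplicitStableValues=false
    private static final String PROPERTY_PREFIX = "jvmci.";

    private final Class<?> type;
    private final Object defaultValue; // boxed default value
    private Object value;              // boxed current value, read lazily
    private boolean initialized;

    Option(Class<?> type, Object defaultValue) {
        this.type = type;
        this.defaultValue = defaultValue;
    }

    // Reads the system property on first use and caches the (boxed) result.
    private synchronized Object getValue() {
        if (!initialized) {
            String text = System.getProperty(PROPERTY_PREFIX + name());
            if (text == null) {
                value = defaultValue;
            } else if (type == boolean.class) {
                value = Boolean.parseBoolean(text);
            } else {
                value = text;
            }
            initialized = true;
        }
        return value;
    }

    // Accessor for boolean options; rejects use on an option of another type.
    boolean getBoolean() {
        if (type != boolean.class) {
            throw new IllegalStateException(name() + " is not a boolean option");
        }
        return (Boolean) getValue();
    }

    // Accessor for String options; rejects use on an option of another type.
    String getString() {
        if (type != String.class) {
            throw new IllegalStateException(name() + " is not a String option");
        }
        return (String) getValue();
    }
}

final class OptionDemo {
    public static void main(String[] args) {
        System.out.println("ImplicitStableValues = " + Option.ImplicitStableValues.getBoolean());
        System.out.println("TraceMethodDataFilter = " + Option.TraceMethodDataFilter.getString());
    }
}
```

Keeping the declared type next to each constant lets a mismatched accessor fail immediately instead of silently returning a wrongly typed value, which is the "discipline" argument made above.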
>>> >>> I would prefer it if the value was lazy initialized (for non-AOT runtimes): >> >> It's not different in AOT-land because these cannot be constants. >> >>> >>> Also, you can remove all the static fields that just cache a (possibly unboxed) option value and use the option directly. For example: >>> >>> diff -r 1034ff44c5d0 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java >>> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Tue Jan 12 15:04:27 2016 +0100 >>> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Thu Jan 14 13:40:28 2016 +0100 >>> @@ -29,6 +29,7 @@ >>> import java.lang.reflect.Field; >>> >>> import jdk.vm.ci.common.JVMCIError; >>> +import jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.Option; >>> import jdk.vm.ci.meta.JavaType; >>> import jdk.vm.ci.meta.LocationIdentity; >>> import jdk.vm.ci.meta.MetaAccessProvider; >>> @@ -41,11 +42,6 @@ >>> */ >>> class HotSpotResolvedJavaFieldImpl implements HotSpotResolvedJavaField, HotSpotProxified { >>> >>> - /** >>> - * Mark well-known stable fields as such. >>> - */ >>> - private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true); >>> - >>> private final HotSpotResolvedObjectTypeImpl holder; >>> private final String name; >>> private JavaType type; >>> @@ -198,7 +194,7 @@ >>> return true; >>> } >>> assert getAnnotation(Stable.class) == null; >>> - if (ImplicitStableValues && isImplicitStableField()) { >>> + if (Option.ImplicitStableValues.getBoolean() && isImplicitStableField()) { >>> return true; >>> } >>> return false; >>> >>> None of the current options are used in tight loops where the cost of the unboxing (if any) would matter. >> >> Right. >> >>> >>> Lastly, since you've added PrintFlags and ShowFlags, why not add a help message to each option. 
For example: >>> >>> ImplicitStableValues(boolean.class, true, "Mark well-known stable fields as such."), >> >> We should. >> >> http://cr.openjdk.java.net/~twisti/8146820/webrev.03/ >> >> $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.ShowFlags=true InitGraal >> [List of JVMCI options] >> boolean ImplicitStableValues := true Mark well-known stable fields as such. >> boolean InitTimer := false Specifies if initialization timing is enabled. >> boolean PrintConfig := false Prints all HotSpotVMConfig fields. >> boolean PrintFlags := false Prints all JVMCI flags and exits. >> boolean ShowFlags = true Prints all JVMCI flags and continues. >> String TraceMethodDataFilter := null >> boolean TrustFinalDefaultFields := true Determines whether to treat final fields with default values as constant. > From vladimir.kozlov at oracle.com Fri Jan 15 18:25:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Jan 2016 10:25:05 -0800 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type In-Reply-To: <5699021F.90500@oracle.com> References: <5697C624.7040201@oracle.com> <5697EFFF.90305@oracle.com> <5699021F.90500@oracle.com> Message-ID: <56993981.8020703@oracle.com> This looks good. Thanks, Vladimir On 1/15/16 6:28 AM, Tobias Hartmann wrote: > Thanks, Vladimir. > > On 14.01.2016 19:59, Vladimir Kozlov wrote: >> You have to update code for 8146999 changes when Roland pushes it. > > Yes, I'll do so but Roland mentioned that he still has problems with his 8146999 fix. > >> The only thing I don't like about changes is using #ifdef _LP64 for part of changes. >> I know where it is coming from (ConvI2L for loop indexing) but as you said ConvI2L could be generated in other cases too. Should the test cast->has_range_check() return 'false' in 32-bit? > > I added the _LP64 ifdefs because we only emit a narrowed ConvI2L on 64 bit. 
But I agree - it's cleaner without those. As you suggested, I removed the ifdefs and changed has_range_check() to return false on 32 bit. > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/6675699/webrev.02/ > > Thanks, > Tobias > >> On 1/14/16 8:00 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. >>> >>> https://bugs.openjdk.java.net/browse/JDK-6675699 >>> http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ >>> >>> *Problem* >>> The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated. However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available. >>> >>> For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. 
The Phi is replaced by its only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled iteration (of course, the range check would catch this at runtime). >>> >>> This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such a workaround fix is not feasible to fix all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9. >>> >>> *Solution* >>> To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check. >>> >>> To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node. 
>>> >>> With the fix, we could now remove the following old "workaround-fixes": >>> https://bugs.openjdk.java.net/browse/JDK-4781451 >>> https://bugs.openjdk.java.net/browse/JDK-4799512 >>> https://bugs.openjdk.java.net/browse/JDK-6659207 >>> https://bugs.openjdk.java.net/browse/JDK-6663854 >>> For reference, the individual patches can be found here: >>> http://cr.openjdk.java.net/~thartmann/6675699/backouts/ >>> >>> However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7. >>> >>> Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this. >>> >>> ConvI2L nodes with a narrow type are also emitted by intrinsics: >>> - GraphKit::array_element_address() >>> - PhaseMacroExpand::array_element_address() >>> - ArrayCopyNode::prepare_array_copy() >>> I was not able to reproduce the problem with intrinsics. It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic. >>> >>> *Testing* >>> I did extensive testing to make sure the fix does not introduce correctness or performance issues. >>> - Different RBT test suites [1] with and without -Xcomp. 
>>> - Full run of multiple CTW suites. >>> - Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of [2] with -XX:+TraceOptimizeFill. >>> - Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of [3] with -XX:+PrintOptoAssembly to make sure all range checks are folded. >>> - Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord. >>> - Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression >>> >>> Thanks, >>> Tobias >>> >>> [1] RBT test suites: >>> - hotspot/test/:hotspot_all >>> - noncolo.testlist >>> - vm.compiler.testlist >>> - vm.regression.testlist >>> - nsk.regression.testlist >>> - nsk.split_verifier.testlist >>> - nsk.stress.testlist >>> - nsk.stress.jck.testlist >>> - jdk/test/:jdk_jfr >>> - jdk/test/:svc_tools >>> - jdk/test/:jdk_instrument >>> - jdk/test/:jdk_lang >>> - jdk/test/:jdk_svc >>> - nashorn/test/:tier1 >>> - nashorn/test/:tier2 >>> - nashorn/test/:tier3 >>> Only without -Xcomp: >>> - Kitchensink >>> - runThese >>> - Weblogic12medrec >>> [2] test/compiler/intrinsics/6982370/Test6982370.java >>> [3] test/compiler/rangechecks/TestExplicitRangeChecks.java >>> From christian.thalinger at oracle.com Fri Jan 15 22:55:49 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Jan 2016 12:55:49 -1000 Subject: RFR(S): 8147433: PrintNMethods no longer works with JVMCI In-Reply-To: References: Message-ID: Looks good. > On Jan 15, 2016, at 6:43 AM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8147433/webrev/index.html > > https://bugs.openjdk.java.net/browse/JDK-8137167 moved the PrintNMethods related code into ciEnv but since JVMCI doesn't use ciEnv PrintNMethods no longer works for it. This moves into CompileBroker with the other compilation related printing code. Tested with fastdebug -XX:+PrintNMethods running specjvm2008. 
> > tom From christian.thalinger at oracle.com Fri Jan 15 23:30:06 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 15 Jan 2016 13:30:06 -1000 Subject: RFR(S): 8147433: PrintNMethods no longer works with JVMCI In-Reply-To: References: Message-ID: <48DBC395-B4EB-42C7-8F86-969AAB5D86A7@oracle.com> Tom, can you push this yourself? > On Jan 15, 2016, at 12:55 PM, Christian Thalinger wrote: > > Looks good. > >> On Jan 15, 2016, at 6:43 AM, Tom Rodriguez wrote: >> >> http://cr.openjdk.java.net/~never/8147433/webrev/index.html >> >> https://bugs.openjdk.java.net/browse/JDK-8137167 moved the PrintNMethods related code into ciEnv but since JVMCI doesn't use ciEnv PrintNMethods no longer works for it. This moves into CompileBroker with the other compilation related printing code. Tested with fastdebug -XX:+PrintNMethods running specjvm2008. >> >> tom > From joe.darcy at oracle.com Sat Jan 16 01:33:56 2016 From: joe.darcy at oracle.com (Joseph D. 
Darcy) Date: Fri, 15 Jan 2016 17:33:56 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> Message-ID: <56999E04.5040207@oracle.com> Hello, Catching up on email, how were these test cases generated or chosen? In other words, in what sense are they corners? The data would be easier to read if the numbers were aligned by column (they don't appear that way in the webrev at least). What is the code coverage of the new intrinsics with this set of tests? Theses tests should not be separated from the implementation for long; in other words, since the new implementation has already been pushed to a HotSpot forest, test coverage for that new implementation should not lag behind. Thanks, -Joe On 12/22/2015 5:41 PM, Deshpande, Vivek R wrote: > HI All > > I have uploaded the patch for sin and cos tests with input and allowed outputs > at this location for your review. > http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.00/ > Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 > Thank you. 
> > Regards, > Vivek > > -----Original Message----- > From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] > Sent: Friday, December 04, 2015 4:50 PM > To: Deshpande, Vivek R; Vladimir Kozlov > Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler > Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib > > Hi Vivek, > > On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >> Hi >> >> Sure, I will add the tests. Shall I use the StrictMath result as a reference for the exact result? >> Let me know your thoughts. > As a rough test of another sin/cos implementation, StrictMath.{sin, cos} can be used as a reference with the following caveat: there isn't an indication of which way the error is in a StrictMath result. Let me give an example: if > > StrictMath.sin(x) => y > > then one of the following should be true > > Math.sin(x) => y > Math.sin(x) => Math.nextUp(y) > Math.sin(x) => Math.nextDown(y) > > That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR equal to one of the floating-point numbers adjacent to that result. Of these three options, only two are allowed by the accuracy requirements of the StrictMath.sin specification. However, since StrictMath.sin doesn't give an indication of which way its error went (if it rounded up or down), there is no indication without additional work which of > nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't buggy). > > HTH, > > -Joe > > >> Regards, >> Vivek >> >> -----Original Message----- >> From: joe darcy [mailto:joe.darcy at oracle.com] >> Sent: Thursday, December 03, 2015 1:29 PM >> To: Vladimir Kozlov; Deshpande, Vivek R >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >> lib >> >> Hello, >> >> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>> Vivek, >>> >>> I think Joe is asking you to write these tests as hotspot regression >>> tests in hotspot/test/compiler.
>> Exactly; if not generally applicable sin/cos tests that could be hosted in the jdk repo (alongside the regression and unit tests for java.lang.Math), then tests of intrinsics in the HotSpot repo alongside other tests targeting intrinsics. >> >> Thanks, >> >> -Joe >> >>> Vladimir >>> >>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>> Hi Joe >>>> >>>> It would be great if you would please share the additional tests >>>> with us. >>>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>> Sent: Thursday, December 03, 2015 1:17 PM >>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> I think it is unwise for an implementation change this large to >>>> be pushed with no tests targeting the specifics of the new implementation. >>>> >>>> The worst-case tests in the jdk repo are the mathematical worst >>>> cases for floating-point approximations, in other words the cases >>>> where the exact mathematical answer is closest to half-way between two >>>> representable floating-point numbers. Passing such tests is a >>>> necessary but not sufficient condition for a new implementation. >>>> >>>> Cheers, >>>> >>>> -Joe >>>> >>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>> Okay, looks reasonable to me. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>> Hi Vladimir >>>>>> >>>>>> This is the link for the updated webrev with latest hotspot source >>>>>> as base for your review. >>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>> Thank you.
>>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: Deshpande, Vivek R >>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Hi Vladimir >>>>>> >>>>>> This is the link for the updated webrev for your review. >>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>> Thank you. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>> To: Deshpande, Vivek R; joe darcy >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> Please send link to new webrev on cr server. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> Please find the webrev with your suggested updates attached with >>>>>>> the mail. >>>>>>> We will update it in the jbs entry soon. >>>>>>> Please let me know if it needs further changes. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Deshpande, Vivek R >>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> HI Vladimir, Joe >>>>>>> >>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>> have mentioned. It passed those tests. >>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. 
>>>>>>> The performance gain is 3.2x over base jdk, that is over current >>>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>>> >>>>>>> Could I get those tests around the boundary values. Would >>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>> If yes, then it has passed those boundary cases. >>>>>>> >>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>> for libm and send out the webrev soon. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Just getting added to the thread.. >>>>>>> >>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>> Thank you, for explanation, Vivek. >>>>>>>> >>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>>> Hotspot tests. >>>>>>>> >>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi Vladimir >>>>>>>>> >>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>> StrictMath result and not exact result. So I added the flag to >>>>>>>>> switch between FDLIBM and LIBM. >>>>>>>>> >>>>>>>>> Quick explanation: >>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>> should be = 0.19457293629570216 >>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>> 0.1945729362957022 >>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) 
This means the HPA >>>>>>>>> library result is between the above two values and the exact result >>>>>>>>> would be pretty close to it. >>>>>>>>> So here the StrictMath result is less than the quad-precision result; >>>>>>>>> the Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>> StrictMath >>>>>>>>> - 1ulp, according to our test. >>>>>>>> Note, java.lang.Math allows results to be off by 1 ulp (in both directions, >>>>>>>> I think) and it should be consistent for Interpreter and code >>>>>>>> generated by JIT compilers: >>>>>>>> >>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>>> >>>>>>> That interpretation of the spec is not quite right. For the Math >>>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>>> closest to the exact result must be returned. For the methods >>>>>>> with a >>>>>>> 1 ulp error bound, either of the floating-point results bracketing >>>>>>> the true result can be returned, subject to the monotonicity >>>>>>> constraints of the specification of the particular method. >>>>>>> >>>>>>>>> I have done the experiments with -XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dsin and -XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>>> would go through LIBM and C1 and C2 through FDLIBM. >>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>> I was thinking about using existing >>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>>>> versions of functions which accept intrinsic ID instead of >>>>>>>> methodHandle. >>>>>>>> >>>>>>>> If you still want to use flags, make them diagnostic. >>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic.
>>>>>>>> >>>>>>>>> Also the performance gain ~4x is with >>>>>>>>> -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>> You confused me here. So you get 4x when only the Interpreter uses >>>>>>>> LIBM code and the compilers use FDLIBM? >>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>> (StrictMath) or to the existing fsin/fcos intrinsics (Math)? >>>>>>> >>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>> boost to the StrictMath methods that have been ported. >>>>>>> >>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>> testing. >>>>>>> For example, part of the patch says >>>>>>> >>>>>>> # For sin >>>>>>> >>>>>>> +// This means that the main path is actually only taken for >>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>> >>>>>>> # For cos >>>>>>> >>>>>>> +// This means that the main path is actually only taken for >>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>> >>>>>>> If nothing else, there are no tests around those boundary >>>>>>> values, which is unacceptable. There should also be some tests of >>>>>>> values of interest to the algorithm in question. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> -Joe >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>>> Let me know your thoughts on this. I can answer more >>>>>>>>> questions and give more data if needed. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>> the math lib >>>>>>>>> >>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>> What is the reason you decided to add new flags?
exp() and >>>>>>>>>> log() changes did not have flags. >>>>>>>>>> >>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>> >>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>> Hi Vivek, >>>>>>>>> >>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>> file bugs and fixed them after FC. >>>>>>>>> >>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>> the only thing holding it from push. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>> Hi all >>>>>>>>>>> >>>>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>>>> and >>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>>> implementation. >>>>>>>>>>> >>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and cos. >>>>>>>>>>> >>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>> >>>>>>>>>>> Could you please review and sponsor this patch. 
>>>>>>>>>>> >>>>>>>>>>> Bug-id: >>>>>>>>>>> >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>> webrev: >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>> >>>>>>>>>>> Thanks and regards, >>>>>>>>>>> >>>>>>>>>>> Vivek >>>>>>>>>>> From vladimir.kozlov at oracle.com Sat Jan 16 01:58:46 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 15 Jan 2016 17:58:46 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <56999E04.5040207@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> <56999E04.5040207@oracle.com> Message-ID: <5699A3D6.6080305@oracle.com> Note, the test was pushed together with VM changes into hs-comp repo: http://hg.openjdk.java.net/jdk9/hs-comp/jdk/rev/ddd59a780769 New sin/cos code is tested in all running modes since it is used by Interpreter and JITed code (C1 and C2). I will let Vivek answer questions about the test. Regards, Vladimir On 1/15/16 5:33 PM, Joseph D. Darcy wrote: > Hello, > > Catching up on email, how were these test cases generated or chosen? In > other words, in what sense are they corners? 
> > The data would be easier to read if the numbers were aligned by column > (they don't appear that way in the webrev at least). > > What is the code coverage of the new intrinsics with this set of tests? > > Theses tests should not be separated from the implementation for long; > in other words, since the new implementation has already been pushed to > a HotSpot forest, test coverage for that new implementation should not > lag behind. > > Thanks, > > -Joe > > On 12/22/2015 5:41 PM, Deshpande, Vivek R wrote: >> HI All >> >> I have uploaded the patch for sin and cos tests with input and allowed >> outputs >> at this location for your review. >> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.00/ >> Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 >> Thank you. >> >> Regards, >> Vivek >> >> -----Original Message----- >> From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] >> Sent: Friday, December 04, 2015 4:50 PM >> To: Deshpande, Vivek R; Vladimir Kozlov >> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib >> >> Hi Vivek, >> >> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >>> Hi >>> >>> Sure I will add the tests. Shall I use StrictMath result as a >>> reference for exact result. >>> Let me know your thoughts. >> As a rough test of another sin/cos implementation, StrictMath.{sin, >> cos} can be used a reference with the following caveat: there isn't an >> indication of which why the error is in a StrictMath result. Let me >> given an example, if >> >> StrictMath.sin(x) => y >> >> then one of the following should be true >> >> Math.sin(x) => y >> Math.sin(x) => Math.nextUp(y) >> Math.sin(x) => Math.nextDown(y) >> >> That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR >> equal to one of the floating-point numbers adjacent to that result. 
Of >> these three options, only two area allowed by the accuracy >> requirements of the StrictMath.sin specification. However, since >> StrictMath.sin doesn't give an indication of which way its error went >> (if it rounded up or down), there is no indication without additional >> work which of >> nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't >> buggy). >> >> HTH, >> >> -Joe >> >> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: joe darcy [mailto:joe.darcy at oracle.com] >>> Sent: Thursday, December 03, 2015 1:29 PM >>> To: Vladimir Kozlov; Deshpande, Vivek R >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>> lib >>> >>> Hello, >>> >>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>>> Vivek, >>>> >>>> I think Joe is asking you to write these tests as hotspot regression >>>> test in hotspot/test/compiler. >>> Exactly; if not generally applicable sin/cos tests that could be >>> hosted in the jdk repo (alongside the regression and unit tests for >>> java.lang.Math), then test of intrinsics in the HotSpot repo >>> alongside other tests targeting intrinsics. >>> >>> Thanks, >>> >>> -Joe >>> >>>> Vladimir >>>> >>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>>> Hi Joe >>>>> >>>>> It would be great if you would please share the additional tests >>>>> with us. >>>>> >>>>> Regards, >>>>> Vivek >>>>> >>>>> -----Original Message----- >>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>> Sent: Thursday, December 03, 2015 1:17 PM >>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>> math lib >>>>> >>>>> I think it is unwise for this large of an implementation change to >>>>> be pushed with no tests targeting the specifics of the new >>>>> implementation. 
>>>>> >>>>> The worst-case tests in the jdk repo are the mathematical worst >>>>> cases for floating-point approximations, in other words the cases >>>>> were the exact mathematical answer is closes to half-way between two >>>>> representation floating-point numbers. Passing such tests is >>>>> necessary but not sufficient condition for a new implementation. >>>>> >>>>> Chers, >>>>> >>>>> -Joe >>>>> >>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>>> Okay, looks reasonable to me. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>>> Hi Vladimir >>>>>>> >>>>>>> This is the link for the updated webrev with latest hotspot source >>>>>>> as base for your review. >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>>> Thank you. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Deshpande, Vivek R >>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Hi Vladimir >>>>>>> >>>>>>> This is the link for the updated webrev for your review. >>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>>> Thank you. >>>>>>> >>>>>>> Regards, >>>>>>> Vivek >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>>> To: Deshpande, Vivek R; joe darcy >>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>> math lib >>>>>>> >>>>>>> Please send link to new webrev on cr server. 
>>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> Please find the webrev with your suggested updates attached with >>>>>>>> the mail. >>>>>>>> We will update it in the jbs entry soon. >>>>>>>> Please let me know if it needs further changes. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Deshpande, Vivek R >>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> HI Vladimir, Joe >>>>>>>> >>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>>> have mentioned. It passed those tests. >>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>>> The performance gain is 3.2x over base jdk, that is over current >>>>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>>>> >>>>>>>> Could I get those tests around the boundary values. Would >>>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>>> If yes, then it has passed those boundary cases. >>>>>>>> >>>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>>> for libm and send out the webrev soon. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> Just getting added to the thread.. 
>>>>>>>> >>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>>> Thank you, for explanation, Vivek. >>>>>>>>> >>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>>>> Hotspot tests. >>>>>>>>> >>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>>> Hi Vladimir >>>>>>>>>> >>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>>> StrictMath result and not exact result. So I added the flag to >>>>>>>>>> switch between FDLIBM and LIBM. >>>>>>>>>> >>>>>>>>>> Quick explanation: >>>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>>> should be = 0.19457293629570216 >>>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>>> 0.1945729362957022 >>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>>> library result is between the above two values and Exact result >>>>>>>>>> would be pretty close to it. >>>>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>>> StrictMath >>>>>>>>>> - 1ulp, according to our test. >>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both direction, >>>>>>>>> I >>>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>>> generated by JIT compilers: >>>>>>>>> >>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin >>>>>>>>> % >>>>>>>>> 28 >>>>>>>>> do >>>>>>>>> u >>>>>>>>> ble%29 >>>>>>>>> >>>>>>>> That interpretation of the spec is not quite right. For the Math >>>>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>>>> closest to the exact result must be returned. 
For the methods >>>>>>>> with a >>>>>>>> 1 ulp error bound, either of the floating-point result bracketing >>>>>>>> the true result can be returned, subject to the monotonicity >>>>>>>> constraints of the specification of the particular method. >>>>>>>> >>>>>>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>>>> would go through LIBM and C1 and c2 through FDLIBM. >>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>> I was thinking about using existing >>>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>>>>> versions of functions which accept intrinsic ID instead of >>>>>>>>> methodHandle. >>>>>>>>> >>>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>>> >>>>>>>>>> Also the performance gain ~4x is with >>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>>>> LIBM code and compilers use FDLIB? >>>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>>> >>>>>>>> I'm part way through porting the FDLIBM code to Java (JDK-8134780: >>>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>>> boost to the StrictMath methods that have been ported. >>>>>>>> >>>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>>> testing. >>>>>>>> For example, part of patch says >>>>>>>> >>>>>>>> # For sin >>>>>>>> >>>>>>>> +// This means that the main path is actually only taken for >>>>>>>> +// 2^-252 <= |X| < 90112. 
>>>>>>>> >>>>>>>> # For cos >>>>>>>> >>>>>>>> +// This means that the main path is actually only taken for >>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>> >>>>>>>> If nothing else, there are no tests at around those boundary >>>>>>>> values, which is unacceptable. There should also be some tests of >>>>>>>> values of interest to the algorithm in question. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> -Joe >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>>> questions and give more data if needed. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Vivek >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>>> the math lib >>>>>>>>>> >>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>>> log() changes did not have flags. >>>>>>>>>>> >>>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>>> >>>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>>> Hi Vivek, >>>>>>>>>> >>>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>>> file bugs and fixed them after FC. >>>>>>>>>> >>>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>>> the only thing holding it from push. 
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>>> Hi all >>>>>>>>>>>> >>>>>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>>>>> and >>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel LIBM >>>>>>>>>>>> implementation. >>>>>>>>>>>> >>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and >>>>>>>>>>>> cos. >>>>>>>>>>>> >>>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>>> >>>>>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>>>>> >>>>>>>>>>>> Bug-id: >>>>>>>>>>>> >>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>>> webrev: >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>>> >>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>> >>>>>>>>>>>> Vivek >>>>>>>>>>>> > From joe.darcy at oracle.com Sat Jan 16 02:28:31 2016 From: joe.darcy at oracle.com (joe darcy) Date: Fri, 15 Jan 2016 18:28:31 -0800 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib In-Reply-To: <5699A3D6.6080305@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> 
<566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> <56999E04.5040207@oracle.com> <5699A3D6.6080305@oracle.com> Message-ID: <5699AACF.6080608@oracle.com> Ah okay; I overlooked the separate push of the tests. Thanks, -Joe On 1/15/2016 5:58 PM, Vladimir Kozlov wrote: > Note, the test was pushed together with VM changes into hs-comp repo: > > http://hg.openjdk.java.net/jdk9/hs-comp/jdk/rev/ddd59a780769 > > New sin/cos code is tested in all running modes since it is used by > Interpreter and JITed code (C1 and C2). > > I will let Vivek answer questions about the test. > > Regards, > Vladimir > > On 1/15/16 5:33 PM, Joseph D. Darcy wrote: >> Hello, >> >> Catching up on email, how were these test cases generated or chosen? In >> other words, in what sense are they corners? >> >> The data would be easier to read if the numbers were aligned by column >> (they don't appear that way in the webrev at least). >> >> What is the code coverage of the new intrinsics with this set of tests? >> >> Theses tests should not be separated from the implementation for long; >> in other words, since the new implementation has already been pushed to >> a HotSpot forest, test coverage for that new implementation should not >> lag behind. >> >> Thanks, >> >> -Joe >> >> On 12/22/2015 5:41 PM, Deshpande, Vivek R wrote: >>> HI All >>> >>> I have uploaded the patch for sin and cos tests with input and allowed >>> outputs >>> at this location for your review. >>> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.00/ >>> >>> Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 >>> Thank you. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: Joseph D. 
Darcy [mailto:joe.darcy at oracle.com] >>> Sent: Friday, December 04, 2015 4:50 PM >>> To: Deshpande, Vivek R; Vladimir Kozlov >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> Hi Vivek, >>> >>> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >>>> Hi >>>> >>>> Sure I will add the tests. Shall I use StrictMath result as a >>>> reference for exact result. >>>> Let me know your thoughts. >>> As a rough test of another sin/cos implementation, StrictMath.{sin, >>> cos} can be used a reference with the following caveat: there isn't an >>> indication of which why the error is in a StrictMath result. Let me >>> given an example, if >>> >>> StrictMath.sin(x) => y >>> >>> then one of the following should be true >>> >>> Math.sin(x) => y >>> Math.sin(x) => Math.nextUp(y) >>> Math.sin(x) => Math.nextDown(y) >>> >>> That is, Math.sin(x) should either be the same as StrictMath.sin(x) OR >>> equal to one of the floating-point numbers adjacent to that result. Of >>> these three options, only two area allowed by the accuracy >>> requirements of the StrictMath.sin specification. However, since >>> StrictMath.sin doesn't give an indication of which way its error went >>> (if it rounded up or down), there is no indication without additional >>> work which of >>> nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin isn't >>> buggy). 
>>> >>> HTH, >>> >>> -Joe >>> >>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>> Sent: Thursday, December 03, 2015 1:29 PM >>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math >>>> lib >>>> >>>> Hello, >>>> >>>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>>>> Vivek, >>>>> >>>>> I think Joe is asking you to write these tests as hotspot regression >>>>> test in hotspot/test/compiler. >>>> Exactly; if not generally applicable sin/cos tests that could be >>>> hosted in the jdk repo (alongside the regression and unit tests for >>>> java.lang.Math), then test of intrinsics in the HotSpot repo >>>> alongside other tests targeting intrinsics. >>>> >>>> Thanks, >>>> >>>> -Joe >>>> >>>>> Vladimir >>>>> >>>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>>>> Hi Joe >>>>>> >>>>>> It would be great if you would please share the additional tests >>>>>> with us. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>> Sent: Thursday, December 03, 2015 1:17 PM >>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> I think it is unwise for this large of an implementation change to >>>>>> be pushed with no tests targeting the specifics of the new >>>>>> implementation. >>>>>> >>>>>> The worst-case tests in the jdk repo are the mathematical worst >>>>>> cases for floating-point approximations, in other words the cases >>>>>> were the exact mathematical answer is closes to half-way between two >>>>>> representation floating-point numbers. Passing such tests is >>>>>> necessary but not sufficient condition for a new implementation. 
>>>>>> >>>>>> Chers, >>>>>> >>>>>> -Joe >>>>>> >>>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>>>> Okay, looks reasonable to me. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> This is the link for the updated webrev with latest hotspot source >>>>>>>> as base for your review. >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Deshpande, Vivek R >>>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> This is the link for the updated webrev for your review. >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>>>> To: Deshpande, Vivek R; joe darcy >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>> math lib >>>>>>>> >>>>>>>> Please send link to new webrev on cr server. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi Vladimir >>>>>>>>> >>>>>>>>> Please find the webrev with your suggested updates attached with >>>>>>>>> the mail. >>>>>>>>> We will update it in the jbs entry soon. >>>>>>>>> Please let me know if it needs further changes. 
>>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Deshpande, Vivek R >>>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>>> math lib >>>>>>>>> >>>>>>>>> HI Vladimir, Joe >>>>>>>>> >>>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>>>> have mentioned. It passed those tests. >>>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>>>> The performance gain is 3.2x over base jdk, that is over current >>>>>>>>> fsin/fcos intrinsic. This gain is more realistic. >>>>>>>>> >>>>>>>>> Could I get those tests around the boundary values. Would >>>>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>>>> If yes, then it has passed those boundary cases. >>>>>>>>> >>>>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>>>> for libm and send out the webrev soon. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>>>>> math lib >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Just getting added to the thread.. >>>>>>>>> >>>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>>>> Thank you, for explanation, Vivek. >>>>>>>>>> >>>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition to >>>>>>>>>> Hotspot tests. 
>>>>>>>>>> >>>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>>>> Hi Vladimir >>>>>>>>>>> >>>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>>>> StrictMath result and not exact result. So I added the flag to >>>>>>>>>>> switch between FDLIBM and LIBM. >>>>>>>>>>> >>>>>>>>>>> Quick explanation: >>>>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>>>> should be = 0.19457293629570216 >>>>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>>>> 0.1945729362957022 >>>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>>>> library result is between the above two values and Exact result >>>>>>>>>>> would be pretty close to it. >>>>>>>>>>> So here StrictMath result is less than quad-precision result, >>>>>>>>>>> Math result should be StrictMath or StrictMath + 1ulp and not >>>>>>>>>>> StrictMath >>>>>>>>>>> - 1ulp, according to our test. >>>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both direction, >>>>>>>>>> I >>>>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>>>> generated by JIT compilers: >>>>>>>>>> >>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>>>>> >>>>>>>>> That interpretation of the spec is not quite right. For the Math >>>>>>>>> methods with a 1/2 ulp error bound, the floating-point result >>>>>>>>> closest to the exact result must be returned. For the methods >>>>>>>>> with a >>>>>>>>> 1 ulp error bound, either of the floating-point result bracketing >>>>>>>>> the true result can be returned, subject to the monotonicity >>>>>>>>> constraints of the specification of the particular method.
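To make the "1 ulp" discussion above concrete, the following sketch decodes the three bit patterns quoted in Vivek's example and shows that they are consecutive doubles. The class and method names are hypothetical; the only data taken from the thread are the three long values.

```java
// Minimal sketch: the bit patterns come from the example quoted above;
// AdjacentDoubles and ulpSteps are hypothetical names for illustration.
final class AdjacentDoubles {
    static final long LIBM_BITS = 4596178249117717083L;        // reported as StrictMath - 1ulp
    static final long STRICT_BITS = 4596178249117717084L;      // reported as the StrictMath result
    static final long STRICT_PLUS_BITS = 4596178249117717085L; // reported as StrictMath + 1ulp

    /**
     * Number of representable doubles between a and b.
     * Only meaningful for finite values of the same sign, as assumed here.
     */
    static long ulpSteps(double a, double b) {
        return Double.doubleToLongBits(b) - Double.doubleToLongBits(a);
    }

    public static void main(String[] args) {
        double libm = Double.longBitsToDouble(LIBM_BITS);
        double strict = Double.longBitsToDouble(STRICT_BITS);
        double strictPlus = Double.longBitsToDouble(STRICT_PLUS_BITS);

        System.out.println("libm        = " + libm);
        System.out.println("strict      = " + strict);
        System.out.println("strict+1ulp = " + strictPlus);
        System.out.println("ulp(strict) = " + Math.ulp(strict));
        System.out.println("steps libm -> strict        = " + ulpSteps(libm, strict));
        System.out.println("steps strict -> strict+1ulp = " + ulpSteps(strict, strictPlus));
    }
}
```

For positive finite doubles, incrementing the bit pattern by one is the same as calling Math.nextUp, which is why the raw long values can be compared directly.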
>>>>>>>>> >>>>>>>>>>> I have done the experiments with XX:+UnlockDiagnosticVMOptions >>>>>>>>>>> -XX:DisableIntrinsic=_dsin and XX:+UnlockDiagnosticVMOptions >>>>>>>>>>> -XX:DisableIntrinsic=_dcos. With this option, the interpreter >>>>>>>>>>> would go through LIBM and C1 and c2 through FDLIBM. >>>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>> I was thinking about using existing >>>>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add additional >>>>>>>>>> versions of functions which accept intrinsic ID instead of >>>>>>>>>> methodHandle. >>>>>>>>>> >>>>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>>>> >>>>>>>>>>> Also the performance gain ~4x is with >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>>>>> LIBM code and compilers use FDLIB? >>>>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>>>> >>>>>>>>> I'm part way through porting the FDLIBM code to Java >>>>>>>>> (JDK-8134780: >>>>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>>>> boost to the StrictMath methods that have been ported. >>>>>>>>> >>>>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>>>> testing. >>>>>>>>> For example, part of patch says >>>>>>>>> >>>>>>>>> # For sin >>>>>>>>> >>>>>>>>> +// This means that the main path is actually only taken for >>>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>>> >>>>>>>>> # For cos >>>>>>>>> >>>>>>>>> +// This means that the main path is actually only taken for >>>>>>>>> +// 2^-252 <= |X| < 90112. 
>>>>>>>>> >>>>>>>>> If nothing else, there are no tests at around those boundary >>>>>>>>> values, which is unacceptable. There should also be some tests of >>>>>>>>> values of interest to the algorithm in question. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> -Joe >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>>>> questions and give more data if needed. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Vivek >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>>>> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>>>> the math lib >>>>>>>>>>> >>>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>>>> log() changes did not have flags. >>>>>>>>>>>> >>>>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>>>> >>>>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>>>> Hi Vivek, >>>>>>>>>>> >>>>>>>>>>> I want to point that you can do this experiment later. We can >>>>>>>>>>> file bugs and fixed them after FC. >>>>>>>>>>> >>>>>>>>>>> For now, please, answer my question about flags only. This is >>>>>>>>>>> the only thing holding it from push. 
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>>>> Hi all >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to contribute a patch which optimizes Math.sin() >>>>>>>>>>>>> and >>>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel >>>>>>>>>>>>> LIBM >>>>>>>>>>>>> implementation. >>>>>>>>>>>>> >>>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin and >>>>>>>>>>>>> cos. >>>>>>>>>>>>> >>>>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>>>> >>>>>>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>>>>>> >>>>>>>>>>>>> Bug-id: >>>>>>>>>>>>> >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>>>> webrev: >>>>>>>>>>>>> >>>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>>> >>>>>>>>>>>>> Vivek >>>>>>>>>>>>> >> From tom.rodriguez at oracle.com Sat Jan 16 02:48:36 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 15 Jan 2016 18:48:36 -0800 Subject: RFR(S): 8147433: PrintNMethods no longer works with JVMCI In-Reply-To: <48DBC395-B4EB-42C7-8F86-969AAB5D86A7@oracle.com> References: <48DBC395-B4EB-42C7-8F86-969AAB5D86A7@oracle.com> Message-ID: <7D91D2AA-5C7F-46A1-8189-61F6CD9076BB@oracle.com> I thought I'd give it a try. I think I have it all set up right, so it's a good test. tom > On Jan 15, 2016, at 3:30 PM, Christian Thalinger wrote: > > Tom, can you push this yourself? > >> On Jan 15, 2016, at 12:55 PM, Christian Thalinger wrote: >> >> Looks good.
>> >>> On Jan 15, 2016, at 6:43 AM, Tom Rodriguez wrote: >>> >>> http://cr.openjdk.java.net/~never/8147433/webrev/index.html >>> >>> https://bugs.openjdk.java.net/browse/JDK-8137167 moved the PrintNMethods related code into ciEnv but since JVMCI doesn't use ciEnv PrintNMethods no longer works for it. This moves it into CompileBroker with the other compilation related printing code. Tested with fastdebug -XX:+PrintNMethods running specjvm2008. >>> >>> tom >> > From tobias.hartmann at oracle.com Mon Jan 18 07:09:51 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 18 Jan 2016 08:09:51 +0100 Subject: [9] RFR(S): 8144212: JDK 9 b93 breaks Apache Lucene due to compact strings In-Reply-To: <56993704.7000503@oracle.com> References: <568D0229.60908@oracle.com> <568D037E.7000105@redhat.com> <568D1148.1030901@oracle.com> <568D17E4.90301@redhat.com> <568DAA2A.9070704@oracle.com> <568E7BAB.5070908@oracle.com> <568ECF5C.6090407@oracle.com> <568F9183.9070909@oracle.com> <56901101.6050503@oracle.com> <5693C83F.9030100@oracle.com> <569409C5.2040805@oracle.com> <569506CA.8040001@oracle.com> <569552EE.8050809@oracle.com> <56963C7A.8040203@oracle.com> <56993704.7000503@oracle.com> Message-ID: <569C8FBF.2030402@oracle.com> Thanks, Vladimir! Best, Tobias On 15.01.2016 19:14, Vladimir Kozlov wrote: > Very good. > > Thanks, > Vladimir > > On 1/13/16 4:00 AM, Tobias Hartmann wrote: >> Thanks, Vladimir. >> >> On 12.01.2016 20:24, Vladimir Kozlov wrote: >>>> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. >>> >>> Yes, that is right solution here. >> >> I changed the implementation to only capture the byte[] and char[] memory: >> http://cr.openjdk.java.net/~thartmann/8144212/webrev.03/ >> >> The method GraphKit::capture_memory(src_type, dst_type) returns a new MergeMemNode if the src and dst types are different, merging the two.
>> >> Best, >> Tobias >> >>> On 1/12/16 5:59 AM, Tobias Hartmann wrote: >>>> On 11.01.2016 21:00, Vladimir Kozlov wrote: >>>>> On 1/11/16 7:20 AM, Tobias Hartmann wrote: >>>>>> On 08.01.2016 20:41, Vladimir Kozlov wrote: >>>>>>> On 1/8/16 2:37 AM, Tobias Hartmann wrote: >>>>>>>> On 07.01.2016 21:49, Vladimir Kozlov wrote: >>>>>>>>> On 1/7/16 6:52 AM, Tobias Hartmann wrote: >>>>>>>>>> Hi Vladimir, >>>>>>>>>> >>>>>>>>>> On 07.01.2016 00:58, Vladimir Kozlov wrote: >>>>>>>>>>> Andrew is right. >>>>>>>>>> >>>>>>>>>> Yes, he's right that the membar is not needed in this case. I noticed that GraphKit::inflate_string() sets the output memory to TypeAryPtr::BYTES although inflate writes to a char[] array in this case. This caused the subsequent char load to be on a different slice allowing C2 to move the load to before the intrinsic. >>>>>>>>> >>>>>>>>> Right. It was the root of this bug, see below. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I fixed this for the inflate and compress intrinsics. >>>>>>>>>> >>>>>>>>>>> GraphKit::inflate_string() should have SCMemProjNode as compress_string() does to prevent loads move up. >>>>>>>>>>> StrInflatedCopyNode is not memory node. >>>>>>>>>> >>>>>>>>>> Okay, why are above changes not sufficient to prevent the load from moving up? Also, the comment for SCMemProjNode says: >>>>>>>>> >>>>>>>>> I did not get the question. Is it before your webrev.01 change? Or even with the change? >>>>>>>> >>>>>>>> I meant with webrev.01 but you answered my question below. >>>>>>>> >>>>>>>>>> // This class defines a projection of the memory state of a store conditional node. >>>>>>>>>> // These nodes return a value, but also update memory. >>>>>>>>>> >>>>>>>>>> But inflate does not return any value. >>>>>>>>> >>>>>>>>> Hmm, according to bottom type inflate produce memory: >>>>>>>>> >>>>>>>>> StrInflatedCopyNode::bottom_type() const { return Type::MEMORY; } >>>>>>>>> >>>>>>>>> So it really does not need SCMemProjNode. Sorry about that. 
>>>>>>>>> So load was LoadUS which is char load and originally memory slice of inflate was incorrect BYTES. >>>>>>>> >>>>>>>> Exactly. >>>>>>>> >>>>>>>>> Instead of SCMemProjNode we should have to change the idx of your dst_type: >>>>>>>>> >>>>>>>>> set_memory(str, dst_type); >>>>>>>> >>>>>>>> Yes, that's what I do now in webrev.01 by passing the dst_type as an argument to inflate_string. >>>>>>>> >>>>>>>>> And you should rollback part of changes in escape.cpp and macro.cpp. >>>>>>>> >>>>>>>> Okay, I'll to that. >>>>>>>> >>>>>>>>>> Here is the new webrev, including the SCMemProjNode and adapting escape analysis and macro expansion accordingly: >>>>>>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.01/ >>>>>>>>> >>>>>>>>> In general when src & dst arrays have different type we may need to use TypeOopPtr::BOTTOM to prevent related store & loads bypass these copy nodes. >>>>>>>> >>>>>>>> Okay, should we then use BOTTOM for both the input and output type? >>>>>>> >>>>>>> Only input. Output type corresponds to dst array type which you set correctly now. >>>>>> >>>>>> It seems like that this is not sufficient. As Roland pointed out (off-thread), there may still be a problem in the following case: >>>>>> StoreC >>>>>> inflate_string >>>>>> LoadC >>>>>> >>>>>> The memory graph (def->use) now looks like this: >>>>>> LoadC -> inflate_string -> ByteMem >>>>>> ... StoreC-> CharMem >>>>> >>>>> I did not get this. If StoreC node is created before inflate_string - inflate_string should point to it be barrier for LoadC. >>>> >>>> Note that the StoreC and inflate_string are *not* writing to the same char[] array. The test looks like this: >>>> >>>> char c1[] = new char[1]; >>>> char c2[] = new char[1]; >>>> >>>> c2[0] = 42; >>>> // Inflate String from byte[] to char[] >>>> s.getChars(0, 1, c1, 0); >>>> // Read char[] memory written before inflation >>>> return c2[0]; >>>> >>>> The result should be 42. 
The problem is that inflate_string does not point to StoreC because inflate_string uses a byte[] as input and in this case also writes to a different char[]. Even if we set the input to BOTTOM, inflate_string points to 7 Parm (BOTTOM) but not to the char[] memory produced by 96 StoreC: >>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >>>> >>>> 349 LoadUS then reads from the output char[] memory of inflate_string which does not include the result of StoreC. The test fails because the return value is != 42. >>>> >>>> My solution is to capture both the byte[] and char[] memory by using a MergeMem node as input to inflate_string. >>>> >>>>> If StoreC followed inflate_string and LoadC followed StoreC - LoadC should point to StoreC. If LoadC does not follow StoreC then result is relaxed. >>>> >>>> Yes, these cases work fine. >>>> >>>> Thanks, >>>> Tobias >>>> >>>>>> The intrinsic hides the dependency between LoadC and StoreC, causing the load to read from memory not containing the result of the StoreC. I was able to write a regression test for this (see 'TestStringIntrinsicMemoryFlow::testInflate2'). >>>>>> >>>>>> Setting the input to BOTTOM, generates the following graph: >>>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_bottom.png >>>>>> The 349 LoadUS does not read the result of the 96 StoreC because the StrInflateCopyNode does not capture it's memory. The test fails. >>>>>> >>>>>> I adapted the fix to emit a MergeMemoryNode to capture the entire memory state as input to the intrinsic. The graph then looks like this: >>>>>> LoadC -> inflate_string -> MergeMem(ByteMem, StoreC(CharMem)) >>>>>> http://cr.openjdk.java.net/~thartmann/8144212/inflate_merge.png >>>>>> >>>>>> Here is the new webrev: >>>>>> http://cr.openjdk.java.net/~thartmann/8144212/webrev.02/ >>>>>> Probably, we could also only capture the byte and char slices instead of merging everything. What do you think? 
>>>>>> >>>>>> Best, >>>>>> Tobias >>>>>> >>>>>>>>>> Related question: >>>>>>>>>> In library_call.cpp, I now use TypeAryPtr::get_array_body_type(dst_elem) to get the correct TypeAryPtr for the destination (we support both BYTES and CHARS). For a char[] destination, it returns: >>>>>>>>>> char[int:>=0]:exact+any * >>>>>>>>>> >>>>>>>>>> which is equal to the type of the char load. >>>>>>>>> >>>>>>>>> Please, explain this. I thought string's array will always be byte[] when compressed strings are enabled. Is it used for getChars() which returns char array? >>>>>>>> >>>>>>>> Yes, both the compress and inflate intrinsics are used for different types of src and dst arrays. See comment in library_call.cpp: >>>>>>>> >>>>>>>> // compressIt == true --> generate a compressed copy operation (compress char[]/byte[] to byte[]) >>>>>>>> // int StringUTF16.compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>>>> // int StringUTF16.compress(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>>>> // compressIt == false --> generate an inflated copy operation (inflate byte[] to char[]/byte[]) >>>>>>>> // void StringLatin1.inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) >>>>>>>> // void StringLatin1.inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) >>>>>>>> >>>>>>>> I.e., the inflate intrinsic is used for inflation from byte[] to byte[]/char[]. >>>>>>>> >>>>>>>>> Should we also be more careful in inflate_string_slow()? Is it used? >>>>>>>> >>>>>>>> No, inflate_string_slow() is only called from PhaseStringOpts::copy_latin1_string() where it is used to inflate from byte[] to byte[]. >>>>>>>> >>>>>>>>>> I also tried to derive the type from the array by using dst_type->isa_aryptr(). However, this returns a more specific type: >>>>>>>>>> char[int:1]:NotNull:exact * >>>>>>>>>> >>>>>>>>>> Using this results in C2 assuming that the subsequent char load is independent and again moving it to before the intrinsic. 
I don't understand why that is. Shouldn't the second type be a "subtype" of the first type? >>>>>>>>> >>>>>>>>> It is indeed strange. What memory type of LoadUS? It could be bug. >>>>>>>> >>>>>>>> LoadUS has memory type "char[int:>=0]:exact+any *" which has alias index 4. dst_type->isa_aryptr() returns memory type "char[int:1]:NotNull:exact *" which has alias index 8. >>>>>>>> >>>>>>>> I will look into this again and try to understand what happens. >>>>>>> >>>>>>> It could that aryptr is pointer to array and load type is pointer to array's element. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tobias >>>>>>>> >>>>>>>>>>> On 1/6/16 5:34 AM, Andrew Haley wrote: >>>>>>>>>>>> On 01/06/2016 01:06 PM, Tobias Hartmann wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The problem here is that C2 reorders memory instructions and moves >>>>>>>>>>>>> an array load before an array store. The MemBarCPUOrder is now used >>>>>>>>>>>>> (compiler internally) to prevent this. We do the same for normal >>>>>>>>>>>>> array copys in PhaseMacroExpand::expand_arraycopy_node(). No actual >>>>>>>>>>>>> code is emitted. See also the comment in memnode.hpp: >>>>>>>>>>>>> >>>>>>>>>>>>> // Ordering within the same CPU. Used to order unsafe memory references >>>>>>>>>>>>> // inside the compiler when we lack alias info. Not needed "outside" the >>>>>>>>>>>>> // compiler because the CPU does all the ordering for us. >>>>>>>>>>>>> >>>>>>>>>>>>> "CPU does all the ordering for us" means that even with a relaxed >>>>>>>>>>>>> memory ordering, loads are never moved before dependent stores. >>>>>>>>>>>>> >>>>>>>>>>>>> Or did I misunderstand your question? >>>>>>>>>>>> >>>>>>>>>>>> No, I don't think so. I was just checking: I am very aware that >>>>>>>>>>>> HotSpot has presented those of use with relaxed memory order machines >>>>>>>>>>>> with some interesting gotchas over the years, that's all. 
I'm a bit >>>>>>>>>>>> surprised that C2 needs this barrier, given that there is a >>>>>>>>>>>> read-after-write dependency, but never mind. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Andrew. >>>>>>>>>>>> From tobias.hartmann at oracle.com Mon Jan 18 07:10:14 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 18 Jan 2016 08:10:14 +0100 Subject: [9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type In-Reply-To: <56993981.8020703@oracle.com> References: <5697C624.7040201@oracle.com> <5697EFFF.90305@oracle.com> <5699021F.90500@oracle.com> <56993981.8020703@oracle.com> Message-ID: <569C8FD6.9090806@oracle.com> Thanks, Vladimir! Best, Tobias On 15.01.2016 19:25, Vladimir Kozlov wrote: > This looks good. > > Thanks, > Vladimir > > On 1/15/16 6:28 AM, Tobias Hartmann wrote: >> Thanks, Vladimir. >> >> On 14.01.2016 19:59, Vladimir Kozlov wrote: >>> You have to update code for 8146999 changes when Roland pushes it. >> >> Yes, I'll do so but Roland mentioned that he still has problems with his 8146999 fix. >> >>> The only thing I don't like about changes is using #ifdef _LP64 for part of changes. >>> I know where it is coming from (ConvI2L for loop indexing) but as you said ConvI2L could be generated in other cases too. Should the test cast->has_range_check() return 'false' in 32-bit? >> >> I added the _LP64 ifdefs because we only emit a narrowed ConvI2L on 64 bit. But I agree - it's cleaner without those. As you suggested, I removed the ifdefs and changed has_range_check() to return false on 32 bit. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/6675699/webrev.02/ >> >> Thanks, >> Tobias >> >>> On 1/14/16 8:00 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch.
>>>> >>>> https://bugs.openjdk.java.net/browse/JDK-6675699 >>>> http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/ >>>> >>>> *Problem* >>>> The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated. However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available. >>>> >>>> For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. The Phi is replaced by its only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled iteration (of course, the range check would catch this at runtime).
>>>> >>>> This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such a workaround fix is not feasible to fix all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9. >>>> >>>> *Solution* >>>> To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check. >>>> >>>> To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node. >>>> >>>> With the fix, we could now remove the following old "workaround-fixes": >>>> https://bugs.openjdk.java.net/browse/JDK-4781451 >>>> https://bugs.openjdk.java.net/browse/JDK-4799512 >>>> https://bugs.openjdk.java.net/browse/JDK-6659207 >>>> https://bugs.openjdk.java.net/browse/JDK-6663854 >>>> For reference, the individual patches can be found here: >>>> http://cr.openjdk.java.net/~thartmann/6675699/backouts/ >>>> >>>> However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. 
I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7. >>>> >>>> Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this. >>>> >>>> ConvI2L nodes with a narrow type are also emitted by intrinsics: >>>> - GraphKit::array_element_address() >>>> - PhaseMacroExpand::array_element_address() >>>> - ArrayCopyNode::prepare_array_copy() >>>> I was not able to reproduce the problem with intrinsics. It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic. >>>> >>>> *Testing* >>>> I did extensive testing to make sure the fix does not introduce correctness or performance issues. >>>> - Different RBT test suites [1] with and without -Xcomp. >>>> - Full run of multiple CTW suites. >>>> - Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of [2] with -XX:+TraceOptimizeFill. >>>> - Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of [3] with -XX:+PrintOptoAssembly to make sure all range checks are folded. >>>> - Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord. 
>>>> - Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] RBT test suites: >>>> - hotspot/test/:hotspot_all >>>> - noncolo.testlist >>>> - vm.compiler.testlist >>>> - vm.regression.testlist >>>> - nsk.regression.testlist >>>> - nsk.split_verifier.testlist >>>> - nsk.stress.testlist >>>> - nsk.stress.jck.testlist >>>> - jdk/test/:jdk_jfr >>>> - jdk/test/:svc_tools >>>> - jdk/test/:jdk_instrument >>>> - jdk/test/:jdk_lang >>>> - jdk/test/:jdk_svc >>>> - nashorn/test/:tier1 >>>> - nashorn/test/:tier2 >>>> - nashorn/test/:tier3 >>>> Only without -Xcomp: >>>> - Kitchensink >>>> - runThese >>>> - Weblogic12medrec >>>> [2] test/compiler/intrinsics/6982370/Test6982370.java >>>> [3] test/compiler/rangechecks/TestExplicitRangeChecks.java >>>> From zoltan.majo at oracle.com Mon Jan 18 07:44:42 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 18 Jan 2016 08:44:42 +0100 Subject: [9] RFR (XS): 8147441: unchecked pending exceptions in the WhiteBox API's implementation In-Reply-To: <56993628.1060702@oracle.com> References: <5698B114.4020106@oracle.com> <56993628.1060702@oracle.com> Message-ID: <569C97EA.6020409@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 01/15/2016 07:10 PM, Vladimir Kozlov wrote: > Seems fine. > > Thanks, > Vladimir > > On 1/15/16 12:43 AM, Zoltán Majó wrote: >> Hi, >> >> >> please review the patch for 8147441. >> >> https://bugs.openjdk.java.net/browse/JDK-8147441 >> >> Problem: The method codeBlob2objectArray is used by the implementation >> of the WB API to fill in an object array with information about a code >> blob. Although the codeBlob2objectArray method can cause various JNI >> exceptions, there are two code locations where the VM does not check for >> exceptions after codeBlob2objectArray returns. >> >> Solution: Add exception check to the above mentioned code locations.
>> >> Webrev: >> http://cr.openjdk.java.net/~zmajo/8147441/webrev.00/ >> >> Testing: >> - JPRT; >> - all hotspot tests executed locally; all tests that pass with the >> default version pass with the fixed version as well. >> >> Thank you and best regards, >> >> >> Zoltan >> From vladimir.x.ivanov at oracle.com Mon Jan 18 12:54:48 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 18 Jan 2016 15:54:48 +0300 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode Message-ID: <569CE098.4030807@oracle.com> http://cr.openjdk.java.net/~vlivanov/7177745/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-7177745 JVM aggressively inlines through CallSites, even for mutable and volatile flavors. It's the key optimization for making invokedynamic performant. When a CallSite.target is updated, JVM invalidates all affected nmethods and try to recompile them later. If a call site target regularly changes, JVM will eventually mark (after PerMethodRecompilationCutoff invalidations) all hot methods which have the call site bound as non-compilable. It leads to significant peak performance reduction, because all affected methods will always be executed in interpreter mode since then. The fix is to avoid updating recompilation count when corresponding nmethod is invalidated due to a call site target change. I filed a separate RFE (JDK-8147550 [1]) to consider slow non-inlined code shape for unstable call sites, as John suggested [2]. Testing: regression test, octane, JPRT. Thanks! 
Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8147550 [2] https://bugs.openjdk.java.net/browse/JDK-7177745?focusedCommentId=13821545&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821545 From aleksey.shipilev at oracle.com Mon Jan 18 13:15:58 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 18 Jan 2016 16:15:58 +0300 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <569CE098.4030807@oracle.com> References: <569CE098.4030807@oracle.com> Message-ID: <569CE58E.6030805@oracle.com> On 18.01.2016 15:54, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/7177745/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-7177745 Finally. Cheers, -Aleksey From roland.schatz at oracle.com Mon Jan 18 16:40:52 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Mon, 18 Jan 2016 17:40:52 +0100 Subject: RFR(XS): 8147564: [JVMCI] remove unused method CodeCacheProvider.needsDataPatch Message-ID: <569D1594.6060502@oracle.com> Hi, Please review this small patch: webrev: http://cr.openjdk.java.net/~rschatz/JDK-8147564/webrev.00/ jira: https://bugs.openjdk.java.net/browse/JDK-8147564 The removed method always returned false, because there is no class implementing both the JavaConstant and the HotSpotMetaspaceConstant interfaces.
Thanks, Roland From roland.westrelin at oracle.com Mon Jan 18 20:39:50 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 18 Jan 2016 21:39:50 +0100 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: References: Message-ID: > http://cr.openjdk.java.net/~roland/8146999/webrev.00/ Further testing revealed this bug was hiding another one. http://cr.openjdk.java.net/~roland/8146999/webrev.01/ In PhiNode::unique_input(), uncast() could step over the CheckCastPP node that follows an Allocation. If that happens, a new CheckCastPP will be created and will replace the CheckCastPP of the allocation but with a different control and it could cause a safepoint to be in between the allocation and the CheckCastPP and assert failures with "there should be a oop in OopMap instead of a live raw oop at safepoint". Roland. From vladimir.kozlov at oracle.com Mon Jan 18 23:07:57 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 18 Jan 2016 15:07:57 -0800 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: References: Message-ID: <569D704D.6080908@oracle.com> Looks good. Thanks, Vladimir On 1/18/16 12:39 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8146999/webrev.00/ > > Further testing revealed this bug was hiding another one. > > http://cr.openjdk.java.net/~roland/8146999/webrev.01/ > > In PhiNode::unique_input(), uncast() could step over the CheckCastPP node that follows an Allocation. If that happens, a new CheckCastPP will be created and will replace the CheckCastPP of the allocation but with a different control and it could cause a safepoint to be in between the allocation and the CheckCastPP and assert failures with "there should be a oop in OopMap instead of a live raw oop at safepoint". > > Roland.
> From tobias.hartmann at oracle.com Tue Jan 19 08:18:04 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 19 Jan 2016 09:18:04 +0100 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: References: Message-ID: <569DF13C.8020700@oracle.com> Hi Roland, this looks good to me but please make sure that you merge your node.hpp changes with my 6675699 changes: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/diff/bfb7a8a004de/src/share/vm/opto/node.hpp Best, Tobias On 18.01.2016 21:39, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8146999/webrev.00/ > > Further testing revealed this bug was hiding another one. > > http://cr.openjdk.java.net/~roland/8146999/webrev.01/ > > In PhiNode::unique_input(), uncast() could step over the CheckCastPP node that follows an Allocation. If that happens, a new CheckCastPP will be created and will replace the CheckCastPP of the allocation but with a different control and it could cause a safepoint to be in between the allocation and the CheckCastPP and assert failures with "there should be a oop in OopMap instead of a live raw oop at safepoint". > > Roland. > From roland.westrelin at oracle.com Tue Jan 19 10:09:57 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 19 Jan 2016 11:09:57 +0100 Subject: RFR(XS): 8146999: hotspot/test/compiler/c2/8007294/Test8007294.java test nightly failure In-Reply-To: <569DF13C.8020700@oracle.com> References: <569DF13C.8020700@oracle.com> Message-ID: <2FCABBE1-182F-4608-A79E-7E62ADF9B7EF@oracle.com> Thanks Vladimir & Tobias for the review. Roland.
From andreas.eriksson at oracle.com Tue Jan 19 12:32:30 2016 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Tue, 19 Jan 2016 13:32:30 +0100 Subject: RFR(S): 8146096: [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts Message-ID: <569E2CDE.3060805@oracle.com> Hi, Can I please have a review for the removal of hotspot/test/compiler/loopopts/UseCountedLoopSafepoints.java. The test needs to do a loop that takes more than two seconds to execute fully without doing a safepointing call. For this expensive atomic operations were used. The problem is that on certain embedded platforms they are too expensive, and the test times out. The loop length could probably be reduced, and it should still work on faster machines. However, the test is not very useful, so I think it's better to just remove it to avoid future problems. Bug: https://bugs.openjdk.java.net/browse/JDK-8146096 Test to be removed: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/d84a55e7aaf8/test/compiler/loopopts/UseCountedLoopSafepoints.java (I can create a webrev if you think it necessary.) Thanks, Andreas From vladimir.x.ivanov at oracle.com Tue Jan 19 12:50:17 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Jan 2016 15:50:17 +0300 Subject: RFR(S): 8146096: [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts In-Reply-To: <569E2CDE.3060805@oracle.com> References: <569E2CDE.3060805@oracle.com> Message-ID: <569E3109.8090107@oracle.com> As an idea to improve the test: spawn a thread which executes the counted loop and then use WhiteBox.forceSafepoint() to trigger a safepoint. If the test times out, it means there's no safepoint in the loop. Also, it also simplifies the implementation - no need to spawn a child process, the check can be done in-process. 
Best regards, Vladimir Ivanov On 1/19/16 3:32 PM, Andreas Eriksson wrote: > Hi, > > Can I please have a review for the removal of > hotspot/test/compiler/loopopts/UseCountedLoopSafepoints.java. > > The test needs to do a loop that takes more than two seconds to execute > fully without doing a safepointing call. For this expensive atomic > operations were used. The problem is that on certain embedded platforms > they are too expensive, and the test times out. > The loop length could probably be reduced, and it should still work on > faster machines. However, the test is not very useful, so I think it's > better to just remove it to avoid future problems. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8146096 > Test to be removed: > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/d84a55e7aaf8/test/compiler/loopopts/UseCountedLoopSafepoints.java > > (I can create a webrev if you think it necessary.) > > Thanks, > Andreas From roland.westrelin at oracle.com Tue Jan 19 15:22:35 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 19 Jan 2016 16:22:35 +0100 Subject: Request for Reviews (S): JDK-8003585 strength reduce or eliminate range checks for power-of-two sized arrays In-Reply-To: <5697E923.6000908@oracle.com> References: <440F2280-4B25-4AE6-A4F6-DDD4EB529636@oracle.com> <52FC129D.7040409@oracle.com> <52FE6A08.20400@oracle.com> <52FE7313.3060404@oracle.com> <530209A8.1020501@oracle.com> <38EE6922-0B9C-49A6-B54D-E78BA0EFECB1@oracle.com> <8232A81B-6B78-4F61-A8EC-1A3DF3938648@oracle.com> <70FBA4CF-CF05-4232-AFEC-202E93BFA930@oracle.com> <5697E923.6000908@oracle.com> Message-ID: <0317CD9D-F104-4AFE-BB75-3966C8DF8421@oracle.com> Thanks for taking another look at this, Vladimir. > I know it is duplication but CmpU creation should be under conditions otherwise you are creating and transforming dead node. 
> > + Node* ncmp = phase->transform(new CmpUNode(cmp1, cmp2)); > + if (_test._test == BoolTest::le || _test._test == BoolTest::eq) { > > The test does not cover next conversions: > > + // Change (arraylength <= 0) or (arraylength == 0) > + // into (arraylength u<= 0) > + // Also change (arraylength != 0) into (arraylength u> 0) Here is a new webrev: http://cr.openjdk.java.net/~roland/8003585/webrev.02/ Roland. > > Thanks, > Vladimir > > On 1/7/16 1:29 AM, Roland Westrelin wrote: >> Can I get a review for this? >> >> Roland. >> >>> On Oct 5, 2015, at 12:51 PM, Roland Westrelin wrote: >>> >>> Here is a new webrev: >>> >>> http://cr.openjdk.java.net/~roland/8003585/webrev.01/ >>> >>> Roland. >>> >>>> On Oct 2, 2015, at 3:30 PM, Roland Westrelin wrote: >>>> >>>> Hi Chris, >>>> >>>>> Thanks for picking it up! It mostly looks good to me. (Not a Reviewer) >>>> >>>> Thanks for looking at this again. >>>> >>>>> What I really needed with my earlier webrev was some instructions as to what test to write -- since the Java corelibs can come across this optimization a lot (e.g. HashMap), I didn't have a good idea of what kind of test really needs to be written. >>>>> >>>>> A couple of issues with this webrev: >>>>> >>>>> 1. In subnode.cpp, line 1346: >>>>> >>>>> 1344 } else if (_test._test == BoolTest::lt && >>>>> 1345 cmp2->Opcode() == Op_AddI && >>>>> 1346 cmp2->in(2)->find_int_con(1)) { >>>>> 1347 bound = cmp2->in(1); >>>>> 1348 } >>>>> >>>>> I think it should be >>>>> cmp2->in(2)->find_int_con(0) == 1 >>>>> instead, because the value passed into this function is actually for a "fallback when no int constant is found". Passing the expected value (1) to it defeats the purpose. >>>> >>>> You're right. Thanks for spotting that. >>>> >>>>> jint find_int_con(jint value_if_unknown) const { >>>>> const TypeInt* t = find_int_type(); >>>>> return (t != NULL && t->is_con()) ? t->get_con() : value_if_unknown; >>>>> } >>>>> >>>>> 2.
Formatting nitpick: could you please trim the spaces before the new's on lines 1368, 1369 and 1387 >>>> >>>> Sure. >>>> >>>> I'll send an updated webrev. >>>> >>>> Roland. >>>> >>>>> >>>>> Thanks, >>>>> Kris (OpenJDK username: krismo) >>>>> >>>>> On Wed, Sep 30, 2015 at 1:34 AM, Roland Westrelin wrote: >>>>> I'm picking that one up. Here is a new webrev: >>>>> >>>>> http://cr.openjdk.java.net/~roland/8003585/webrev.00/ >>>>> >>>>> The only change to c2 compared to the previous webrev is that ((x & m) u< m+1) is optimized the same way ((x & m) u<= m) is. Actually, I don't think that C2 currently produces the ((x & m) u<= m) shape. The IfNode::fold_compares() logic produces the ((x & m) u< m+1) variant. I also added a test case to check the validity of the transformations and ran usual testing on the change. >>>>> >>>>> Roland. >>> >> From roland.westrelin at oracle.com Tue Jan 19 17:06:35 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 19 Jan 2016 18:06:35 +0100 Subject: RFR(XS): 8147386: assert(size == calc_size) failed: incorrect size calculattion x86_32.ad Message-ID: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> http://cr.openjdk.java.net/~roland/8147386/webrev.00/ src_offset/dst_offset are incremented in the size computation code and then used if cbuf is not null but now have the wrong value. Roland. From vladimir.x.ivanov at oracle.com Tue Jan 19 17:59:01 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 19 Jan 2016 20:59:01 +0300 Subject: RFR(XS): 8147386: assert(size == calc_size) failed: incorrect size calculattion x86_32.ad In-Reply-To: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> References: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> Message-ID: <569E7965.9080800@oracle.com> Looks good.
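[Editor's sketch for 8147386] The failure mode described in the 8147386 request above (offsets advanced while computing the size, then reused for emission) could look roughly like the following. This is a minimal sketch under stated assumptions, not the x86_32.ad code: only the names src_offset, dst_offset and calc_size come from the thread; Move, disp_size, the 6-byte base size and both helper functions are hypothetical.

```cpp
#include <vector>

// Hypothetical record of one emitted word move (not a HotSpot type).
struct Move { int src; int dst; };

// Assumed encoding rule for illustration only: 0, 1 or 4 displacement bytes.
inline int disp_size(int offset) {
  return offset == 0 ? 0 : (offset < 0x80 ? 1 : 4);
}

// Buggy shape: the size loop advances the parameters themselves, so the
// emission loop starts from offsets that have already been moved.
inline int copy_words_buggy(std::vector<Move>* out, int src_offset,
                            int dst_offset, int words) {
  int calc_size = 0;
  for (int i = 0; i < words; i++) {
    calc_size += 6 + disp_size(src_offset) + disp_size(dst_offset);
    src_offset += 4;  // mutates the value that emission needs later
    dst_offset += 4;
  }
  if (out != nullptr) {
    for (int i = 0; i < words; i++) {
      out->push_back({src_offset + 4 * i, dst_offset + 4 * i});
    }
  }
  return calc_size;
}

// Fixed shape: the size loop works on local copies and leaves the
// parameters untouched for emission.
inline int copy_words_fixed(std::vector<Move>* out, int src_offset,
                            int dst_offset, int words) {
  int calc_size = 0;
  int s = src_offset;
  int d = dst_offset;
  for (int i = 0; i < words; i++) {
    calc_size += 6 + disp_size(s) + disp_size(d);
    s += 4;
    d += 4;
  }
  if (out != nullptr) {
    for (int i = 0; i < words; i++) {
      out->push_back({src_offset + 4 * i, dst_offset + 4 * i});
    }
  }
  return calc_size;
}
```

Both variants report the same size, which is why a size assert alone would not necessarily expose the wrong offsets in this toy version; the actual patch is in the webrev linked in the request.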
Best regards, Vladimir Ivanov On 1/19/16 8:06 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8147386/webrev.00/ > > src_offset/dst_offset are incremented in the size computation code and then used if cbuf is not null but now have the wrong value. > > Roland. > From vladimir.kozlov at oracle.com Tue Jan 19 18:20:58 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Jan 2016 10:20:58 -0800 Subject: RFR(XS): 8147386: assert(size == calc_size) failed: incorrect size calculattion x86_32.ad In-Reply-To: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> References: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> Message-ID: <569E7E8A.5080809@oracle.com> Good. Thanks, Vladimir On 1/19/16 9:06 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8147386/webrev.00/ > > src_offset/dst_offset are incremented in the size computation code and then used if cbuf is not null but now have the wrong value. > > Roland. > From vladimir.kozlov at oracle.com Tue Jan 19 18:30:34 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Jan 2016 10:30:34 -0800 Subject: [9] RFR (XS) 8146119: java/lang/Math/PowTests.java fails on solaris-x64 using -Xcomp In-Reply-To: <567B0CD6.2070107@oracle.com> References: <567B0B90.7000004@oracle.com> <567B0CD6.2070107@oracle.com> Message-ID: <569E80CA.6080806@oracle.com> I am pushing this change since we decided to keep 12.4SS C++ update. Thanks, Vladimir On 12/23/15 1:06 PM, Vladimir Kozlov wrote: > Thanks! > > On 12/23/15 1:05 PM, Christian Thalinger wrote: >> Unfortunate but looks good. >> >>> On Dec 23, 2015, at 11:01 AM, Vladimir Kozlov wrote: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8146119 >>> >>> http://cr.openjdk.java.net/~kvn/8146119/webrev/ >>> >>> New SunStudio C++ compiler generates incorrect code in library_call.cpp. All build versions are affected. >>> It is also failed with -xO0 level so I removed any optimizations. >>> >>> Tested with failed test. 
>>> >>> Thanks, >>> Vladimir >> From volker.simonis at gmail.com Tue Jan 19 18:57:06 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 19 Jan 2016 19:57:06 +0100 Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change Message-ID: Hi, can somebody please review and sponsor this change. Despite the bug summary, I still had to do some small shared changes to make this work, so unfortunately I can not push this on my own. The change also affects aarch64 (although it is minimal and I don't expect it to break anything) so I cc-ed aarch64-port-dev. http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ https://bugs.openjdk.java.net/browse/JDK-8145336 As described in the bug, this change only fixes the string intrinsics for the -XX:-UseCompactStrings mode which is still the default on ppc64. Additionally, support for the new StrIndexOfChar intrinsic was added because we already had a similar intrinsic for constant string needles of length one anyway. A later change (which we're already working on) will add the intrinsics which can handle compact strings. The current intrinsics can handle both, the new byte-array based string representation as well as the old char-array based string representation because we internally still use the new hotspot with older versions of the class libraries. I've also ported some of our internal string tests into a new regression test (TestStringIntrinsics2.java) because the existing tests didn't exercise all of our intrinsics. Following the shared changes I had to do: Until now, UseSSE42Intrinsics was a global shared option which was used to control the availability of the stringIndexOf intrinsics. But UseSSE42Intrinsics is actually a x86-specific feature so it doesn't make a lot of sense to define it for other architectures. 
I've therefore moved the flag to globals_x86.hpp and changed the condition which checks for the ability of the stringIndexOf intrinsics from: if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { to: if (!Matcher::match_rule_supported(Op_StrIndexOf)) { The Matcher::match_rule_supported() method already calls Matcher::has_match_rule() anyway. And it is implemented in the .ad file so I've moved the check for UseSSE42Intrinsics into x86.ad. Other platforms can now decide in their .ad file if they unconditionally support the intrinsic or if they need a special feature check. This change was already briefly discussed in [1]. The other shared change I had to make was in LibraryCallKit::make_string_method_node() for the "Op_StrEquals" case. We have optimized intrinsics for the case that one of the strings to compare is constant, but the StrEqualsNode is constructed without taking into account that one of the string length values could be a constant. This prevented our optimized instruction from being matched in the ad-file. All the other changes are ppc-specific. Thank you and best regards, Volker [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/thread.html#20400 From michael.c.berg at intel.com Tue Jan 19 19:07:35 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 19 Jan 2016 19:07:35 +0000 Subject: RFR(XS): 8147386: assert(size == calc_size) failed: incorrect size calculattion x86_32.ad In-Reply-To: <569E7E8A.5080809@oracle.com> References: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> <569E7E8A.5080809@oracle.com> Message-ID: Looks ok. -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Tuesday, January 19, 2016 10:21 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(XS): 8147386: assert(size == calc_size) failed: incorrect size calculattion x86_32.ad Good. 
Thanks, Vladimir On 1/19/16 9:06 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8147386/webrev.00/ > > src_offset/dst_offset are incremented in the size computation code and then used if cbuf is not null but now have the wrong value. > > Roland. > From tom.rodriguez at oracle.com Tue Jan 19 19:32:39 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 19 Jan 2016 11:32:39 -0800 Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output Message-ID: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> http://cr.openjdk.java.net/~never/8147432/webrev/index.html https://bugs.openjdk.java.net/browse/JDK-8147432 Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly. tom From christian.thalinger at oracle.com Tue Jan 19 19:58:36 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 19 Jan 2016 09:58:36 -1000 Subject: RFR(XS): 8147564: [JVMCI] remove unused method CodeCacheProvider.needsDataPatch In-Reply-To: <569D1594.6060502@oracle.com> References: <569D1594.6060502@oracle.com> Message-ID: <61268260-8FD9-43DA-93F1-866C3D769E10@oracle.com> Looks good.
> On Jan 18, 2016, at 6:40 AM, Roland Schatz wrote: > > Hi, > > Please review this small patch: > webrev: http://cr.openjdk.java.net/~rschatz/JDK-8147564/webrev.00/ > jira: https://bugs.openjdk.java.net/browse/JDK-8147564 > > The removed method always returned false, because there is no class implementing both the JavaConstant and the HotSpotMetaspaceConstant interfaces. > > Thanks, > Roland From vladimir.kozlov at oracle.com Tue Jan 19 20:02:07 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Jan 2016 12:02:07 -0800 Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output In-Reply-To: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> Message-ID: <569E963F.8060901@oracle.com> Looks good. Thanks, Vladimir On 1/19/16 11:32 AM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8147432/webrev/index.html > https://bugs.openjdk.java.net/browse/JDK-8147432 > > Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new > CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" > style message. I've refactored the printing so it's shared between compiles. The result can also include the number of > inlined byte codes for use by things like CITimeEach. Additionally I removed the > CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and > PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
> > tom From vladimir.kozlov at oracle.com Tue Jan 19 20:04:41 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Jan 2016 12:04:41 -0800 Subject: Request for Reviews (S): JDK-8003585 strength reduce or eliminate range checks for power-of-two sized arrays In-Reply-To: <0317CD9D-F104-4AFE-BB75-3966C8DF8421@oracle.com> References: <440F2280-4B25-4AE6-A4F6-DDD4EB529636@oracle.com> <52FC129D.7040409@oracle.com> <52FE6A08.20400@oracle.com> <52FE7313.3060404@oracle.com> <530209A8.1020501@oracle.com> <38EE6922-0B9C-49A6-B54D-E78BA0EFECB1@oracle.com> <8232A81B-6B78-4F61-A8EC-1A3DF3938648@oracle.com> <70FBA4CF-CF05-4232-AFEC-202E93BFA930@oracle.com> <5697E923.6000908@oracle.com> <0317CD9D-F104-4AFE-BB75-3966C8DF8421@oracle.com> Message-ID: <569E96D9.1070909@oracle.com> Thanks! Looks good. Vladimir On 1/19/16 7:22 AM, Roland Westrelin wrote: > Thanks for taking another look at this, Vladimir. > >> I know it is duplication but CmpU creation should be under conditions otherwise you are creating and transforming dead node. >> >> + Node* ncmp = phase->transform(new CmpUNode(cmp1, cmp2)); >> + if (_test._test == BoolTest::le || _test._test == BoolTest::eq) { >> >> The test does not cover next conversions: >> >> + // Change (arraylength <= 0) or (arraylength == 0) >> + // into (arraylength u<= 0) >> + // Also change (arraylength != 0) into (arraylength u> 0) > > Here is a new webrev: > > http://cr.openjdk.java.net/~roland/8003585/webrev.02/ > > Roland. > >> >> Thanks, >> Vladimir >> >> On 1/7/16 1:29 AM, Roland Westrelin wrote: >>> Can I get a review for this? >>> >>> Roland. >>> >>>> On Oct 5, 2015, at 12:51 PM, Roland Westrelin wrote: >>>> >>>> Here is a new webrev: >>>> >>>> http://cr.openjdk.java.net/~roland/8003585/webrev.01/ >>>> >>>> Roland. >>>> >>>>> On Oct 2, 2015, at 3:30 PM, Roland Westrelin wrote: >>>>> >>>>> Hi Chris, >>>>> >>>>>> Thanks for picking it up! It mostly looks good to me. (Not a Reviewer) >>>>> >>>>> Thanks for looking at this again. 
>>>>> >>>>>> What I really needed with my earlier webrev was some instructions as to what test to write -- since the Java corelibs can come across this optimization a lot (e.g. HashMap), I didn't have a good idea of what kind of test really needs to be written. >>>>>> >>>>>> A couple of issues with this webrev: >>>>>> >>>>>> 1. In subnode.cpp, line 1346: >>>>>> >>>>>> 1344 } else if (_test._test == BoolTest::lt && >>>>>> 1345 cmp2->Opcode() == Op_AddI && >>>>>> 1346 cmp2->in(2)->find_int_con(1)) { >>>>>> 1347 bound = cmp2->in(1); >>>>>> 1348 } >>>>>> >>>>>> I think it should be >>>>>> cmp2->in(2)->find_int_con(0) == 1 >>>>>> instead, because the value passed into this function is actually for a "fallback when no int constant is found". Passing the expected value (1) to it defeats the purpose. >>>>> >>>>> You're right. Thanks for spotting that. >>>>> >>>>>> jint find_int_con(jint value_if_unknown) const { >>>>>> const TypeInt* t = find_int_type(); >>>>>> return (t != NULL && t->is_con()) ? t->get_con() : value_if_unknown; >>>>>> } >>>>>> >>>>>> 2. Formatting nitpick: could you please trim the spaces before the new's on lines 1368, 1369 and 1387 >>>>> >>>>> Sure. >>>>> >>>>> I'll send an updated webrev. >>>>> >>>>> Roland. >>>>> >>>>>> >>>>>> Thanks, >>>>>> Kris (OpenJDK username: krismo) >>>>>> >>>>>> On Wed, Sep 30, 2015 at 1:34 AM, Roland Westrelin wrote: >>>>>> I'm picking that one up. Here is a new webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~roland/8003585/webrev.00/ >>>>>> >>>>>> The only change to c2 compared to the previous webrev is that ((x & m) u< m+1) is optimized the same way ((x & m) u<= m) is. Actually, I don't think that C2 currently produces the ((x & m) u<= m) shape. The IfNode::fold_compares() logic produces the ((x & m) u< m+1) variant. I also added a test case to check the validity of the transformations and ran usual testing on the change. >>>>>> >>>>>> Roland.
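[Editor's sketch for the find_int_con() point] The review comment quoted above, that the argument of find_int_con() is a fallback rather than an expected value, can be reproduced with a small stand-alone model. This is only a sketch: ToyNode and the two matches_one_* helpers are hypothetical and are not HotSpot's Node class. Only find_int_con and its fallback behaviour are taken from the quoted snippet.

```cpp
#include <optional>

// Hypothetical stand-in for a C2 node: it either carries a known int
// constant or it does not.
struct ToyNode {
  std::optional<int> con;

  // Mirrors the quoted find_int_con(): return the constant if there is
  // one, otherwise return the caller-supplied fallback.
  int find_int_con(int value_if_unknown) const {
    return con.has_value() ? *con : value_if_unknown;
  }
};

// Shape flagged in the review: the fallback doubles as the expected value
// and the result is used directly as a condition.
inline bool matches_one_buggy(const ToyNode& n) {
  return n.find_int_con(1) != 0;
}

// Suggested shape: use a fallback that differs from the expected value
// and compare explicitly.
inline bool matches_one_fixed(const ToyNode& n) {
  return n.find_int_con(0) == 1;
}
```

In this model the flagged shape accepts a node with no constant at all (the fallback 1 is truthy) and any non-zero constant, while the suggested shape accepts only the constant 1.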
>>>> >>> > From christian.thalinger at oracle.com Tue Jan 19 20:12:31 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 19 Jan 2016 10:12:31 -1000 Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output In-Reply-To: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> Message-ID: <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> src/share/vm/compiler/compileBroker.cpp: + failure_reason = ci_env.failure_reason(); + retry_message = ci_env.retry_message(); ci_env.report_failure(ci_env.failure_reason()); Why not use failure_reason? src/share/vm/jvmci/jvmciCompiler.cpp: + oop failure_message = CompilationRequestResult::failureMessage(result_object); + if (failure_message != NULL) { + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason"; failure_message is guaranteed to be non-null. + oop result_object = (oop) result.get_jobject(); + if (result_object != NULL) { Looks like there is nothing to handle the null case. Should we? > On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8147432/webrev/index.html > https://bugs.openjdk.java.net/browse/JDK-8147432 > > Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
> > tom From vladimir.kozlov at oracle.com Tue Jan 19 20:36:57 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Jan 2016 12:36:57 -0800 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <569CE098.4030807@oracle.com> References: <569CE098.4030807@oracle.com> Message-ID: <569E9E69.4070202@oracle.com> Looks fine but in vmStructs.cpp you should replace the field declaration instead of just removing old one. Also look if SA access it. Thanks, Vladimir On 1/18/16 4:54 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/7177745/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-7177745 > > JVM aggressively inlines through CallSites, even for mutable and volatile flavors. It's the key optimization for making > invokedynamic performant. > > When a CallSite.target is updated, JVM invalidates all affected nmethods and try to recompile them later. If a call site > target regularly changes, JVM will eventually mark (after PerMethodRecompilationCutoff invalidations) all hot methods > which have the call site bound as non-compilable. It leads to significant peak performance reduction, because all > affected methods will always be executed in interpreter mode since then. > > The fix is to avoid updating recompilation count when corresponding nmethod is invalidated due to a call site target > change. > > I filed a separate RFE (JDK-8147550 [1]) to consider slow non-inlined code shape for unstable call sites, as John > suggested [2]. > > Testing: regression test, octane, JPRT. > > Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8147550
> [2]
> https://bugs.openjdk.java.net/browse/JDK-7177745?focusedCommentId=13821545&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821545
>

From tom.rodriguez at oracle.com Tue Jan 19 20:40:48 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 19 Jan 2016 12:40:48 -0800
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To: <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com>
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com>
Message-ID: <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com>

> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>
> src/share/vm/compiler/compileBroker.cpp:
>
> + failure_reason = ci_env.failure_reason();
> + retry_message = ci_env.retry_message();
>   ci_env.report_failure(ci_env.failure_reason());
>
> Why not use failure_reason?

Fewer edits? :) I'll fix it.

>
> src/share/vm/jvmci/jvmciCompiler.cpp:
>
> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
> + if (failure_message != NULL) {
> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>
> failure_message is guaranteed to be non-null.

Right. The code evolved a few times but now that test is unnecessary.

>
> + oop result_object = (oop) result.get_jobject();
> + if (result_object != NULL) {
>
> Looks like there is nothing to handle the null case. Should we?

I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.
tom

>
>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>
>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>
>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>
>> tom
>

From christian.thalinger at oracle.com Tue Jan 19 20:44:20 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 19 Jan 2016 10:44:20 -1000
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To: <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com>
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com>
Message-ID:

> On Jan 19, 2016, at 10:40 AM, Tom Rodriguez wrote:
>
>
>> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>>
>> src/share/vm/compiler/compileBroker.cpp:
>>
>> + failure_reason = ci_env.failure_reason();
>> + retry_message = ci_env.retry_message();
>>   ci_env.report_failure(ci_env.failure_reason());
>>
>> Why not use failure_reason?
>
> Fewer edits? :) I'll fix it.
:-D

>
>>
>> src/share/vm/jvmci/jvmciCompiler.cpp:
>>
>> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
>> + if (failure_message != NULL) {
>> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>>
>> failure_message is guaranteed to be non-null.
>
> Right. The code evolved a few times but now that test is unnecessary.
>
>>
>> + oop result_object = (oop) result.get_jobject();
>> + if (result_object != NULL) {
>>
>> Looks like there is nothing to handle the null case. Should we?
>
> I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.

Assert in Java sounds good. I was thinking about a hard-failure in C++ since it shouldn't happen.

>
> tom
>
>>
>>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>>
>>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>>
>>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>>
>>> tom
>>
>
From vladimir.kozlov at oracle.com Tue Jan 19 20:46:33 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 19 Jan 2016 12:46:33 -0800
Subject: RFR(S): 8146096: [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To: <569E3109.8090107@oracle.com>
References: <569E2CDE.3060805@oracle.com> <569E3109.8090107@oracle.com>
Message-ID: <569EA0A9.8050406@oracle.com>

Simply using a timeout to check for a generated safepoint is a bad idea. It is very inaccurate. At the least you need to check the call stack to see if it stopped in the compiled method.

I would prefer to see a new WB interface which would check that the loop SafePointNode is generated during compilation of the method. It will be precise.

And we need such tests to make sure a feature is working - we can't remove them.

Thanks,
Vladimir

On 1/19/16 4:50 AM, Vladimir Ivanov wrote:
> As an idea to improve the test: spawn a thread which executes the counted loop and then use WhiteBox.forceSafepoint() to
> trigger a safepoint.
>
> If the test times out, it means there's no safepoint in the loop.
>
> It also simplifies the implementation - no need to spawn a child process, the check can be done in-process.
>
> Best regards,
> Vladimir Ivanov
>
> On 1/19/16 3:32 PM, Andreas Eriksson wrote:
>> Hi,
>>
>> Can I please have a review for the removal of
>> hotspot/test/compiler/loopopts/UseCountedLoopSafepoints.java.
>>
>> The test needs to do a loop that takes more than two seconds to execute
>> fully without doing a safepointing call. For this, expensive atomic
>> operations were used. The problem is that on certain embedded platforms
>> they are too expensive, and the test times out.
>> The loop length could probably be reduced, and it should still work on
>> faster machines. However, the test is not very useful, so I think it's
>> better to just remove it to avoid future problems.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8146096
>> Test to be removed:
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/d84a55e7aaf8/test/compiler/loopopts/UseCountedLoopSafepoints.java
>>
>> (I can create a webrev if you think it necessary.)
>>
>> Thanks,
>> Andreas

From tom.rodriguez at oracle.com Tue Jan 19 20:51:34 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 19 Jan 2016 12:51:34 -0800
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To:
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com>
Message-ID: <0965DCE8-4C48-48B7-B8C8-A406C151B588@oracle.com>

http://cr.openjdk.java.net/~never/8147432.00-01/webrev/index.html

I added a Java assert that it's non-null plus a C++ assert in the else case. So we won't crash in product if it returns null and turning on Java assert will report something useful.

tom

> On Jan 19, 2016, at 12:44 PM, Christian Thalinger wrote:
>
>>
>> On Jan 19, 2016, at 10:40 AM, Tom Rodriguez wrote:
>>
>>
>>> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>>>
>>> src/share/vm/compiler/compileBroker.cpp:
>>>
>>> + failure_reason = ci_env.failure_reason();
>>> + retry_message = ci_env.retry_message();
>>>   ci_env.report_failure(ci_env.failure_reason());
>>>
>>> Why not use failure_reason?
>>
>> Fewer edits? :) I'll fix it.
>
> :-D
>
>>
>>>
>>> src/share/vm/jvmci/jvmciCompiler.cpp:
>>>
>>> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
>>> + if (failure_message != NULL) {
>>> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>>>
>>> failure_message is guaranteed to be non-null.
>>
>> Right. The code evolved a few times but now that test is unnecessary.
>>
>>>
>>> + oop result_object = (oop) result.get_jobject();
>>> + if (result_object != NULL) {
>>>
>>> Looks like there is nothing to handle the null case. Should we?
>>
>> I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.
>
> Assert in Java sounds good. I was thinking about a hard-failure in C++ since it shouldn't happen.
>
>>
>> tom
>>
>>>
>>>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>>>
>>>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>>>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>>>
>>>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>>>
>>>> tom
From christian.thalinger at oracle.com Tue Jan 19 20:57:03 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 19 Jan 2016 10:57:03 -1000
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To: <0965DCE8-4C48-48B7-B8C8-A406C151B588@oracle.com>
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com> <0965DCE8-4C48-48B7-B8C8-A406C151B588@oracle.com>
Message-ID:

> On Jan 19, 2016, at 10:51 AM, Tom Rodriguez wrote:
>
> http://cr.openjdk.java.net/~never/8147432.00-01/webrev/index.html
>
> I added a Java assert that it's non-null plus a C++ assert in the else case. So we won't crash in product if it returns null and turning on Java assert will report something useful.

I am worried about corner cases which we will never see because the VM silently ignores them.

>
> tom
>
>> On Jan 19, 2016, at 12:44 PM, Christian Thalinger wrote:
>>
>>>
>>> On Jan 19, 2016, at 10:40 AM, Tom Rodriguez wrote:
>>>
>>>
>>>> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>>>>
>>>> src/share/vm/compiler/compileBroker.cpp:
>>>>
>>>> + failure_reason = ci_env.failure_reason();
>>>> + retry_message = ci_env.retry_message();
>>>>   ci_env.report_failure(ci_env.failure_reason());
>>>>
>>>> Why not use failure_reason?
>>>
>>> Fewer edits? :) I'll fix it.
>>
>> :-D
>>
>>>
>>>>
>>>> src/share/vm/jvmci/jvmciCompiler.cpp:
>>>>
>>>> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
>>>> + if (failure_message != NULL) {
>>>> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>>>>
>>>> failure_message is guaranteed to be non-null.
>>>
>>> Right. The code evolved a few times but now that test is unnecessary.
>>>
>>>>
>>>> + oop result_object = (oop) result.get_jobject();
>>>> + if (result_object != NULL) {
>>>>
>>>> Looks like there is nothing to handle the null case. Should we?
>>>
>>> I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.
>>
>> Assert in Java sounds good. I was thinking about a hard-failure in C++ since it shouldn't happen.
>>
>>>
>>> tom
>>>
>>>>
>>>>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>>>>
>>>>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>>>>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>>>>
>>>>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>>>>
>>>>> tom
>

From igor.ignatyev at oracle.com Tue Jan 19 21:06:25 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Jan 2016 00:06:25 +0300
Subject: RFR(XS) : 8141557 : TestResolvedJavaMethod.java times out after 1000 ms
Message-ID: <52CAB89A-1AA5-4545-9C4B-DD2A6880E463@oracle.com>

http://cr.openjdk.java.net/~iignatyev/8141557/webrev.00/
> 22 lines changed: 16 ins; 0 del; 6 mod;

Hi all,

Could you please review the fix for 8141557?

The test uses timeout value of org.junit.Test to test reading annotation via JVMCI.
In some cases, e.g. on embedded platforms, debug builds or w/ extra vm flags like -Xcomp, 1000ms isn't enough for the test to complete, and since jtreg doesn't apply the timeout factor to junit/testng timeouts (CODETOOLS-7901567), the test times out despite the increased timeout factor.

The fix changes the test to use a separate annotation, which doesn't affect test execution, and removes the timeout value (which means no timeout).

JBS: https://bugs.openjdk.java.net/browse/JDK-8141557
testing: locally

Thanks,
Igor

From tom.rodriguez at oracle.com Tue Jan 19 21:06:54 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 19 Jan 2016 13:06:54 -0800
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To:
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com> <0965DCE8-4C48-48B7-B8C8-A406C151B588@oracle.com>
Message-ID:

> On Jan 19, 2016, at 12:57 PM, Christian Thalinger wrote:
>
>
>> On Jan 19, 2016, at 10:51 AM, Tom Rodriguez wrote:
>>
>> http://cr.openjdk.java.net/~never/8147432.00-01/webrev/index.html
>>
>> I added a Java assert that it's non-null plus a C++ assert in the else case. So we won't crash in product if it returns null and turning on Java assert will report something useful.
>
> I am worried about corner cases which we will never see because the VM silently ignores them.

That's why we have the Java assert. This part is all informational anyway. Do we really need to do more?
tom

>
>>
>> tom
>>
>>> On Jan 19, 2016, at 12:44 PM, Christian Thalinger wrote:
>>>
>>>>
>>>> On Jan 19, 2016, at 10:40 AM, Tom Rodriguez wrote:
>>>>
>>>>
>>>>> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>>>>>
>>>>> src/share/vm/compiler/compileBroker.cpp:
>>>>>
>>>>> + failure_reason = ci_env.failure_reason();
>>>>> + retry_message = ci_env.retry_message();
>>>>>   ci_env.report_failure(ci_env.failure_reason());
>>>>>
>>>>> Why not use failure_reason?
>>>>
>>>> Fewer edits? :) I'll fix it.
>>>
>>> :-D
>>>
>>>>
>>>>>
>>>>> src/share/vm/jvmci/jvmciCompiler.cpp:
>>>>>
>>>>> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
>>>>> + if (failure_message != NULL) {
>>>>> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>>>>>
>>>>> failure_message is guaranteed to be non-null.
>>>>
>>>> Right. The code evolved a few times but now that test is unnecessary.
>>>>
>>>>>
>>>>> + oop result_object = (oop) result.get_jobject();
>>>>> + if (result_object != NULL) {
>>>>>
>>>>> Looks like there is nothing to handle the null case. Should we?
>>>>
>>>> I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.
>>>
>>> Assert in Java sounds good. I was thinking about a hard-failure in C++ since it shouldn't happen.
>>>
>>>>
>>>> tom
>>>>
>>>>>
>>>>>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>>>>>
>>>>>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message.
>>>>>> I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>>>>>
>>>>>> tom
>>
>

From christian.thalinger at oracle.com Tue Jan 19 21:11:33 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 19 Jan 2016 11:11:33 -1000
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To:
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com> <0965DCE8-4C48-48B7-B8C8-A406C151B588@oracle.com>
Message-ID:

> On Jan 19, 2016, at 11:06 AM, Tom Rodriguez wrote:
>
>
>> On Jan 19, 2016, at 12:57 PM, Christian Thalinger wrote:
>>
>>
>>> On Jan 19, 2016, at 10:51 AM, Tom Rodriguez wrote:
>>>
>>> http://cr.openjdk.java.net/~never/8147432.00-01/webrev/index.html
>>>
>>> I added a Java assert that it's non-null plus a C++ assert in the else case. So we won't crash in product if it returns null and turning on Java assert will report something useful.
>>
>> I am worried about corner cases which we will never see because the VM silently ignores them.
>
> That's why we have the Java assert.

Sure, but customers don't run with assertions on and if the error is silently ignored we or the customer don't know to turn assertions on.

> This part is all informational anyway. Do we really need to do more?

No, let's push it as it is.
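
[Editorial illustration, not part of the original thread.] The trade-off discussed above, a Java assert that only fires when assertions are enabled versus a check that always fails, can be sketched roughly as follows. The class and method names (NullResultCheck, checkWithAssert, checkAlways) are hypothetical stand-ins and are not the JVMCI API or the code in the webrev; only the language-level behaviour of `assert` and `Objects.requireNonNull` is being shown.

```java
import java.util.Objects;

// Hypothetical sketch: 'result' stands in for whatever a compiler hands back.
final class NullResultCheck {
    // Checked only when the JVM runs with -ea; otherwise null passes through silently.
    static Object checkWithAssert(Object result) {
        assert result != null : "compileMethod must return a non-null result";
        return result;
    }

    // Checked on every run, regardless of -ea.
    static Object checkAlways(Object result) {
        return Objects.requireNonNull(result, "compileMethod must return a non-null result");
    }

    public static void main(String[] args) {
        boolean assertionsOn = false;
        assert assertionsOn = true; // the assignment runs only when assertions are enabled
        System.out.println("assertions enabled: " + assertionsOn);

        try {
            checkWithAssert(null);
            System.out.println("assert check: null passed silently");
        } catch (AssertionError e) {
            System.out.println("assert check: " + e.getMessage());
        }

        try {
            checkAlways(null);
        } catch (NullPointerException e) {
            System.out.println("hard check: " + e.getMessage());
        }
    }
}
```

Run without -ea, the first check lets the null through, which is the silent case Christian is worried about; run with -ea, it reports the problem, which is the case Tom is relying on.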
>
> tom
>
>>
>>>
>>> tom
>>>
>>>> On Jan 19, 2016, at 12:44 PM, Christian Thalinger wrote:
>>>>
>>>>>
>>>>> On Jan 19, 2016, at 10:40 AM, Tom Rodriguez wrote:
>>>>>
>>>>>
>>>>>> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>>>>>>
>>>>>> src/share/vm/compiler/compileBroker.cpp:
>>>>>>
>>>>>> + failure_reason = ci_env.failure_reason();
>>>>>> + retry_message = ci_env.retry_message();
>>>>>>   ci_env.report_failure(ci_env.failure_reason());
>>>>>>
>>>>>> Why not use failure_reason?
>>>>>
>>>>> Fewer edits? :) I'll fix it.
>>>>
>>>> :-D
>>>>
>>>>>
>>>>>>
>>>>>> src/share/vm/jvmci/jvmciCompiler.cpp:
>>>>>>
>>>>>> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
>>>>>> + if (failure_message != NULL) {
>>>>>> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>>>>>>
>>>>>> failure_message is guaranteed to be non-null.
>>>>>
>>>>> Right. The code evolved a few times but now that test is unnecessary.
>>>>>
>>>>>>
>>>>>> + oop result_object = (oop) result.get_jobject();
>>>>>> + if (result_object != NULL) {
>>>>>>
>>>>>> Looks like there is nothing to handle the null case. Should we?
>>>>>
>>>>> I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.
>>>>
>>>> Assert in Java sounds good. I was thinking about a hard-failure in C++ since it shouldn't happen.
>>>>
>>>>>
>>>>> tom
>>>>>
>>>>>>
>>>>>>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>>>>>>
>>>>>>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures.
>>>>>>> This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>>>>>>
>>>>>>> tom
>>>
>>
>

From tom.rodriguez at oracle.com Tue Jan 19 21:26:28 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 19 Jan 2016 13:26:28 -0800
Subject: RFR(S): 8147432: JVMCI should report bailouts in PrintCompilation output
In-Reply-To:
References: <73910769-D6B7-4162-B7CB-A70F2C2380DF@oracle.com> <31562912-6D67-402B-A2EF-621D3A59D09A@oracle.com> <9111BB03-C6B0-4ECE-8131-0249B62FD94B@oracle.com> <0965DCE8-4C48-48B7-B8C8-A406C151B588@oracle.com>
Message-ID:

> On Jan 19, 2016, at 1:11 PM, Christian Thalinger wrote:
>
>>
>> On Jan 19, 2016, at 11:06 AM, Tom Rodriguez wrote:
>>
>>
>>> On Jan 19, 2016, at 12:57 PM, Christian Thalinger wrote:
>>>
>>>
>>>> On Jan 19, 2016, at 10:51 AM, Tom Rodriguez wrote:
>>>>
>>>> http://cr.openjdk.java.net/~never/8147432.00-01/webrev/index.html
>>>>
>>>> I added a Java assert that it's non-null plus a C++ assert in the else case. So we won't crash in product if it returns null and turning on Java assert will report something useful.
>>>
>>> I am worried about corner cases which we will never see because the VM silently ignores them.
>>
>> That's why we have the Java assert.
>
> Sure, but customers don't run with assertions on and if the error is silently ignored we or the customer don't know to turn assertions on.

But in this case the customer is a JVMCI developer.
>
>> This part is all informational anyway. Do we really need to do more?
>
> No, let's push it as it is.

Ok. My last jprt push required a couple of tries to get through. Hopefully this one is smoother.

tom

>
>>
>> tom
>>
>>>
>>>>
>>>> tom
>>>>
>>>>> On Jan 19, 2016, at 12:44 PM, Christian Thalinger wrote:
>>>>>
>>>>>>
>>>>>> On Jan 19, 2016, at 10:40 AM, Tom Rodriguez wrote:
>>>>>>
>>>>>>
>>>>>>> On Jan 19, 2016, at 12:12 PM, Christian Thalinger wrote:
>>>>>>>
>>>>>>> src/share/vm/compiler/compileBroker.cpp:
>>>>>>>
>>>>>>> + failure_reason = ci_env.failure_reason();
>>>>>>> + retry_message = ci_env.retry_message();
>>>>>>>   ci_env.report_failure(ci_env.failure_reason());
>>>>>>>
>>>>>>> Why not use failure_reason?
>>>>>>
>>>>>> Fewer edits? :) I'll fix it.
>>>>>
>>>>> :-D
>>>>>
>>>>>>
>>>>>>>
>>>>>>> src/share/vm/jvmci/jvmciCompiler.cpp:
>>>>>>>
>>>>>>> + oop failure_message = CompilationRequestResult::failureMessage(result_object);
>>>>>>> + if (failure_message != NULL) {
>>>>>>> + const char* failure_reason = failure_message != NULL ? java_lang_String::as_utf8_string(failure_message) : "unknown reason";
>>>>>>>
>>>>>>> failure_message is guaranteed to be non-null.
>>>>>>
>>>>>> Right. The code evolved a few times but now that test is unnecessary.
>>>>>>
>>>>>>>
>>>>>>> + oop result_object = (oop) result.get_jobject();
>>>>>>> + if (result_object != NULL) {
>>>>>>>
>>>>>>> Looks like there is nothing to handle the null case. Should we?
>>>>>>
>>>>>> I debated on that. Maybe a Java assert in HotSpotJVMCIRuntime.compileMethod that JVMCICompiler.compileMethod always returns non-null? I don't know that there's anything useful we can do in the C++ code if it's null.
>>>>>
>>>>> Assert in Java sounds good. I was thinking about a hard-failure in C++ since it shouldn't happen.
>>>>>
>>>>>>
>>>>>> tom
>>>>>>
>>>>>>>
>>>>>>>> On Jan 19, 2016, at 9:32 AM, Tom Rodriguez wrote:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~never/8147432/webrev/index.html
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8147432
>>>>>>>>
>>>>>>>> Currently JVMCI compiles either produce code or they don't but nothing is reported for failures. This adds a new CompilationRequestResult object that can return a human readable message to be included in the normal "COMPILE SKIPPED" style message. I've refactored the printing so it's shared between compiles. The result can also include the number of inlined byte codes for use by things like CITimeEach. Additionally I removed the CompilationToVM.notifyCompilationStatistics as this was apparently a left over. Tested with specjvm and PrintCompilation which has a few OSR bailouts plus injecting some exceptions to make sure they were reported correctly.
>>>>>>>>
>>>>>>>> tom

From john.r.rose at oracle.com Tue Jan 19 23:37:29 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 19 Jan 2016 15:37:29 -0800
Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
In-Reply-To: <569CE098.4030807@oracle.com>
References: <569CE098.4030807@oracle.com>
Message-ID: <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com>

On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov wrote:
>
> The fix is to avoid updating recompilation count when corresponding nmethod is invalidated due to a call site target change.

Although I'm not vetoing it (since it seems it will help customers in the short term), I'm uncomfortable with this fix because it doesn't scale to large dyn. lang. applications with many unstable call sites.
Put another way, it feels like we are duct-taping down a failsafe switch (against infinite recompilation) in order to spam a micro-benchmark: a small number of mega-mutable call sites for which we are willing to spend (potentially) all of the JIT resources, including those usually allocated to application performance in the steady state.

Put a third way: I am not comfortable with unthrottled infinite recompilation as a performance strategy.

I've commented on the new RFE (JDK-8147550) where to go next, including the following sentiments:

> There is a serious design tension here, though: Some users apparently are willing to endure an infinite series of recompilations as part of the cost of doing business; JDK-7177745 addresses this need by turning off the fail-safe against (accidental, buggy) infinite recompilation for unstable CSs. Other users might find that having a percentage of machine time devoted to recompilation is a problem. (This has been the case in the past with non-dynamic languages, at least.) The code shape proposed in this bug report would cover all simple unstable call sites (bi-stable, for example, would compile to a bi-morphic call), but, in pathological cases (infinite sequence of distinct CS targets) would "settle down" into a code shape that would be sub-optimal for any single target, but (as an indirect MH call) reasonable for all the targets together.
>
> In the absence of clear direction from the user or the profile, the JVM has to choose infinite recompilation or a good-enough final compilation. The latter choice is safer. And the infinite recompilation is less safe because there is no intrinsic bound on the amount of machine cycles that could be diverted to recompilation, given a dynamic language application with enough mega-mutable CSs. Settling down to a network of indirect calls has a bounded cost.
>
> Yes, one size-fits-all tactics never please everybody. But the JVM should not choose tactics with unlimited downsides.
John

From vivek.r.deshpande at intel.com Wed Jan 20 00:44:25 2016
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Wed, 20 Jan 2016 00:44:25 +0000
Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib
In-Reply-To: <5699AACF.6080608@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> <56999E04.5040207@oracle.com> <5699A3D6.6080305@oracle.com> <5699AACF.6080608@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569FECA5@ORSMSX106.amr.corp.intel.com>

Hi

According to the LIBM experts at Intel, the test cases come from the data sets used in regression tests for the Intel math library (libm). They were collected over a long period of testing various libm implementations.
The data sets contain function specific data (special and corner cases such as +/-0, maximum/minimum normalized numbers, +/-infinity, QNaN/SNaN, maximum/minimum denormal numbers, arguments that would produce close to overflow/underflow results, known hard-to-round cases, etc), implementation specific data (arguments close to table look-up values for different polynomial approximations, worst cases for range reduction algorithms) and other data with interesting bit patterns.

Regards,
Vivek

-----Original Message-----
From: joe darcy [mailto:joe.darcy at oracle.com]
Sent: Friday, January 15, 2016 6:29 PM
To: Vladimir Kozlov; Deshpande, Vivek R
Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler
Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the math lib

Ah okay; I overlooked the separate push of the tests.

Thanks,

-Joe

On 1/15/2016 5:58 PM, Vladimir Kozlov wrote:
> Note, the test was pushed together with VM changes into hs-comp repo:
>
> http://hg.openjdk.java.net/jdk9/hs-comp/jdk/rev/ddd59a780769
>
> New sin/cos code is tested in all running modes since it is used by
> Interpreter and JITed code (C1 and C2).
>
> I will let Vivek answer questions about the test.
>
> Regards,
> Vladimir
>
> On 1/15/16 5:33 PM, Joseph D. Darcy wrote:
>> Hello,
>>
>> Catching up on email, how were these test cases generated or chosen?
>> In other words, in what sense are they corners?
>>
>> The data would be easier to read if the numbers were aligned by
>> column (they don't appear that way in the webrev at least).
>>
>> What is the code coverage of the new intrinsics with this set of tests?
>>
>> These tests should not be separated from the implementation for
>> long; in other words, since the new implementation has already been
>> pushed to a HotSpot forest, test coverage for that new implementation
>> should not lag behind.
>> >> Thanks, >> >> -Joe >> >> On 12/22/2015 5:41 PM, Deshpande, Vivek R wrote: >>> Hi All >>> >>> I have uploaded the patch for sin and cos tests with input and >>> allowed outputs at this location for your review. >>> http://cr.openjdk.java.net/~vdeshpande/libm_sincos/8143353/jdk/webrev.00/ >>> >>> Bug ID: https://bugs.openjdk.java.net/browse/JDK-8143353 >>> Thank you. >>> >>> Regards, >>> Vivek >>> >>> -----Original Message----- >>> From: Joseph D. Darcy [mailto:joe.darcy at oracle.com] >>> Sent: Friday, December 04, 2015 4:50 PM >>> To: Deshpande, Vivek R; Vladimir Kozlov >>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>> math lib >>> >>> Hi Vivek, >>> >>> On 12/3/2015 2:01 PM, Deshpande, Vivek R wrote: >>>> Hi >>>> >>>> Sure I will add the tests. Shall I use the StrictMath result as a >>>> reference for the exact result? >>>> Let me know your thoughts. >>> As a rough test of another sin/cos implementation, StrictMath.{sin, >>> cos} can be used as a reference with the following caveat: there isn't >>> an indication of which way the error is in a StrictMath result. Let >>> me give an example, if >>> >>> StrictMath.sin(x) => y >>> >>> then one of the following should be true >>> >>> Math.sin(x) => y >>> Math.sin(x) => Math.nextUp(y) >>> Math.sin(x) => Math.nextDown(y) >>> >>> That is, Math.sin(x) should either be the same as StrictMath.sin(x) >>> OR equal to one of the floating-point numbers adjacent to that >>> result. Of these three options, only two are allowed by the >>> accuracy requirements of the StrictMath.sin specification. However, >>> since StrictMath.sin doesn't give an indication of which way its >>> error went (if it rounded up or down), there is no indication >>> without additional work which of >>> nextUp(y) and nextDown(y) is allowable (assuming StrictMath.sin >>> isn't buggy).
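The caveat described above can be turned into a rough check. The sketch below is illustrative only and is not part of the webrev; the class name StrictMathReferenceCheck and the helper withinOneStepOfStrict are made-up names. It accepts a Math.sin result if it equals the StrictMath.sin result or one of its two neighbours, so it is looser than the real accuracy requirement: it cannot tell which of the two neighbours is actually allowed.

```java
class StrictMathReferenceCheck {

    /**
     * Rough check only: true if Math.sin(x) is the StrictMath.sin(x) result
     * or one of the two doubles adjacent to it. This does not decide which
     * neighbour is really permitted by the accuracy requirement.
     */
    static boolean withinOneStepOfStrict(double x) {
        double y = StrictMath.sin(x);   // reference value
        double m = Math.sin(x);         // value under test (may be intrinsified)
        if (Double.isNaN(y)) {
            return Double.isNaN(m);     // NaN never compares equal, handle it separately
        }
        return m == y || m == Math.nextUp(y) || m == Math.nextDown(y);
    }

    public static void main(String[] args) {
        // Arbitrary sample arguments, chosen only for illustration.
        double[] inputs = {0.0, 0.5, 1.0, Math.PI / 4, 100.0, 1.0e10};
        for (double x : inputs) {
            System.out.println(x + " -> " + withinOneStepOfStrict(x));
        }
    }
}
```

A stricter test would need a higher-precision reference to decide which neighbour is acceptable for each input.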
>>> >>> HTH, >>> >>> -Joe >>> >>> >>>> Regards, >>>> Vivek >>>> >>>> -----Original Message----- >>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>> Sent: Thursday, December 03, 2015 1:29 PM >>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>> math lib >>>> >>>> Hello, >>>> >>>> On 12/3/2015 1:25 PM, Vladimir Kozlov wrote: >>>>> Vivek, >>>>> >>>>> I think Joe is asking you to write these tests as hotspot >>>>> regression tests in hotspot/test/compiler. >>>> Exactly; if not generally applicable sin/cos tests that could be >>>> hosted in the jdk repo (alongside the regression and unit tests for >>>> java.lang.Math), then tests of intrinsics in the HotSpot repo >>>> alongside other tests targeting intrinsics. >>>> >>>> Thanks, >>>> >>>> -Joe >>>> >>>>> Vladimir >>>>> >>>>> On 12/3/15 1:22 PM, Deshpande, Vivek R wrote: >>>>>> Hi Joe >>>>>> >>>>>> It would be great if you would please share the additional tests >>>>>> with us. >>>>>> >>>>>> Regards, >>>>>> Vivek >>>>>> >>>>>> -----Original Message----- >>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>> Sent: Thursday, December 03, 2015 1:17 PM >>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in the >>>>>> math lib >>>>>> >>>>>> I think it is unwise for this large of an implementation change >>>>>> to be pushed with no tests targeting the specifics of the new >>>>>> implementation. >>>>>> >>>>>> The worst-case tests in the jdk repo are the mathematical worst >>>>>> cases for floating-point approximations, in other words the cases >>>>>> where the exact mathematical answer is closest to half-way between >>>>>> two representable floating-point numbers. Passing such tests is >>>>>> a necessary but not sufficient condition for a new implementation.
>>>>>> >>>>>> Cheers, >>>>>> >>>>>> -Joe >>>>>> >>>>>> On 12/3/2015 1:05 PM, Vladimir Kozlov wrote: >>>>>>> Okay, looks reasonable to me. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 12/3/15 11:06 AM, Deshpande, Vivek R wrote: >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> This is the link for the updated webrev with latest hotspot >>>>>>>> source as base for your review. >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.03/ >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Deshpande, Vivek R >>>>>>>> Sent: Wednesday, December 02, 2015 10:33 PM >>>>>>>> To: 'Vladimir Kozlov'; joe darcy >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>> the math lib >>>>>>>> >>>>>>>> Hi Vladimir >>>>>>>> >>>>>>>> This is the link for the updated webrev for your review. >>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.02/ >>>>>>>> Thank you. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vivek >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>> Sent: Tuesday, December 01, 2015 6:06 PM >>>>>>>> To: Deshpande, Vivek R; joe darcy >>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>> the math lib >>>>>>>> >>>>>>>> Please send link to new webrev on cr server. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 11/25/15 5:16 PM, Deshpande, Vivek R wrote: >>>>>>>>> Hi Vladimir >>>>>>>>> >>>>>>>>> Please find the webrev with your suggested updates attached >>>>>>>>> with the mail. >>>>>>>>> We will update it in the jbs entry soon. >>>>>>>>> Please let me know if it needs further changes.
>>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Deshpande, Vivek R >>>>>>>>> Sent: Tuesday, November 24, 2015 10:22 AM >>>>>>>>> To: 'joe darcy'; Vladimir Kozlov >>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>>> Subject: RE: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>> the math lib >>>>>>>>> >>>>>>>>> HI Vladimir, Joe >>>>>>>>> >>>>>>>>> I have done the jtreg tests in hotspot and tests from jdk you >>>>>>>>> have mentioned. It passed those tests. >>>>>>>>> The ~4x gain is with XX:+UnlockDiagnosticVMOptions >>>>>>>>> -XX:DisableIntrinsic=_dsin/_dcos over without that option. >>>>>>>>> The performance gain is 3.2x over base jdk, that is over >>>>>>>>> current fsin/fcos intrinsic. This gain is more realistic. >>>>>>>>> >>>>>>>>> Could I get those tests around the boundary values. Would >>>>>>>>> WorstCaseTests.java jtreg test in jdk test those ? >>>>>>>>> If yes, then it has passed those boundary cases. >>>>>>>>> >>>>>>>>> I would work on adding either diagnostic flag or just one flag >>>>>>>>> for libm and send out the webrev soon. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vivek >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: joe darcy [mailto:joe.darcy at oracle.com] >>>>>>>>> Sent: Monday, November 23, 2015 6:28 PM >>>>>>>>> To: Vladimir Kozlov; Deshpande, Vivek R >>>>>>>>> Cc: Viswanathan, Sandhya; Berg, Michael C; hotspot compiler >>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>> the math lib >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Just getting added to the thread.. >>>>>>>>> >>>>>>>>> On 11/23/2015 5:13 PM, Vladimir Kozlov wrote: >>>>>>>>>> Thank you, for explanation, Vivek. >>>>>>>>>> >>>>>>>>>> Please, run jdk/test/java/lang/Math/ jtreg tests in addition >>>>>>>>>> to Hotspot tests. 
>>>>>>>>>> >>>>>>>>>> On 11/23/15 12:24 PM, Deshpande, Vivek R wrote: >>>>>>>>>>> Hi Vladimir >>>>>>>>>>> >>>>>>>>>>> The result we obtain with LIBM are within +/- 1ulp from >>>>>>>>>>> StrictMath result and not exact result. So I added the flag >>>>>>>>>>> to switch between FDLIBM and LIBM. >>>>>>>>>>> >>>>>>>>>>> Quick explanation: >>>>>>>>>>> This is what we observed with comparison to HPA Library >>>>>>>>>>> (http://www.nongnu.org/hpalib/) explained with an example. >>>>>>>>>>> LIBM Observed Math result=0.19457293629570213 >>>>>>>>>>> (4596178249117717083L) (StrictMath - 1ulp) Required result >>>>>>>>>>> should be = 0.19457293629570216 >>>>>>>>>>> (4596178249117717084L) (StrictMath result) or >>>>>>>>>>> 0.1945729362957022 >>>>>>>>>>> (4596178249117717085L) (StrictMath + 1ulp.) This means HPA >>>>>>>>>>> library result is between the above two values and Exact >>>>>>>>>>> result would be pretty close to it. >>>>>>>>>>> So here StrictMath result is less than quad-precision >>>>>>>>>>> result, Math result should be StrictMath or StrictMath + >>>>>>>>>>> 1ulp and not StrictMath >>>>>>>>>>> - 1ulp, according to our test. >>>>>>>>>> Note, java.lang.Math allows to have 1ulp off (in both >>>>>>>>>> direction, I >>>>>>>>>> think) and it should be consistent for Interpreter and code >>>>>>>>>> generated by JIT compilers: >>>>>>>>>> >>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#sin%28double%29 >>>>>>>>>> >>>>>>>>> That interpretation of the spec is not quite right. For the >>>>>>>>> Math methods with a 1/2 ulp error bound, the floating-point >>>>>>>>> result closest to the exact result must be returned. For the >>>>>>>>> methods with a >>>>>>>>> 1 ulp error bound, either of the floating-point result >>>>>>>>> bracketing the true result can be returned, subject to the >>>>>>>>> monotonicity constraints of the specification of the particular method.
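The three bit patterns in the example above differ by exactly one, which is what makes them consecutive doubles, and "bracketing the true result" refers to two such neighbours. A small sketch, assuming only the three long values quoted in the message (the class and method names are hypothetical):

```java
class AdjacentDoubles {
    // Bit patterns quoted in the example above.
    static final long STRICT_MINUS_1ULP = 4596178249117717083L;
    static final long STRICT            = 4596178249117717084L;
    static final long STRICT_PLUS_1ULP  = 4596178249117717085L;

    /**
     * Number of representable steps between a and b.
     * Only meaningful for finite values of the same sign, where the
     * raw bit patterns are ordered the same way as the values.
     */
    static long stepsBetween(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    public static void main(String[] args) {
        double below  = Double.longBitsToDouble(STRICT_MINUS_1ULP);
        double strict = Double.longBitsToDouble(STRICT);
        double above  = Double.longBitsToDouble(STRICT_PLUS_1ULP);

        System.out.println(below + " " + strict + " " + above);
        // For a positive finite double, adding one to the bit pattern is nextUp.
        System.out.println(Math.nextUp(strict) == above);
        System.out.println(Math.nextDown(strict) == below);
        System.out.println("ulp at this magnitude: " + Math.ulp(strict));
    }
}
```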
>>>>>>>>> >>>>>>>>>>> I have done the experiments with >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin and >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dcos. >>>>>>>>>>> With this option, the interpreter would go through LIBM and C1 and c2 through FDLIBM. >>>>>>>>>>> If we want to disable LIBM completely, we need the flags >>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>> I was thinking about using existing >>>>>>>>>> DirectiveSet::is_intrinsic_disabled() and >>>>>>>>>> vmIntrinsics::is_disabled_by_flags(). You need to add >>>>>>>>>> additional versions of functions which accept intrinsic ID >>>>>>>>>> instead of methodHandle. >>>>>>>>>> >>>>>>>>>> If you still want to use flags make them diagnostic. >>>>>>>>>> Or have one flag for all LIBM intrinsics -XX:+UseLibmIntrinsic. >>>>>>>>>> >>>>>>>>>>> Also the performance gain ~4x is with >>>>>>>>>>> XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dsin/_dcos. >>>>>>>>>> You confused me here. So you get 4x when only Interpreter use >>>>>>>>>> LIBM code and compilers use FDLIB? >>>>>>>>> Just to be clear, are you comparing the new code to FDLIBM >>>>>>>>> (StrictMath) or to the existing fsin/fcos instrinsics (Math)? >>>>>>>>> >>>>>>>>> I'm part way through porting the FDLIBM code to Java >>>>>>>>> (JDK-8134780: >>>>>>>>> Port fdlibm to Java), which is providing a significant speed >>>>>>>>> boost to the StrictMath methods that have been ported. >>>>>>>>> >>>>>>>>> I find the current patch *insufficient* as-is in terms of its >>>>>>>>> testing. >>>>>>>>> For example, part of patch says >>>>>>>>> >>>>>>>>> # For sin >>>>>>>>> >>>>>>>>> +// This means that the main path is actually only taken for >>>>>>>>> +// 2^-252 <= |X| < 90112. >>>>>>>>> >>>>>>>>> # For cos >>>>>>>>> >>>>>>>>> +// This means that the main path is actually only taken for >>>>>>>>> +// 2^-252 <= |X| < 90112. 
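A minimal sketch of inputs that straddle the two thresholds quoted from the patch comments, 2^-252 and 90112. The class and method names are hypothetical and this is not the test that was eventually pushed; it only lists candidate arguments and prints both implementations side by side so the results can be compared or fed to a checker.

```java
class SinCosBoundaryInputs {
    // Thresholds taken from the patch comments quoted above.
    static final double LOWER = Math.scalb(1.0, -252); // 2^-252
    static final double UPPER = 90112.0;

    /** The value itself plus its immediate neighbours on either side. */
    static double[] around(double v) {
        return new double[] { Math.nextDown(v), v, Math.nextUp(v) };
    }

    /** Both signs of every value around both thresholds (12 inputs). */
    static double[] boundaryInputs() {
        double[] all = new double[12];
        int i = 0;
        for (double v : around(LOWER)) { all[i++] = v; all[i++] = -v; }
        for (double v : around(UPPER)) { all[i++] = v; all[i++] = -v; }
        return all;
    }

    public static void main(String[] args) {
        for (double x : boundaryInputs()) {
            System.out.println(Double.toHexString(x)
                + " sin: " + Math.sin(x) + " / " + StrictMath.sin(x)
                + " cos: " + Math.cos(x) + " / " + StrictMath.cos(x));
        }
    }
}
```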
>>>>>>>>> >>>>>>>>> If nothing else, there are no tests at around those boundary >>>>>>>>> values, which is unacceptable. There should also be some tests >>>>>>>>> of values of interest to the algorithm in question. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> -Joe >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>>> Let me know your thoughts on this. I would answer more >>>>>>>>>>> questions and give more data if needed. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Vivek >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>>>>>>>>> Sent: Monday, November 23, 2015 10:37 AM >>>>>>>>>>> To: Deshpande, Vivek R; >>>>>>>>>>> hotspot-compiler-dev at openjdk.java.net >>>>>>>>>>> Cc: Viswanathan, Sandhya >>>>>>>>>>> Subject: Re: RFR (M): 8143353: Update for x86 sin and cos in >>>>>>>>>>> the math lib >>>>>>>>>>> >>>>>>>>>>> On 11/20/15 12:22 PM, Vladimir Kozlov wrote: >>>>>>>>>>>> What is the reason you decided to add new flags? exp() and >>>>>>>>>>>> log() changes did not have flags. >>>>>>>>>>>> >>>>>>>>>>>> It would be interesting to see what happens if you disable >>>>>>>>>>>> intrinsics using existing flag, for example: >>>>>>>>>>>> >>>>>>>>>>>> -XX:+UnlockDiagnosticVMOptions >>>>>>>>>>>> -XX:DisableIntrinsic=_dexp >>>>>>>>>>> Hi Vivek, >>>>>>>>>>> >>>>>>>>>>> I want to point that you can do this experiment later. We >>>>>>>>>>> can file bugs and fixed them after FC. >>>>>>>>>>> >>>>>>>>>>> For now, please, answer my question about flags only. This >>>>>>>>>>> is the only thing holding it from push. 
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Vladimir >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> On 11/20/15 12:03 PM, Deshpande, Vivek R wrote: >>>>>>>>>>>>> Hi all >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to contribute a patch which optimizes >>>>>>>>>>>>> Math.sin() and >>>>>>>>>>>>> Math.cos() for 64 and 32 bit X86 architecture using Intel >>>>>>>>>>>>> LIBM >>>>>>>>>>>>> implementation. >>>>>>>>>>>>> >>>>>>>>>>>>> The improvement gives ~4.25x gain over base for both sin >>>>>>>>>>>>> and cos. >>>>>>>>>>>>> >>>>>>>>>>>>> The option to use the optimizations are >>>>>>>>>>>>> -XX:+UseLibmSinIntrinsic and -XX:+UseLibmCosIntrinsic. >>>>>>>>>>>>> >>>>>>>>>>>>> Could you please review and sponsor this patch. >>>>>>>>>>>>> >>>>>>>>>>>>> Bug-id: >>>>>>>>>>>>> >>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8143353 >>>>>>>>>>>>> webrev: >>>>>>>>>>>>> >>>>>>>>>>>>> http://cr.openjdk.java.net/~mcberg/8143353/webrev.01/ >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks and regards, >>>>>>>>>>>>> >>>>>>>>>>>>> Vivek >>>>>>>>>>>>> >> From christian.thalinger at oracle.com Wed Jan 20 00:45:00 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 19 Jan 2016 14:45:00 -1000 Subject: RFR(XS) : 8141557 : TestResolvedJavaMethod.java times out after 1000 ms In-Reply-To: <52CAB89A-1AA5-4545-9C4B-DD2A6880E463@oracle.com> References: <52CAB89A-1AA5-4545-9C4B-DD2A6880E463@oracle.com> Message-ID: <3DDA7A22-74CF-400A-A403-9CE70655ABD5@oracle.com> I suppose TestAnnotionation is a typo? + @TestAnnotionation(1000L) Could you change that to value = 1000L? Just for extra clarity. Then it looks good. > On Jan 19, 2016, at 11:06 AM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev/8141557/webrev.00/ >> 22 lines changed: 16 ins; 0 del; 6 mod; > > Hi all, > > Could you please review the fix for 8141557? > > The test uses timeout value of org.junit.Test to test reading annotation via JVMCI. In some cases, e.g. 
on embedded platforms, debug builds or w/ extra vm flags like -Xcomp, 1000ms isn't enough for the test to complete, and since jtreg doesn't apply timeout factor for junit/testng timeouts (CODETOOLS-7901567) the test times out despite an increased timeout factor. > > The fix changes the test to use a separate annotation, which doesn't affect test execution, and removes the timeout value (which means no timeout). > > JBS: https://bugs.openjdk.java.net/browse/JDK-8141557 > testing: locally > > Thanks, > Igor From vivek.r.deshpande at intel.com Wed Jan 20 00:48:41 2016 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Wed, 20 Jan 2016 00:48:41 +0000 Subject: RFR (M): 8143353: Update for x86 sin and cos in the math lib References: <53E8E64DB2403849AFD89B7D4DAC8B2A568ED1AC@ORSMSX106.amr.corp.intel.com> <564F80F7.5050605@oracle.com> <56535CC7.6020702@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F03BE@ORSMSX106.amr.corp.intel.com> <5653B9AF.7060306@oracle.com> <5653CB17.2020308@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A568F26AD@ORSMSX106.amr.corp.intel.com> <565E520B.8060801@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CE99C@ORSMSX106.amr.corp.intel.com> <5660AEB6.8060007@oracle.com> <5660B13B.1020907@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CECB1@ORSMSX106.amr.corp.intel.com> <5660B345.8010905@oracle.com> <5660B40D.4050800@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569CED5A@ORSMSX106.amr.corp.intel.com> <566234C6.8010806@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A569DFF26@ORSMSX106.amr.corp.intel.com> <56999E04.5040207@oracle.com> <5699A3D6.6080305@oracle.com> <5699AACF.6080608@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A569FECB4@ORSMSX106.amr.corp.intel.com> Hi All I forgot to add in the earlier reply: the reference values were computed with Maple and converted into hexadecimal format.
Regards, Vivek From christian.thalinger at oracle.com Wed Jan 20 00:49:00 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 19 Jan 2016 14:49:00 -1000 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> References: <569CE098.4030807@oracle.com> <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> Message-ID: <63C0FC07-E4C8-4967-A3D3-7083D1B3B7E3@oracle.com> > On Jan 19, 2016, at 1:37 PM, John Rose wrote: > > On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov > wrote: >> >> The fix is to avoid updating recompilation count when corresponding nmethod is invalidated due to a call site target change. > > Although I'm not vetoing it (since it seems it will help customers in the short term), I'm uncomfortable with this fix because it doesn't scale to large dyn. lang. applications with many unstable call sites. Put another way, it feels like we are duct-taping down a failsafe switch (against infinite recompilation) in order to spam a micro-benchmark: a small number mega-mutable call sites for which we are willing to spend (potentially) all of the JIT resources, including those usually allocated to application performance in the steady state. Put a third way: I am not comfortable with unthrottled infinite recompilation as a performance strategy. Having a deja-vu...
https://bugs.openjdk.java.net/browse/JDK-7087838 > > I've commented on the new RFE (JDK-8147550) where to go next, including the following sentiments: > >> There is a serious design tension here, though: Some users apparently are willing to endure an infinite series of recompilations as part of the cost of doing business; JDK-7177745 addresses this need by turning off the fail-safe against (accidental, buggy) infinite recompilation for unstable CSs. Other users might find that having a percentage of machine time devoted to recompilation is a problem. (This has been the case in the past with non-dynamic languages, at least.) The code shape proposed in this bug report would cover all simple unstable call sites (bi-stable, for example, would compile to a bi-morphic call), but, in pathological cases (infinite sequence of distinct CS targets) would "settle down" into a code shape that would be sub-optimal for any single target, but (as an indirect MH call) reasonable for all the targets together. >> >> In the absence of clear direction from the user or the profile, the JVM has to choose infinite recompilation or a good-enough final compilation. The latter choice is safer. And the infinite recompilation is less safe because there is no intrinsic bound on the amount of machine cycles that could be diverted to recompilation, given a dynamic language application with enough mega-mutable CSs. Settling down to a network of indirect calls has a bounded cost. >> >> Yes, one size-fits-all tactics never please everybody. But the JVM should not choose tactics with unlimited downsides. > > - John
From roland.westrelin at oracle.com Wed Jan 20 07:35:27 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 20 Jan 2016 08:35:27 +0100 Subject: RFR(XS): 8147386: assert(size == calc_size) failed: incorrect size calculattion x86_32.ad In-Reply-To: References: <9CE48190-9B0F-4571-937D-5F4162EA5296@oracle.com> <569E7E8A.5080809@oracle.com> Message-ID: <840AB1B7-C249-44BB-BC1D-1540A8EEA24C@oracle.com> Thanks for the reviews Vladimir, Vladimir and Michael. Roland. From andreas.eriksson at oracle.com Wed Jan 20 09:26:16 2016 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Wed, 20 Jan 2016 10:26:16 +0100 Subject: RFR(S): 8146096: [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts In-Reply-To: <569EA0A9.8050406@oracle.com> References: <569E2CDE.3060805@oracle.com> <569E3109.8090107@oracle.com> <569EA0A9.8050406@oracle.com> Message-ID: <569F52B8.7020802@oracle.com> Vladimir Kozlov and Vladimir Ivanov, Ok, I'll look into using the WhiteBox API to fix the test. Thanks for looking at this. - Andreas On 2016-01-19 21:46, Vladimir Kozlov wrote: > Simply using a timeout to check for a generated safepoint is a bad idea. It is > very inaccurate. At least you need to check the call stack to see if it > stopped in a compiled method. > I would prefer to see a new WB interface which would check that the loop > SafePointNode is generated during compilation of the method. It will be > precise. > > And we need such tests to make sure a feature is working - we can't > remove them. > > Thanks, > Vladimir > > On 1/19/16 4:50 AM, Vladimir Ivanov wrote: >> As an idea to improve the test: spawn a thread which executes the >> counted loop and then use WhiteBox.forceSafepoint() to >> trigger a safepoint. >> >> If the test times out, it means there's no safepoint in the loop. >> >> It also simplifies the implementation - no need to spawn a >> child process, the check can be done in-process.
>> >> Best regards, >> Vladimir Ivanov >> >> On 1/19/16 3:32 PM, Andreas Eriksson wrote: >>> Hi, >>> >>> Can I please have a review for the removal of >>> hotspot/test/compiler/loopopts/UseCountedLoopSafepoints.java. >>> >>> The test needs to do a loop that takes more than two seconds to execute >>> fully without doing a safepointing call. For this, expensive atomic >>> operations were used. The problem is that on certain embedded platforms >>> they are too expensive, and the test times out. >>> The loop length could probably be reduced, and it should still work on >>> faster machines. However, the test is not very useful, so I think it's >>> better to just remove it to avoid future problems. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8146096 >>> Test to be removed: >>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/d84a55e7aaf8/test/compiler/loopopts/UseCountedLoopSafepoints.java >>> >>> >>> (I can create a webrev if you think it necessary.) >>> >>> Thanks, >>> Andreas From aph at redhat.com Wed Jan 20 09:47:11 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 20 Jan 2016 09:47:11 +0000 Subject: [aarch64-port-dev ] RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change In-Reply-To: References: Message-ID: <569F579F.4060906@redhat.com> On 19/01/16 18:57, Volker Simonis wrote: > The change also affects aarch64 (although it is minimal and I don't > expect it to break anything) so I cc-ed aarch64-port-dev. > > http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ > https://bugs.openjdk.java.net/browse/JDK-8145336 That's fine by us. We only defined UseSSE42Intrinsics in order to get the String.indexOf intrinsic. Of course, we should really have done it some other way but we were working on our own outside the main HotSpot tree. Andrew.
From volker.simonis at gmail.com Wed Jan 20 10:20:27 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 20 Jan 2016 11:20:27 +0100 Subject: [aarch64-port-dev ] RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change In-Reply-To: <569F579F.4060906@redhat.com> References: <569F579F.4060906@redhat.com> Message-ID: Hi Andrew, thanks for looking at it. Regards, Volker On Wed, Jan 20, 2016 at 10:47 AM, Andrew Haley wrote: > On 19/01/16 18:57, Volker Simonis wrote: >> The change also affects aarch64 (although it is minimal and I don't >> expect it to break anything) so I cc-ed aarch64-port-dev. >> >> http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ >> https://bugs.openjdk.java.net/browse/JDK-8145336 > > That's fine by us. We only defined UseSSE42Intrinsics in order to get > the String.indexOf intrinsic. Of course, we should really have done it > some other way but we were working on our own outside the main HotSpot > tree. > > Andrew. > From vladimir.x.ivanov at oracle.com Wed Jan 20 11:23:47 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Jan 2016 14:23:47 +0300 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <569E9E69.4070202@oracle.com> References: <569CE098.4030807@oracle.com> <569E9E69.4070202@oracle.com> Message-ID: <569F6E43.4080801@oracle.com> Thanks for the review, Vladimir. I decided to remove the field declaration because SA doesn't access it. Best regards, Vladimir Ivanov On 1/19/16 11:36 PM, Vladimir Kozlov wrote: > Looks fine but in vmStructs.cpp you should replace the field declaration > instead of just removing old one. > Also look if SA access it.
> > Thanks, > Vladimir > > On 1/18/16 4:54 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/7177745/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-7177745 >> >> JVM aggressively inlines through CallSites, even for mutable and >> volatile flavors. It's the key optimization for making >> invokedynamic performant. >> >> When a CallSite.target is updated, JVM invalidates all affected >> nmethods and try to recompile them later. If a call site >> target regularly changes, JVM will eventually mark (after >> PerMethodRecompilationCutoff invalidations) all hot methods >> which have the call site bound as non-compilable. It leads to >> significant peak performance reduction, because all >> affected methods will always be executed in interpreter mode since then. >> >> The fix is to avoid updating recompilation count when corresponding >> nmethod is invalidated due to a call site target >> change. >> >> I filed a separate RFE (JDK-8147550 [1]) to consider slow non-inlined >> code shape for unstable call sites, as John >> suggested [2]. >> >> Testing: regression test, octane, JPRT. >> >> Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8147550 >> [2] >> https://bugs.openjdk.java.net/browse/JDK-7177745?focusedCommentId=13821545&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821545 >> >> From martin.doerr at sap.com Wed Jan 20 11:43:40 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 20 Jan 2016 11:43:40 +0000 Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change In-Reply-To: References: Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228C120@DEWDFEMB19C.global.corp.sap> Hi Volker, thank you very much for adapting the non-CompactStrings version of the intrinsics. I especially like that you changed shared code to improve matching of special cases. 
Here are some minor change requests: - I guess you will have to adapt Copyright messages. - There's a typo in the new comment in library_call: "optimzed". - The comment for the instruction count (used for loop alignment) is wrong in MacroAssembler::string_indexof_1 (should start with 3 instead of 2). I have more change requests regarding ppc.ad: The computation of chr is incorrect for little endian in string_indexOf_imm1_char and string_indexOf_imm1. Some numbers for compute_padding should be adapted: int string_indexOf_imm1_charNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } int string_indexOfCharNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } int string_compareNode::compute_padding(int current_offset) const { return (2*4-current_offset)&31; } Some kill effects are missing: - ctr in all string_indexOf nodes - cr0, cr1 in string_indexOf_imm1, string_indexOfChar The new comment for string_indexOfChar claims "// Kill ... needle" which is not true. Thanks, Martin -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Volker Simonis Sent: Dienstag, 19. Januar 2016 19:57 To: hotspot compiler Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change Hi, can somebody please review and sponsor this change. Despite the bug summary, I still had to do some small shared changes to make this work, so unfortunately I can not push this on my own. The change also affects aarch64 (although it is minimal and I don't expect it to break anything) so I cc-ed aarch64-port-dev. http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ https://bugs.openjdk.java.net/browse/JDK-8145336 As described in the bug, this change only fixes the string intrinsics for the -XX:-UseCompactStrings mode which is still the default on ppc64. 
Additionally, support for the new StrIndexOfChar intrinsic was added because we already had a similar intrinsic for constant string needles of length one anyway. A later change (which we're already working on) will add the intrinsics which can handle compact strings. The current intrinsics can handle both, the new byte-array based string representation as well as the old char-array based string representation because we internally still use the new hotspot with older versions of the class libraries. I've also ported some of our internal string tests into a new regression test (TestStringIntrinsics2.java) because the existing tests didn't exercise all of our intrinsics. Following the shared changes I had to do: Until now, UseSSE42Intrinsics was a global shared option which was used to control the availability of the stringIndexOf intrinsics. But UseSSE42Intrinsics is actually a x86-specific feature so it doesn't make a lot of sense to define it for other architectures. I've therefore moved the flag to globals_x86.hpp and changed the condition which checks for the ability of the stringIndexOf intrinsics from: if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { to: if (!Matcher::match_rule_supported(Op_StrIndexOf)) { The Matcher::match_rule_supported() method already calls Matcher::has_match_rule() anyway. And it is implemented in the .ad file so I've moved the check for UseSSE42Intrinsics into x86.ad. Other platforms can now decide in their .ad file if they unconditionally support the intrinsic or if they need a special feature check. This change was already briefly discussed in [1]. The other shared change I had to make was in LibraryCallKit::make_string_method_node() for the "Op_StrEquals" case. We have optimized intrinsics for the case that one of the strings to compare is constant, but the StrEqualsNode is constructed without taking into account that one of the string length values could be a constant. 
This prevented our optimized instruction from being matched in the ad-file. All the other changes are ppc-specific. Thank you and best regards, Volker [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/thread.html#20400 From vladimir.x.ivanov at oracle.com Wed Jan 20 11:54:00 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 20 Jan 2016 14:54:00 +0300 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> References: <569CE098.4030807@oracle.com> <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> Message-ID: <569F7558.1030800@oracle.com> John, Chris, thanks for the feedback. I don't think it is only about microbenchmarks. Long-running large applications with lots of mutable call sites should also benefit for this change. Current JVM behavior counts invalidations on root method, so nmethods with multiple mutable call sites (from root & all inlined callees) are more likely to hit the limit, even if there's no mega-mutable sites. It just sums up and PerMethodRecompilationCutoff (= 400, by default) doesn't look like a huge number. Also, LambdaForm sharing somewhat worsen the situation. When LambdaForms were mostly customized, different method handle chains were compiled into a single nmethod. Right now, it means that not only the root method is always interpreted, but all bound method handle chains are broken into numerous per-LF nmethods (see JDK-8069591 for some details). MLVM folks, I'd like to hear your opinion about what kind of behavior do you expect from JVM w.r.t. mutable call sites. There are valid use-cases when JVM shouldn't throttle the recompilation (e.g., long-running application with indy-based dynamic tracing). Maybe there's a place for a new CallSite flavor to clearly communicate application expectations to the JVM? 
Either always recompile (thus eventually reaching peak performance) or give up and generate less efficient machine code, but save on possible recompilations. Best regards, Vladimir Ivanov On 1/20/16 2:37 AM, John Rose wrote: > On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov > > wrote: >> >> The fix is to avoid updating recompilation count when corresponding >> nmethod is invalidated due to a call site target change. > > Although I'm not vetoing it (since it seems it will help customers in > the short term), I'm uncomfortable with this fix because it doesn't > scale to large dyn. lang. applications with many unstable call sites. > Put another way, it feels like we are duct-taping down a failsafe > switch (against infinite recompilation) in order to spam a > micro-benchmark: a small number mega-mutable call sites for which we > are willing to spend (potentially) all of the JIT resources, including > those usually allocated to application performance in the steady state. > Put a third way: I am not comfortable with unthrottled infinite > recompilation as a performance strategy. > > I've commented on the new RFE (JDK-8147550) where to go next, including > the following sentiments: > >> There is a serious design tension here, though: Some users apparently >> are willing to endure an infinite series of recompilations as part of >> the cost of doing business; JDK-7177745 addresses this need by turning >> off the fail-safe against (accidental, buggy) infinite recompilation >> for unstable CSs. Other users might find that having a percentage of >> machine time devoted to recompilation is a problem. (This has been the >> case in the past with non-dynamic languages, at least.) 
The code shape >> proposed in this bug report would cover all simple unstable call >> sites (bi-stable, for example, would compile to a bi-morphic call), >> but, in pathological cases (infinite sequence of distinct CS targets) >> would "settle down" into a code shape that would be sub-optimal for >> any single target, but (as an indirect MH call) reasonable for all the >> targets together. >> >> In the absence of clear direction from the user or the profile, the >> JVM has to choose infinite recompilation or a good-enough final >> compilation. The latter choice is safer. And the >> infinite recompilation is less safe because there is no intrinsic >> bound on the amount of machine cycles that could be diverted to >> recompilation, given a dynamic language application with >> enough mega-mutable CSs. Settling down to a network of indirect calls >> has a bounded cost. >> >> Yes, one size-fits-all tactics never please everybody. But the JVM >> should not choose tactics with unlimited downsides. > > - John From forax at univ-mlv.fr Wed Jan 20 12:13:29 2016 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 20 Jan 2016 13:13:29 +0100 (CET) Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> References: <569CE098.4030807@oracle.com> <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> Message-ID: <2036838501.1079316.1453292009390.JavaMail.zimbra@u-pem.fr> Hi John, I understand that having a VM that may always recompile may be seen as a bug, but having a VM that bails out and stops recompiling, or more generally changes the compilation strategy, is a bug too. The problem here is that there is no way, from the point of view of a dyn lang runtime, to know what the behavior of the VM will be for a callsite if the VM decides to stop recompiling, decides not to inline, decides to inline some part of the tree, etc.
Said differently, invokedynamic allows a runtime to create code shapes that change dynamically. If the VM behavior also changes dynamically, it's like building a wall on moving parts: the result is strange dynamic behavior that is hard to diagnose and reproduce. The recompilation behavior of the VM should be kept simple and predictable: basically, the VM should always recompile the CS, with no failsafe switch. If dyn lang runtime devs have trouble with that, they can already use an exactInvoker to simulate an indirect MH call, and we can even provide new method handle combiners to gracefully handle multi-stable CSs. regards, Rémi ----- Original Message ----- > From: "John Rose" > To: "Vladimir Ivanov" > Cc: "hotspot compiler" > Sent: Wednesday, 20 January 2016 00:37:29 > Subject: Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause > target method to always run in interpreter mode > On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com > > wrote: > > The fix is to avoid updating recompilation count when corresponding nmethod > > is invalidated due to a call site target change. > > Although I'm not vetoing it (since it seems it will help customers in the > short term), I'm uncomfortable with this fix because it doesn't scale to > large dyn. lang. applications with many unstable call sites. Put another > way, it feels like we are duct-taping down a failsafe switch (against > infinite recompilation) in order to spam a micro-benchmark: a small number > mega-mutable call sites for which we are willing to spend (potentially) all > of the JIT resources, including those usually allocated to application > performance in the steady state. Put a third way: I am not comfortable with > unthrottled infinite recompilation as a performance strategy.
> I've commented on the new RFE (JDK-8147550) where to go next, including the > following sentiments: > > There is a serious design tension here, though: Some users apparently are > > willing to endure an infinite series of recompilations as part of the cost > > of doing business; JDK-7177745 addresses this need by turning off the > > fail-safe against (accidental, buggy) infinite recompilation for unstable > > CSs. Other users might find that having a percentage of machine time > > devoted to recompilation is a problem. (This has been the case in the past with > > non-dynamic languages, at least.) The code shape proposed in this bug > > report would cover all simple unstable call sites (bi-stable, for example, would > > compile to a bi-morphic call), but, in pathological cases (infinite > > sequence of distinct CS targets) would "settle down" into a code shape that would be > > sub-optimal for any single target, but (as an indirect MH call) reasonable > > for all the targets together. > > > In the absence of clear direction from the user or the profile, the JVM has > > to choose infinite recompilation or a good-enough final compilation. The > > latter choice is safer. And the infinite recompilation is less safe because > > there is no intrinsic bound on the amount of machine cycles that could be > > diverted to recompilation, given a dynamic language application with enough > > mega-mutable CSs. Settling down to a network of indirect calls has a > > bounded cost. > > > Yes, one size-fits-all tactics never please everybody. But the JVM should > > not choose tactics with unlimited downsides. > > - John
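The two code shapes discussed in this thread (a mutable call site that the JIT may inline through, and the indirect MH call that an exactInvoker simulates) can be sketched in plain Java roughly as follows. This is only an illustration under assumptions: the class name UnstableCallSiteSketch and the plusOne/plusTwo targets are made up for the example, and it shows only the API-level difference between the two shapes, not how HotSpot compiles or recompiles either of them.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class; none of these names come from the thread.
public class UnstableCallSiteSketch {
    static int plusOne(int x) { return x + 1; }
    static int plusTwo(int x) { return x + 2; }

    static final MethodType INT_TO_INT = MethodType.methodType(int.class, int.class);

    // Looks up one of the static targets above by name.
    static MethodHandle target(String name) {
        try {
            return MethodHandles.lookup()
                    .findStatic(UnstableCallSiteSketch.class, name, INT_TO_INT);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Shape 1: a MutableCallSite whose target is relinked between calls.
    // Every setTarget() is the kind of event that can invalidate compiled
    // code which had inlined the previous target.
    static int viaMutableCallSite() {
        try {
            MutableCallSite site = new MutableCallSite(target("plusOne"));
            MethodHandle invoker = site.dynamicInvoker();
            int sum = (int) invoker.invokeExact(10);
            site.setTarget(target("plusTwo")); // relink the call site
            sum += (int) invoker.invokeExact(10);
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Shape 2: an indirect call. exactInvoker(type) yields a handle of type
    // (MethodHandle,int)int that takes the target as an ordinary argument,
    // so changing the target relinks nothing.
    static int viaExactInvoker() {
        try {
            MethodHandle indirect = MethodHandles.exactInvoker(INT_TO_INT);
            int sum = (int) indirect.invokeExact(target("plusOne"), 10);
            sum += (int) indirect.invokeExact(target("plusTwo"), 10);
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaMutableCallSite());
        System.out.println(viaExactInvoker());
    }
}
```

Both methods compute the same value; the difference is only in where the target lives (in the call site versus in an argument), which is what the inline-and-invalidate versus bounded-cost trade-off above is about.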
From hui.shi at linaro.org Wed Jan 20 13:30:31 2016 From: hui.shi at linaro.org (Hui Shi) Date: Wed, 20 Jan 2016 21:30:31 +0800 Subject: RFR(s): AARCH64: 8147805: C1 segmentation fault due to inline Unsafe::getAndSetObject Message-ID: Hi All, Could someone help review this AArch64 C1 issue? The issue happens when C1 inlines unsafe.getAndSet(data) and the UseCompressedOops flag is true: the register is compressed for the store, but it is not restored to its decompressed form. Later the compressed value is used as a reference address and things go wrong. Bug: https://bugs.openjdk.java.net/browse/JDK-8147805 webrev: http://cr.openjdk.java.net/~hshi/8147805/webrev/ Small test case in http://cr.openjdk.java.net/~hshi/8147805/TestUnsafe.java The crash can be reproduced with java -XX:TieredStopAtLevel=3 -XX:+TieredCompilation -Xms4G -Xmx4G TestUnsafe In the following method, n is stored twice: first in unsafe.getAndSet, then when storing old.next.

public Node foo(Node n) {
    Node old;
    old = this.getAndSet(n); // inline sun.misc.Unsafe::getAndSetObject here, shift first time for store
    old.next = n;            // n is used again and stored into old.next, shift again for store
    return old;
}

In the generated assembly, one can see that "x2" is shifted but not restored:

0x0000007f943af3dc: lsr x2, x2, #3            // x2 is shifted but not restored
0x0000007f943af3e0: add x4, x1, #0xc
0x0000007f943af3e4: ldaxr w3, [x4]
0x0000007f943af3e8: stlxr w9, w2, [x4]
0x0000007f943af3ec: cbnz w9, 0x0000007f943af3e4
0x0000007f943af3f0: lsl x3, x3, #3
0x0000007f943af3f4: dmb ish
0x0000007f943af504: lsr x8, x2, #3            // x2 is shifted again and wrong
0x0000007f943af508: str w8, [x0,#16]
0x0000007f943af50c: lsr x2, x0, #9
0x0000007f943af510: strb wzr, [x2,x1,lsl #0]  ;*putfield next
                                              ; - TestUnsafe::foo at 11 (line 25)

The patch uses rscratch1 to hold the encoded heap oop for the store when UseCompressedOops is true, so later uses still get the correct object address. Regards Hui
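The failure mode described above can be mimicked in plain Java with longs standing in for registers. This is a rough model under stated assumptions, not HotSpot code: the class CompressedOopSketch and its methods are invented for the illustration, the heap base is assumed to be zero, and the shift of 3 is taken from the "lsr ..., #3" instructions in the disassembly.

```java
// Hypothetical model of the register clobbering; all names are made up.
public class CompressedOopSketch {
    static final int SHIFT = 3; // assumed zero-based compressed oops, 8-byte alignment

    static long encode(long oop) { return oop >>> SHIFT; }
    static long decode(long narrow) { return narrow << SHIFT; }

    // Buggy shape: the first store encodes the oop in place, so the second
    // store encodes an already-narrow value. Returns the narrow value that
    // the second store writes.
    static long secondStoreInPlace(long oop) {
        long reg = oop;
        reg = encode(reg);   // first store: register now holds the narrow oop
        return encode(reg);  // second store: shifts a second time
    }

    // Fixed shape: the first store encodes into a scratch value and leaves
    // the original register untouched for the later use.
    static long secondStoreWithScratch(long oop) {
        long reg = oop;
        long scratch = encode(reg); // first store uses the scratch copy
        return encode(reg);         // second store sees the full oop
    }

    public static void main(String[] args) {
        long oop = 0x12345678L; // arbitrary 8-byte-aligned address
        System.out.println(decode(secondStoreWithScratch(oop)) == oop);
        System.out.println(decode(secondStoreInPlace(oop)) == oop);
    }
}
```

Decoding what the scratch variant stored gives back the original address, while the in-place variant stores a value that decodes to a different address, which is the wrong-reference symptom in the report.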
From roland.schatz at oracle.com Wed Jan 20 13:35:21 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Wed, 20 Jan 2016 14:35:21 +0100 Subject: RFR: 8147599: [JVMCI] simplify code installation interface Message-ID: <569F8D19.4090305@oracle.com> Hi, Please review this change to the JVMCI code installation interface: webrev: http://cr.openjdk.java.net/~rschatz/JDK-8147599/webrev.00/ jira: https://bugs.openjdk.java.net/browse/JDK-8147599 The new classes in the jdk.vm.ci.code.site package used to be inner classes in the removed CompilationResult class; there are no actual code changes there. Thanks, Roland From aph at redhat.com Wed Jan 20 14:12:57 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 20 Jan 2016 14:12:57 +0000 Subject: RFR(s): AARCH64: 8147805: C1 segmentation fault due to inline Unsafe::getAndSetObject In-Reply-To: References: Message-ID: <569F95E9.1070202@redhat.com> On 01/20/2016 01:30 PM, Hui Shi wrote: > Could some one help review this AArch64 C1 issue? OK, thanks, I'm looking at this to make sure this problem does not exist elsewhere. Andrew. From edward.nevill at gmail.com Wed Jan 20 14:21:25 2016 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 20 Jan 2016 14:21:25 +0000 Subject: [aarch64-port-dev ] RFR(s): AARCH64: 8147805: C1 segmentation fault due to inline Unsafe::getAndSetObject In-Reply-To: References: Message-ID: <1453299685.3772.2.camel@mint> On Wed, 2016-01-20 at 21:30 +0800, Hui Shi wrote: > Hi All, > > Could some one help review this AArch64 C1 issue? Issue happens when inline > unsafe.getAndSet(data) in C1 and UseCompressedOops flag is true, register > is compressed for store, but it is not restored into decompressed form. > Later compressed result is used as reference address and goes wrong.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8147805 > webrev: http://cr.openjdk.java.net/~hshi/8147805/webrev/ > Small test case in http://cr.openjdk.java.net/~hshi/8147805/TestUnsafe.java > Crash can be reproduced by java -XX:TieredStopAtLevel=3 > -XX:+TieredCompilation -Xms4G -Xmx4G TestUnsafe Hi Hui Shi, Thanks for finding this. Your change looks correct, but may I suggest the following smaller change, which achieves the same result?

diff -r 46c1abd5c34d src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp Tue Jan 12 14:55:15 2016 +0000
+++ b/src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp Wed Jan 20 14:16:56 2016 +0000
@@ -3169,7 +3169,8 @@
   Register obj = as_reg(data);
   Register dst = as_reg(dest);
   if (is_oop && UseCompressedOops) {
-    __ encode_heap_oop(obj);
+    __ encode_heap_oop(rscratch1, obj);
+    obj = rscratch1;
   }
   assert_different_registers(obj, addr.base(), tmp, rscratch2, dst);
   Label again;

Regards, Ed. From aph at redhat.com Wed Jan 20 14:33:35 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 20 Jan 2016 14:33:35 +0000 Subject: [aarch64-port-dev ] RFR(s): AARCH64: 8147805: C1 segmentation fault due to inline Unsafe::getAndSetObject In-Reply-To: <1453299685.3772.2.camel@mint> References: <1453299685.3772.2.camel@mint> Message-ID: <569F9ABF.5070501@redhat.com> On 01/20/2016 02:21 PM, Edward Nevill wrote: > On Wed, 2016-01-20 at 21:30 +0800, Hui Shi wrote: >> Hi All, >> >> Could some one help review this AArch64 C1 issue? Issue happens when inline >> unsafe.getAndSet(data) in C1 and UseCompressedOops flag is true, register >> is compressed for store, but it is not restored into decompressed form. >> Later compressed result is used as reference address and goes wrong.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8147805 >> webrev: http://cr.openjdk.java.net/~hshi/8147805/webrev/ >> Small test case in http://cr.openjdk.java.net/~hshi/8147805/TestUnsafe.java >> Crash can be reproduced by java -XX:TieredStopAtLevel=3 >> -XX:+TieredCompilation -Xms4G -Xmx4G TestUnsafe > > Hi Hui Shi, > > Thanks for finding this. Your change looks correct, but if I make suggest the following smaller change which achieves the same. > > diff -r 46c1abd5c34d src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp > --- a/src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp Tue Jan 12 14:55:15 2016 +0000 > +++ b/src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp Wed Jan 20 14:16:56 2016 +0000 > @@ -3169,7 +3169,8 @@ > Register obj = as_reg(data); > Register dst = as_reg(dest); > if (is_oop && UseCompressedOops) { > - __ encode_heap_oop(obj); > + __ encode_heap_oop(rscratch1, obj); > + obj = rscratch1; > } > assert_different_registers(obj, addr.base(), tmp, rscratch2, dst); > Label again; I agree. I have tried this and it works well. The patch is OK with this change. Andrew. From duncan.macgregor at ge.com Wed Jan 20 14:53:16 2016 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Wed, 20 Jan 2016 14:53:16 +0000 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <569F7558.1030800@oracle.com> References: <569CE098.4030807@oracle.com> <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> <569F7558.1030800@oracle.com> Message-ID: I was going to say it is unlikely to matter in production cases but might well hit test code which does extensive meta-programming, but actually, since it's a question of invalidations across _all_ sites, rather than any single one, I think it might make a difference. I'll need to take a look at what our compilation counts eventually come to and experiment with changing the limits.
We did work quite early on to limit the extent of call site invalidations.

One thing that might affect this is how megamorphic call sites are handled. At the moment we keep a cache of classes, method handles, and switch points, and we check the switch point before calling the method handle. I had considered a change to bind the switch points to the method handles and thus allow those checks to be optimised out for methods called extensively from megamorphic call sites. Would that also fall foul of the compilation count being increased?

I think there is definitely room for communicating more about the nature of a call site to the JIT. Whether this should be around recompilation or perhaps more focused around inlining and type specialisation to avoid invalidations and recompilation would be my question. For example, method invocation sites may go megamorphic, and this currently forms a barrier to the JIT seeing the types in a way that doesn't really exist with standard invokeVirtual sites. If there was some feedback loop allowing sites to be cloned as methods are inlined, and a way to indicate this was allowed or desired, then that might allow significantly more optimisations to happen in invokeDynamic based languages. It would also probably be a horror to implement in the current model, but I'm sure you guys can fix all that. :-)

Duncan.

On 20/01/2016, 11:54, "mlvm-dev on behalf of Vladimir Ivanov" wrote:

>MLVM folks, I'd like to hear your opinion about what kind of behavior do
>you expect from JVM w.r.t. mutable call sites.
>
>There are valid use-cases when JVM shouldn't throttle the recompilation
>(e.g., long-running application with indy-based dynamic tracing). Maybe
>there's a place for a new CallSite flavor to clearly communicate
>application expectations to the JVM? Either always recompile (thus
>eventually reaching peak performance) or give up and generate less
>efficient machine code, but save on possible recompilations.
> >Best regards, >Vladimir Ivanov > >On 1/20/16 2:37 AM, John Rose wrote: >> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov >> > >>wrote: >>> >>> The fix is to avoid updating recompilation count when corresponding >>> nmethod is invalidated due to a call site target change. >> >> Although I'm not vetoing it (since it seems it will help customers in >> the short term), I'm uncomfortable with this fix because it doesn't >> scale to large dyn. lang. applications with many unstable call sites. >> Put another way, it feels like we are duct-taping down a failsafe >> switch (against infinite recompilation) in order to spam a >> micro-benchmark: a small number mega-mutable call sites for which we >> are willing to spend (potentially) all of the JIT resources, including >> those usually allocated to application performance in the steady state. >> Put a third way: I am not comfortable with unthrottled infinite >> recompilation as a performance strategy. >> >> I've commented on the new RFE (JDK-8147550) where to go next, including >> the following sentiments: >> >>> There is a serious design tension here, though: Some users apparently >>> are willing to endure an infinite series of recompilations as part of >>> the cost of doing business; JDK-7177745 addresses this need by turning >>> off the fail-safe against (accidental, buggy) infinite recompilation >>> for unstable CSs. Other users might find that having a percentage of >>> machine time devoted to recompilation is a problem. (This has been the >>> case in the past with non-dynamic languages, at least.) The code shape >>> proposed in this bug report would cover all simple unstable call >>> sites (bi-stable, for example, would compile to a bi-morphic call), >>> but, in pathological cases (infinite sequence of distinct CS targets) >>> would "settle down" into a code shape that would be sub-optimal for >>> any single target, but (as an indirect MH call) reasonable for all the >>> targets together. 
>>>
>>> In the absence of clear direction from the user or the profile, the
>>> JVM has to choose infinite recompilation or a good-enough final
>>> compilation. The latter choice is safer. And the
>>> infinite recompilation is less safe because there is no intrinsic
>>> bound on the amount of machine cycles that could be diverted to
>>> recompilation, given a dynamic language application with
>>> enough mega-mutable CSs. Settling down to a network of indirect calls
>>> has a bounded cost.
>>>
>>> Yes, one size-fits-all tactics never please everybody. But the JVM
>>> should not choose tactics with unlimited downsides.
>>
>> -- John
>_______________________________________________
>mlvm-dev mailing list
>mlvm-dev at openjdk.java.net
>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From roland.schatz at oracle.com Wed Jan 20 15:28:33 2016
From: roland.schatz at oracle.com (Roland Schatz)
Date: Wed, 20 Jan 2016 16:28:33 +0100
Subject: RFR and workflow question
Message-ID: <569FA7A1.1080200@oracle.com>

Hi!

Please review this small bugfix:
http://cr.openjdk.java.net/~rschatz/JDK-8147475/webrev.00/

This is on top of another webrev[1]. It has no semantic dependency on it, just a random source-level conflict. I can rebase them if we want to integrate this one first.

This is missing a commit message, and I'm not sure what to use here.

There are two issues that I *think* this will solve, but I can't be sure since I haven't managed to reproduce them:
https://bugs.openjdk.java.net/browse/JDK-8147475
https://bugs.openjdk.java.net/browse/JDK-8146608

Do I just use one of the above issues randomly for the commit message, and close the other as duplicate?
Thanks,
Roland

[1] http://cr.openjdk.java.net/~rschatz/JDK-8147599/webrev.00/

From igor.ignatyev at oracle.com Wed Jan 20 16:05:09 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 20 Jan 2016 19:05:09 +0300
Subject: RFR(XS) : 8141557 : TestResolvedJavaMethod.java times out after 1000 ms
In-Reply-To: <3DDA7A22-74CF-400A-A403-9CE70655ABD5@oracle.com>
References: <52CAB89A-1AA5-4545-9C4B-DD2A6880E463@oracle.com> <3DDA7A22-74CF-400A-A403-9CE70655ABD5@oracle.com>
Message-ID: <613621B5-925D-4E72-A115-1480F47BA43C@oracle.com>

Hi Chris,

thank you for the review. Yes, it's a typo; I've fixed that (s/TestAnnotionation/TestAnnotation/g) and added an explicit 'value =': http://cr.openjdk.java.net/~iignatyev/8141557/webrev.01/

Thanks,
Igor

> On Jan 20, 2016, at 3:45 AM, Christian Thalinger wrote:
>
> I suppose TestAnnotionation is a typo?
>
> + @TestAnnotionation(1000L)
>
> Could you change that to value = 1000L? Just for extra clarity. Then it looks good.
>
>> On Jan 19, 2016, at 11:06 AM, Igor Ignatyev wrote:
>>
>> http://cr.openjdk.java.net/~iignatyev/8141557/webrev.00/
>>> 22 lines changed: 16 ins; 0 del; 6 mod;
>>
>> Hi all,
>>
>> Could you please review the fix for 8141557?
>>
>> The test uses timeout value of org.junit.Test to test reading annotation via JVMCI. In some cases, e.g. on embedded platforms, debug builds or w/ extra vm flags like -Xcomp, 1000ms isn't enough for the test to complete, and since jtreg doesn't apply timeout factor for junit/testng timeouts (CODETOOLS-7901567) the test timeouts despite the fact of increased timeout factor.
>>
>> The fix changes the test to use a separate annotation, which doesn't affect test execution, and remove timeout value (which means no timeout).
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8141557 >> testing: locally >> >> Thanks, >> Igor > From volker.simonis at gmail.com Wed Jan 20 16:23:27 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 20 Jan 2016 17:23:27 +0100 Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228C120@DEWDFEMB19C.global.corp.sap> References: <7C9B87B351A4BA4AA9EC95BB418116567228C120@DEWDFEMB19C.global.corp.sap> Message-ID: Hi Martin, thanks for your thorough review. I've uploaded a new webrev to: http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336.v1/ Please find my comments inline. Regards, Volker On Wed, Jan 20, 2016 at 12:43 PM, Doerr, Martin wrote: > Hi Volker, > > thank you very much for adapting the non-CompactStrings version of the intrinsics. I especially like that you changed shared code to improve matching of special cases. > > Here are some minor change requests: > - I guess you will have to adapt Copyright messages. Done. > - There's a typo in the new comment in library_call: "optimzed". Fixed. > - The comment for the instruction count (used for loop alignment) is wrong in MacroAssembler::string_indexof_1 (should start with 3 instead of 2). > Right, fixed. > I have more change requests regarding ppc.ad: > > The computation of chr is incorrect for little endian in string_indexOf_imm1_char and string_indexOf_imm1. > Good catch. Fixed. 
> Some numbers for compute_padding should be adapted: > int string_indexOf_imm1_charNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } > int string_indexOfCharNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } > int string_compareNode::compute_padding(int current_offset) const { return (2*4-current_offset)&31; } > Right, and also: int string_indexOf_imm1Node::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } I have now put a comment in each method which points to the macro assembler method it depends on to make this dependency explicit. > Some kill effects are missing: > - ctr in all string_indexOf nodes Added kill effect for ctr register to all str_indexof intrinsics. > - cr0, cr1 in string_indexOf_imm1, string_indexOfChar > Fixed. > The new comment for string_indexOfChar claims "// Kill ... needle" which is not true. > Right, fixed. > Thanks, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Volker Simonis > Sent: Dienstag, 19. Januar 2016 19:57 > To: hotspot compiler > Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change > > Hi, > > can somebody please review and sponsor this change. > > Despite the bug summary, I still had to do some small shared changes > to make this work, so unfortunately I can not push this on my own. > > The change also affects aarch64 (although it is minimal and I don't > expect it to break anything) so I cc-ed aarch64-port-dev. > > http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ > https://bugs.openjdk.java.net/browse/JDK-8145336 > > As described in the bug, this change only fixes the string intrinsics > for the -XX:-UseCompactStrings mode which is still the default on > ppc64. 
Additionally, support for the new StrIndexOfChar intrinsic was > added because we already had a similar intrinsic for constant string > needles of length one anyway. A later change (which we're already > working on) will add the intrinsics which can handle compact strings. > > The current intrinsics can handle both, the new byte-array based > string representation as well as the old char-array based string > representation because we internally still use the new hotspot with > older versions of the class libraries. > > I've also ported some of our internal string tests into a new > regression test (TestStringIntrinsics2.java) because the existing > tests didn't exercise all of our intrinsics. > > Following the shared changes I had to do: > > Until now, UseSSE42Intrinsics was a global shared option which was > used to control the availability of the stringIndexOf intrinsics. But > UseSSE42Intrinsics is actually a x86-specific feature so it doesn't > make a lot of sense to define it for other architectures. I've > therefore moved the flag to globals_x86.hpp and changed the condition > which checks for the ability of the stringIndexOf intrinsics from: > > if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { > > to: > > if (!Matcher::match_rule_supported(Op_StrIndexOf)) { > > The Matcher::match_rule_supported() method already calls > Matcher::has_match_rule() anyway. And it is implemented in the .ad > file so I've moved the check for UseSSE42Intrinsics into x86.ad. Other > platforms can now decide in their .ad file if they unconditionally > support the intrinsic or if they need a special feature check. This > change was already briefly discussed in [1]. > > The other shared change I had to make was in > LibraryCallKit::make_string_method_node() for the "Op_StrEquals" case. 
> We have optimized intrinsics for the case that one of the strings to
> compare is constant, but the StrEqualsNode is constructed without
> taking into account that one of the string length values could be a
> constant. This prevented our optimized instruction from being matched
> in the ad-file.
>
> All the other changes are ppc-specific.
>
> Thank you and best regards,
> Volker
>
>
> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/thread.html#20400

From pavel.punegov at oracle.com Wed Jan 20 16:36:13 2016
From: pavel.punegov at oracle.com (Pavel Punegov)
Date: Wed, 20 Jan 2016 19:36:13 +0300
Subject: RFR (S): 8145800: [Testbug] CompilerControl: inline message differs for not inlined methods
Message-ID: <0B7801F7-DEFA-482F-BD24-C06BC3037E0B@oracle.com>

Hi,

please review the following fix for the test bug.

Issue: the tests incorrectly set the inlining state for methods that belong to any of the Internal subclasses of both pool.sub.Klass and pool.subpack.KlassDup. This happens because the test assumes that any method caller will match only the *.* directive pattern. But callers could also match patterns like "*Internal*", because a typical method caller in this case could be pool/sub/Klass$Internal::lambda$getAllMethods$0.

Fix: make sure the method callers (lambdas) do not contain any names used in the test, such as Internal or Klass. That's why all executable and callable creation was moved to a new SubMethodHolder class.

bug id: https://bugs.openjdk.java.net/browse/JDK-8145800
webrev: http://cr.openjdk.java.net/~ppunegov/8145800/webrev.00/

-- Thanks,
Pavel Punegov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From martin.doerr at sap.com Wed Jan 20 16:45:44 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 20 Jan 2016 16:45:44 +0000 Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change In-Reply-To: References: <7C9B87B351A4BA4AA9EC95BB418116567228C120@DEWDFEMB19C.global.corp.sap> Message-ID: <7C9B87B351A4BA4AA9EC95BB418116567228C273@DEWDFEMB19C.global.corp.sap> Hi Volker, thanks for the update. Looks good. Best regards, Martin -----Original Message----- From: Volker Simonis [mailto:volker.simonis at gmail.com] Sent: Mittwoch, 20. Januar 2016 17:23 To: Doerr, Martin Cc: hotspot compiler ; ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change Hi Martin, thanks for your thorough review. I've uploaded a new webrev to: http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336.v1/ Please find my comments inline. Regards, Volker On Wed, Jan 20, 2016 at 12:43 PM, Doerr, Martin wrote: > Hi Volker, > > thank you very much for adapting the non-CompactStrings version of the intrinsics. I especially like that you changed shared code to improve matching of special cases. > > Here are some minor change requests: > - I guess you will have to adapt Copyright messages. Done. > - There's a typo in the new comment in library_call: "optimzed". Fixed. > - The comment for the instruction count (used for loop alignment) is wrong in MacroAssembler::string_indexof_1 (should start with 3 instead of 2). > Right, fixed. > I have more change requests regarding ppc.ad: > > The computation of chr is incorrect for little endian in string_indexOf_imm1_char and string_indexOf_imm1. > Good catch. Fixed. 
> Some numbers for compute_padding should be adapted: > int string_indexOf_imm1_charNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } > int string_indexOfCharNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } > int string_compareNode::compute_padding(int current_offset) const { return (2*4-current_offset)&31; } > Right, and also: int string_indexOf_imm1Node::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } I have now put a comment in each method which points to the macro assembler method it depends on to make this dependency explicit. > Some kill effects are missing: > - ctr in all string_indexOf nodes Added kill effect for ctr register to all str_indexof intrinsics. > - cr0, cr1 in string_indexOf_imm1, string_indexOfChar > Fixed. > The new comment for string_indexOfChar claims "// Kill ... needle" which is not true. > Right, fixed. > Thanks, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Volker Simonis > Sent: Dienstag, 19. Januar 2016 19:57 > To: hotspot compiler > Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change > > Hi, > > can somebody please review and sponsor this change. > > Despite the bug summary, I still had to do some small shared changes > to make this work, so unfortunately I can not push this on my own. > > The change also affects aarch64 (although it is minimal and I don't > expect it to break anything) so I cc-ed aarch64-port-dev. > > http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ > https://bugs.openjdk.java.net/browse/JDK-8145336 > > As described in the bug, this change only fixes the string intrinsics > for the -XX:-UseCompactStrings mode which is still the default on > ppc64. 
Additionally, support for the new StrIndexOfChar intrinsic was > added because we already had a similar intrinsic for constant string > needles of length one anyway. A later change (which we're already > working on) will add the intrinsics which can handle compact strings. > > The current intrinsics can handle both, the new byte-array based > string representation as well as the old char-array based string > representation because we internally still use the new hotspot with > older versions of the class libraries. > > I've also ported some of our internal string tests into a new > regression test (TestStringIntrinsics2.java) because the existing > tests didn't exercise all of our intrinsics. > > Following the shared changes I had to do: > > Until now, UseSSE42Intrinsics was a global shared option which was > used to control the availability of the stringIndexOf intrinsics. But > UseSSE42Intrinsics is actually a x86-specific feature so it doesn't > make a lot of sense to define it for other architectures. I've > therefore moved the flag to globals_x86.hpp and changed the condition > which checks for the ability of the stringIndexOf intrinsics from: > > if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { > > to: > > if (!Matcher::match_rule_supported(Op_StrIndexOf)) { > > The Matcher::match_rule_supported() method already calls > Matcher::has_match_rule() anyway. And it is implemented in the .ad > file so I've moved the check for UseSSE42Intrinsics into x86.ad. Other > platforms can now decide in their .ad file if they unconditionally > support the intrinsic or if they need a special feature check. This > change was already briefly discussed in [1]. > > The other shared change I had to make was in > LibraryCallKit::make_string_method_node() for the "Op_StrEquals" case. 
> We have optimized intrinsics for the case that one of the strings to
> compare is constant, but the StrEqualsNode is constructed without
> taking into account that one of the string length values could be a
> constant. This prevented our optimized instruction from being matched
> in the ad-file.
>
> All the other changes are ppc-specific.
>
> Thank you and best regards,
> Volker
>
>
> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/thread.html#20400

From vladimir.kozlov at oracle.com Wed Jan 20 16:46:48 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Jan 2016 08:46:48 -0800
Subject: RFR and workflow question
In-Reply-To: <569FA7A1.1080200@oracle.com>
References: <569FA7A1.1080200@oracle.com>
Message-ID: <569FB9F8.1070802@oracle.com>

Use 8147475 for the commit message - your change can be related (relocation info could be affected by padding). But I don't see how your change can fix 8146608 - it is patching the return PC, which is SP-relative and has nothing to do with padding in the prolog.

Changes look fine. Please integrate it first since it affects all hotspot repos.

Thanks,
Vladimir

On 1/20/16 7:28 AM, Roland Schatz wrote:
> Hi!
>
> Please review this small bugfix:
> http://cr.openjdk.java.net/~rschatz/JDK-8147475/webrev.00/
>
> This is on top of another webrev[1]. It has no semantic dependency on it, just a random source-level conflict. I can
> rebase them if we want to integrate this one first.
>
>
> This is missing a commit message, and I'm not sure what to use here.
>
> There are two issues that I *think* this will solve, but I can't be sure since I haven't managed to reproduce them:
> https://bugs.openjdk.java.net/browse/JDK-8147475
> https://bugs.openjdk.java.net/browse/JDK-8146608
>
> Do I just use one of the above issues randomly for the commit message, and close the other as duplicate?
>
>
> Thanks,
> Roland
>
> [1] http://cr.openjdk.java.net/~rschatz/JDK-8147599/webrev.00/

From vladimir.kozlov at oracle.com Wed Jan 20 16:49:11 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Jan 2016 08:49:11 -0800
Subject: RFR (S): 8145800: [Testbug] CompilerControl: inline message differs for not inlined methods
In-Reply-To: <0B7801F7-DEFA-482F-BD24-C06BC3037E0B@oracle.com>
References: <0B7801F7-DEFA-482F-BD24-C06BC3037E0B@oracle.com>
Message-ID: <569FBA87.4@oracle.com>

Good.

Thanks,
Vladimir

On 1/20/16 8:36 AM, Pavel Punegov wrote:
> Hi,
>
> please review the following fix for the test bug.
>
> Issue: tests incorrectly set inlining state for methods, that belong to any of Internal subclass of both pool.sub.Klass
> and pool.subpack.KlassDup.
> This happen because test have an assumption that any of method callers will match only *.* directive pattern. But they
> could match patterns like "*Internal*", because
> a typical method caller in this case could be pool/sub/Klass$Internal::lambda$getAllMethods$0.
>
> Fix: Make method callers (lambdas) do not contain any names used in the test, such as Internal, or Klass. That's why all
> executable and callable creation was moved to a new SubMethodHolder class.
>
> bug id: https://bugs.openjdk.java.net/browse/JDK-8145800
> webrev: http://cr.openjdk.java.net/~ppunegov/8145800/webrev.00/
>
> -- Thanks,
> Pavel Punegov
>

From vladimir.kozlov at oracle.com Wed Jan 20 17:08:18 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 20 Jan 2016 09:08:18 -0800
Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change
In-Reply-To: <7C9B87B351A4BA4AA9EC95BB418116567228C273@DEWDFEMB19C.global.corp.sap>
References: <7C9B87B351A4BA4AA9EC95BB418116567228C120@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB418116567228C273@DEWDFEMB19C.global.corp.sap>
Message-ID: <569FBF02.6080707@oracle.com>

+1. Finally UseSSE42Intrinsics was moved!
I will sponsor it.
Thanks, Vladimir On 1/20/16 8:45 AM, Doerr, Martin wrote: > Hi Volker, > > thanks for the update. Looks good. > > Best regards, > Martin > > -----Original Message----- > From: Volker Simonis [mailto:volker.simonis at gmail.com] > Sent: Mittwoch, 20. Januar 2016 17:23 > To: Doerr, Martin > Cc: hotspot compiler ; ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change > > Hi Martin, > > thanks for your thorough review. I've uploaded a new webrev to: > > http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336.v1/ > > Please find my comments inline. > > Regards, > Volker > > > On Wed, Jan 20, 2016 at 12:43 PM, Doerr, Martin wrote: >> Hi Volker, >> >> thank you very much for adapting the non-CompactStrings version of the intrinsics. I especially like that you changed shared code to improve matching of special cases. >> >> Here are some minor change requests: >> - I guess you will have to adapt Copyright messages. > > Done. > >> - There's a typo in the new comment in library_call: "optimzed". > > Fixed. > >> - The comment for the instruction count (used for loop alignment) is wrong in MacroAssembler::string_indexof_1 (should start with 3 instead of 2). >> > > Right, fixed. > >> I have more change requests regarding ppc.ad: >> >> The computation of chr is incorrect for little endian in string_indexOf_imm1_char and string_indexOf_imm1. >> > > Good catch. Fixed. 
> >> Some numbers for compute_padding should be adapted: >> int string_indexOf_imm1_charNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } >> int string_indexOfCharNode::compute_padding(int current_offset) const { return (3*4-current_offset)&31; } >> int string_compareNode::compute_padding(int current_offset) const { return (2*4-current_offset)&31; } >> > > Right, and also: > int string_indexOf_imm1Node::compute_padding(int current_offset) const > { return (3*4-current_offset)&31; } > > I have now put a comment in each method which points to the macro > assembler method it depends on to make this dependency explicit. > >> Some kill effects are missing: >> - ctr in all string_indexOf nodes > > Added kill effect for ctr register to all str_indexof intrinsics. > >> - cr0, cr1 in string_indexOf_imm1, string_indexOfChar >> > > Fixed. > >> The new comment for string_indexOfChar claims "// Kill ... needle" which is not true. >> > > Right, fixed. > >> Thanks, >> Martin >> >> >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Volker Simonis >> Sent: Dienstag, 19. Januar 2016 19:57 >> To: hotspot compiler >> Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net >> Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change >> >> Hi, >> >> can somebody please review and sponsor this change. >> >> Despite the bug summary, I still had to do some small shared changes >> to make this work, so unfortunately I can not push this on my own. >> >> The change also affects aarch64 (although it is minimal and I don't >> expect it to break anything) so I cc-ed aarch64-port-dev. 
>> >> http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ >> https://bugs.openjdk.java.net/browse/JDK-8145336 >> >> As described in the bug, this change only fixes the string intrinsics >> for the -XX:-UseCompactStrings mode which is still the default on >> ppc64. Additionally, support for the new StrIndexOfChar intrinsic was >> added because we already had a similar intrinsic for constant string >> needles of length one anyway. A later change (which we're already >> working on) will add the intrinsics which can handle compact strings. >> >> The current intrinsics can handle both, the new byte-array based >> string representation as well as the old char-array based string >> representation because we internally still use the new hotspot with >> older versions of the class libraries. >> >> I've also ported some of our internal string tests into a new >> regression test (TestStringIntrinsics2.java) because the existing >> tests didn't exercise all of our intrinsics. >> >> Following the shared changes I had to do: >> >> Until now, UseSSE42Intrinsics was a global shared option which was >> used to control the availability of the stringIndexOf intrinsics. But >> UseSSE42Intrinsics is actually a x86-specific feature so it doesn't >> make a lot of sense to define it for other architectures. I've >> therefore moved the flag to globals_x86.hpp and changed the condition >> which checks for the ability of the stringIndexOf intrinsics from: >> >> if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { >> >> to: >> >> if (!Matcher::match_rule_supported(Op_StrIndexOf)) { >> >> The Matcher::match_rule_supported() method already calls >> Matcher::has_match_rule() anyway. And it is implemented in the .ad >> file so I've moved the check for UseSSE42Intrinsics into x86.ad. Other >> platforms can now decide in their .ad file if they unconditionally >> support the intrinsic or if they need a special feature check. This >> change was already briefly discussed in [1]. 
>> >> The other shared change I had to make was in >> LibraryCallKit::make_string_method_node() for the "Op_StrEquals" case. >> We have optimized intrinsics for the case that one of the strings to >> compare is constant, but the StrEqualsNode is constructed without >> taking into account that one of the string length values could be a >> constant. This prevented our optimized instruction from being matched >> in the ad-file. >> >> All the other changes are ppc-specific. >> >> Thank you and best regards, >> Volker >> >> >> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/thread.html#20400 From volker.simonis at gmail.com Wed Jan 20 17:11:35 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 20 Jan 2016 18:11:35 +0100 Subject: RFR(M): 8145336: PPC64: fix string intrinsics after CompactStrings change In-Reply-To: <569FBF02.6080707@oracle.com> References: <7C9B87B351A4BA4AA9EC95BB418116567228C120@DEWDFEMB19C.global.corp.sap> <7C9B87B351A4BA4AA9EC95BB418116567228C273@DEWDFEMB19C.global.corp.sap> <569FBF02.6080707@oracle.com> Message-ID: Great! Thanks a lot Vladimir, Volker On Wed, Jan 20, 2016 at 6:08 PM, Vladimir Kozlov wrote: > +1. Finally UseSSE42Intrinsics was moved! > I will sponsor it. > > Thanks, > Vladimir > > > > On 1/20/16 8:45 AM, Doerr, Martin wrote: >> >> Hi Volker, >> >> thanks for the update. Looks good. >> >> Best regards, >> Martin >> >> -----Original Message----- >> From: Volker Simonis [mailto:volker.simonis at gmail.com] >> Sent: Mittwoch, 20. Januar 2016 17:23 >> To: Doerr, Martin >> Cc: hotspot compiler ; >> ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net >> Subject: Re: RFR(M): 8145336: PPC64: fix string intrinsics after >> CompactStrings change >> >> Hi Martin, >> >> thanks for your thorough review. I've uploaded a new webrev to: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336.v1/ >> >> Please find my comments inline. 
>> >> Regards, >> Volker >> >> >> On Wed, Jan 20, 2016 at 12:43 PM, Doerr, Martin >> wrote: >>> >>> Hi Volker, >>> >>> thank you very much for adapting the non-CompactStrings version of the >>> intrinsics. I especially like that you changed shared code to improve >>> matching of special cases. >>> >>> Here are some minor change requests: >>> - I guess you will have to adapt Copyright messages. >> >> >> Done. >> >>> - There's a typo in the new comment in library_call: "optimzed". >> >> >> Fixed. >> >>> - The comment for the instruction count (used for loop alignment) is >>> wrong in MacroAssembler::string_indexof_1 (should start with 3 instead of >>> 2). >>> >> >> Right, fixed. >> >>> I have more change requests regarding ppc.ad: >>> >>> The computation of chr is incorrect for little endian in >>> string_indexOf_imm1_char and string_indexOf_imm1. >>> >> >> Good catch. Fixed. >> >>> Some numbers for compute_padding should be adapted: >>> int string_indexOf_imm1_charNode::compute_padding(int current_offset) >>> const { return (3*4-current_offset)&31; } >>> int string_indexOfCharNode::compute_padding(int current_offset) const { >>> return (3*4-current_offset)&31; } >>> int string_compareNode::compute_padding(int current_offset) const { >>> return (2*4-current_offset)&31; } >>> >> >> Right, and also: >> int string_indexOf_imm1Node::compute_padding(int current_offset) const >> { return (3*4-current_offset)&31; } >> >> I have now put a comment in each method which points to the macro >> assembler method it depends on to make this dependency explicit. >> >>> Some kill effects are missing: >>> - ctr in all string_indexOf nodes >> >> >> Added kill effect for ctr register to all str_indexof intrinsics. >> >>> - cr0, cr1 in string_indexOf_imm1, string_indexOfChar >>> >> >> Fixed. >> >>> The new comment for string_indexOfChar claims "// Kill ... needle" which >>> is not true. >>> >> >> Right, fixed. 
>> >>> Thanks, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Volker >>> Simonis >>> Sent: Dienstag, 19. Januar 2016 19:57 >>> To: hotspot compiler >>> Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net >>> Subject: RFR(M): 8145336: PPC64: fix string intrinsics after >>> CompactStrings change >>> >>> Hi, >>> >>> can somebody please review and sponsor this change. >>> >>> Despite the bug summary, I still had to do some small shared changes >>> to make this work, so unfortunately I can not push this on my own. >>> >>> The change also affects aarch64 (although it is minimal and I don't >>> expect it to break anything) so I cc-ed aarch64-port-dev. >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8145336/ >>> https://bugs.openjdk.java.net/browse/JDK-8145336 >>> >>> As described in the bug, this change only fixes the string intrinsics >>> for the -XX:-UseCompactStrings mode which is still the default on >>> ppc64. Additionally, support for the new StrIndexOfChar intrinsic was >>> added because we already had a similar intrinsic for constant string >>> needles of length one anyway. A later change (which we're already >>> working on) will add the intrinsics which can handle compact strings. >>> >>> The current intrinsics can handle both, the new byte-array based >>> string representation as well as the old char-array based string >>> representation because we internally still use the new hotspot with >>> older versions of the class libraries. >>> >>> I've also ported some of our internal string tests into a new >>> regression test (TestStringIntrinsics2.java) because the existing >>> tests didn't exercise all of our intrinsics. >>> >>> Following the shared changes I had to do: >>> >>> Until now, UseSSE42Intrinsics was a global shared option which was >>> used to control the availability of the stringIndexOf intrinsics. 
But >>> UseSSE42Intrinsics is actually a x86-specific feature so it doesn't >>> make a lot of sense to define it for other architectures. I've >>> therefore moved the flag to globals_x86.hpp and changed the condition >>> which checks for the ability of the stringIndexOf intrinsics from: >>> >>> if (!Matcher::has_match_rule(Op_StrIndexOf) || !UseSSE42Intrinsics) { >>> >>> to: >>> >>> if (!Matcher::match_rule_supported(Op_StrIndexOf)) { >>> >>> The Matcher::match_rule_supported() method already calls >>> Matcher::has_match_rule() anyway. And it is implemented in the .ad >>> file so I've moved the check for UseSSE42Intrinsics into x86.ad. Other >>> platforms can now decide in their .ad file if they unconditionally >>> support the intrinsic or if they need a special feature check. This >>> change was already briefly discussed in [1]. >>> >>> The other shared change I had to make was in >>> LibraryCallKit::make_string_method_node() for the "Op_StrEquals" case. >>> We have optimized intrinsics for the case that one of the strings to >>> compare is constant, but the StrEqualsNode is constructed without >>> taking into account that one of the string length values could be a >>> constant. This prevented our optimized instruction from being matched >>> in the ad-file. >>> >>> All the other changes are ppc-specific. >>> >>> Thank you and best regards, >>> Volker >>> >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/thread.html#20400 From pavel.punegov at oracle.com Wed Jan 20 17:23:33 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Wed, 20 Jan 2016 20:23:33 +0300 Subject: RFR (S): 8145800: [Testbug] CompilerControl: inline message differs for not inlined methods In-Reply-To: <569FBA87.4@oracle.com> References: <0B7801F7-DEFA-482F-BD24-C06BC3037E0B@oracle.com> <569FBA87.4@oracle.com> Message-ID: <9E055252-9C22-4438-A210-33A6E7B5360E@oracle.com> Thanks for review, Vladimir. 
> On 20 Jan 2016, at 19:49, Vladimir Kozlov wrote: > > Good. > Thanks, > Vladimir > > On 1/20/16 8:36 AM, Pavel Punegov wrote: >> Hi, >> >> please review the following fix for the test bug. >> >> Issue: the tests incorrectly set the inlining state for methods that belong to any Internal subclass of both pool.sub.Klass >> and pool.subpack.KlassDup. >> This happens because the test assumes that any method caller will match only the *.* directive pattern. But they >> could match patterns like "*Internal*", because >> a typical method caller in this case could be pool/sub/Klass$Internal::lambda$getAllMethods$0. >> >> Fix: make sure the method callers (lambdas) do not contain any names used in the test, such as Internal or Klass. That's why all >> executable and callable creation was moved to a new SubMethodHolder class. >> >> bug id: https://bugs.openjdk.java.net/browse/JDK-8145800 >> webrev: http://cr.openjdk.java.net/~ppunegov/8145800/webrev.00/ >> >> Thanks, >> Pavel Punegov >> From christian.thalinger at oracle.com Thu Jan 21 02:49:21 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 20 Jan 2016 16:49:21 -1000 Subject: RFR(XS) : 8141557 : TestResolvedJavaMethod.java times out after 1000 ms In-Reply-To: <613621B5-925D-4E72-A115-1480F47BA43C@oracle.com> References: <52CAB89A-1AA5-4545-9C4B-DD2A6880E463@oracle.com> <3DDA7A22-74CF-400A-A403-9CE70655ABD5@oracle.com> <613621B5-925D-4E72-A115-1480F47BA43C@oracle.com> Message-ID: <72DE4A03-5A01-439A-8D01-F6C5A8BDFB86@oracle.com> Looks good. > On Jan 20, 2016, at 6:05 AM, Igor Ignatyev wrote: > > Hi Chris, > > thank you for review. > > Y, it's a typo, I've fixed that (s/TestAnnotionation/TestAnnotation/g) and added explicit 'value =': > http://cr.openjdk.java.net/~iignatyev/8141557/webrev.01/ > > Thanks, > Igor > > >> On Jan 20, 2016, at 3:45 AM, Christian Thalinger wrote: >> >> I suppose TestAnnotionation is a typo? >> >> + @TestAnnotionation(1000L) >> >> Could you change that to value = 1000L? 
Just for extra clarity. Then it looks good. >> >>> On Jan 19, 2016, at 11:06 AM, Igor Ignatyev wrote: >>> >>> http://cr.openjdk.java.net/~iignatyev/8141557/webrev.00/ >>>> 22 lines changed: 16 ins; 0 del; 6 mod; >>> >>> Hi all, >>> >>> Could you please review the fix for 8141557? >>> >>> The test uses the timeout value of org.junit.Test to test reading an annotation via JVMCI. In some cases, e.g. on embedded platforms, debug builds or w/ extra vm flags like -Xcomp, 1000ms isn't enough for the test to complete, and since jtreg doesn't apply the timeout factor to junit/testng timeouts (CODETOOLS-7901567) the test times out despite the increased timeout factor. >>> >>> The fix changes the test to use a separate annotation, which doesn't affect test execution, and removes the timeout value (which means no timeout). >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8141557 >>> testing: locally >>> >>> Thanks, >>> Igor >> > From christian.thalinger at oracle.com Thu Jan 21 02:56:58 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 20 Jan 2016 16:56:58 -1000 Subject: RFR and workflow question In-Reply-To: <569FB9F8.1070802@oracle.com> References: <569FA7A1.1080200@oracle.com> <569FB9F8.1070802@oracle.com> Message-ID: <8A7390F7-4DC5-4033-B12F-F1202EDA51B8@oracle.com> > On Jan 20, 2016, at 6:46 AM, Vladimir Kozlov wrote: > > Use 8147475 for commit message - your change can be related (relocation info could be affected by padding). > But I don't see how your change can fix 8146608 - it is patching return PC which is SP relative and has nothing to do with padding in the prolog. > > Changes look fine. Please, integrate it first since it affects all hotspot repos. Roland is not a Committer yet; just Author. I'll take care of it. > > Thanks, > Vladimir > > On 1/20/16 7:28 AM, Roland Schatz wrote: >> Hi! >> >> Please review this small bugfix: >> http://cr.openjdk.java.net/~rschatz/JDK-8147475/webrev.00/ >> >> This is on top of another webrev[1]. 
It has no semantic dependency on it, just a random source-level conflict. I can >> rebase them if we want to integrate this one first. >> >> >> This is missing a commit message, and I'm not sure what to use here. >> >> There are two issues that I *think* this will solve, but I can't be sure since I haven't managed to reproduce them: >> https://bugs.openjdk.java.net/browse/JDK-8147475 >> https://bugs.openjdk.java.net/browse/JDK-8146608 >> >> Do I just use one of the above issues randomly for the commit message, and close the other as duplicate? >> >> >> Thanks, >> Roland >> >> [1] http://cr.openjdk.java.net/~rschatz/JDK-8147599/webrev.00/ From aph at redhat.com Thu Jan 21 10:22:13 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2016 10:22:13 +0000 Subject: Baffling USE in x86_64.ad Message-ID: <56A0B155.3090802@redhat.com> In this pattern: instruct compI_rReg(rFlagsReg cr, rRegI op1, rRegI op2) %{ match(Set cr (CmpI op1 op2)); effect(DEF cr, USE op1, USE op2); format %{ "cmpl $op1, $op2" %} opcode(0x3B); /* Opcode 3B /r */ ins_encode(REX_reg_reg(op1, op2), OpcP, reg_reg(op1, op2)); ins_pipe(ialu_cr_reg_reg); %} why does the USE appear in the effect? And the DEF? The operands appear in the match expression in the normal way, so I would have thought the effect expression unnecessary. It's this pattern: others don't have the effect: instruct compL_rReg(rFlagsReg cr, rRegL op1, rRegL op2) %{ match(Set cr (CmpL op1 op2)); format %{ "cmpq $op1, $op2" %} opcode(0x3B); /* Opcode 3B /r */ ins_encode(REX_reg_reg_wide(op1, op2), OpcP, reg_reg(op1, op2)); ins_pipe(ialu_cr_reg_reg); %} Thanks, Andrew. From nils.eliasson at oracle.com Thu Jan 21 10:25:59 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 21 Jan 2016 11:25:59 +0100 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err Message-ID: <56A0B237.9090008@oracle.com> Hi, Please review this small change. 
The diff looks big but most of the change is just changing how the directive are passed to the compilers. Directives are set in the ciEnv and then passed to the compilers. The compilers can then choose to add it to any internal compilation object for convenience. The hs_err printing routine in vmError.cpp loads the directive from the ciEnv. Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ Regards, Nils From adinn at redhat.com Thu Jan 21 10:29:32 2016 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 21 Jan 2016 10:29:32 +0000 Subject: Baffling USE in x86_64.ad In-Reply-To: <56A0B155.3090802@redhat.com> References: <56A0B155.3090802@redhat.com> Message-ID: <56A0B30C.6080205@redhat.com> On 21/01/16 10:22, Andrew Haley wrote: > In this pattern: > > instruct compI_rReg(rFlagsReg cr, rRegI op1, rRegI op2) > %{ > match(Set cr (CmpI op1 op2)); > effect(DEF cr, USE op1, USE op2); > > format %{ "cmpl $op1, $op2" %} > opcode(0x3B); /* Opcode 3B /r */ > ins_encode(REX_reg_reg(op1, op2), OpcP, reg_reg(op1, op2)); > ins_pipe(ialu_cr_reg_reg); > %} > > why does the USE appear in the effect? And the DEF? The operands > appear in the match expression in the normal way, so I would have > thought the effect expression unnecessary. It's this pattern: others > don't have the effect: I am not certain of this but I note that the above rule is used for direct matching and also used for expansion. I believe the expansion process requires an effect declaration even though that effect is also implied by the match rule. There are many instruction definitions which are only used for matching and they have an effect declaration but no match declaration. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul Argiry (US) From nils.eliasson at oracle.com Thu Jan 21 10:31:52 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 21 Jan 2016 11:31:52 +0100 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56A0B237.9090008@oracle.com> References: <56A0B237.9090008@oracle.com> Message-ID: <56A0B398.4000408@oracle.com> This is how it looks: [...] --------------- T H R E A D --------------- Current thread (0x00007f071046a000): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=20033, stack(0x00007f05d7afb000,0x00007f05d7bfc000)] Current CompileTask: C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) Current compiler directive: inline: - Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false PrintInlining:false PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000 Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], sp=0x00007f05d7bfa5d0, free space=1021k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x182 V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x4a V [libjvm.so+0x908cca] report_vm_error(char const*, int, char const*, char const*, ...)+0xea V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 V [libjvm.so+0x88ec5a] 
CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 V [libjvm.so+0x10189aa] java_start(Thread*)+0xca C [libpthread.so.0+0x8182] start_thread+0xc2 [...] http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt Regards, Nils On 2016-01-21 11:25, Nils Eliasson wrote: > Hi, > > Please review this small change. The diff looks big but most of the > change is just changing how the directive are passed to the compilers. > Directives are set in the ciEnv and then passed to the compilers. The > compilers can then choose to add it to any internal compilation object > for convenience. The hs_err printing routine in vmError.cpp loads the > directive from the ciEnv. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 > Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ > > Regards, > Nils From tobias.hartmann at oracle.com Thu Jan 21 10:46:16 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 21 Jan 2016 11:46:16 +0100 Subject: [9] RFR(S): 8065334: CodeHeap expansion fails although there is uncommitted memory Message-ID: <56A0B6F8.5040809@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8065334 http://cr.openjdk.java.net/~thartmann/8065334/webrev.00/ If ReservedCodeCacheSize (or the size of a single code heap) is not a multiple of CodeCacheExpansionSize, the last code heap expansion fails, leaving unused uncommitted memory. For example, see [1]. Both the profiled and the non-profiled segments are full but there is still 32Kb of uncommitted space that is never used (CodeCacheExpansionSize is 64Kb). CodeHeap::expand_by() should check for this condition and commit all the remaining space even if the requested expansion size is larger. 
Like this, we use all the available space [2]. Thanks, Tobias [1] Baseline: CodeHeap 'non-nmethods': size=5696Kb used=2403Kb max_used=2433Kb free=3292Kb bounds [0x00007f4d83b27000, 0x00007f4d83d97000, 0x00007f4d840b7000] CodeHeap 'profiled nmethods': size=120032Kb used=119999Kb max_used=119999Kb free=32Kb bounds [0x00007f4d840b7000, 0x00007f4d8b5e7000, 0x00007f4d8b5ef000] CodeHeap 'non-profiled nmethods': size=120032Kb used=119999Kb max_used=119999Kb free=32Kb bounds [0x00007f4d8b5ef000, 0x00007f4d92b1f000, 0x00007f4d92b27000] total_blobs=248449 nmethods=52 adapters=570 compilation: disabled (not enough contiguous free space left) [2] Fixed: CodeHeap 'non-nmethods': size=5696Kb used=2404Kb max_used=2436Kb free=3291Kb bounds [0x00007fe8cd000000, 0x00007fe8cd270000, 0x00007fe8cd590000] CodeHeap 'profiled nmethods': size=120032Kb used=120031Kb max_used=120031Kb free=0Kb bounds [0x00007fe8cd590000, 0x00007fe8d4ac8000, 0x00007fe8d4ac8000] CodeHeap 'non-profiled nmethods': size=120032Kb used=120032Kb max_used=120032Kb free=0Kb bounds [0x00007fe8d4ac8000, 0x00007fe8dc000000, 0x00007fe8dc000000] total_blobs=10665 nmethods=127 adapters=570 compilation: disabled (not enough contiguous free space left) From aph at redhat.com Thu Jan 21 11:21:11 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2016 11:21:11 +0000 Subject: Baffling USE in x86_64.ad In-Reply-To: <56A0B30C.6080205@redhat.com> References: <56A0B155.3090802@redhat.com> <56A0B30C.6080205@redhat.com> Message-ID: <56A0BF27.4070200@redhat.com> On 01/21/2016 10:29 AM, Andrew Dinn wrote: > I am not certain of this but I note that the above rule is used for > direct matching and also used for expansion. Aha! So it is. Thanks, I didn't look for that but I see you're right. Andrew. 
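A minimal sketch of the expansion rule proposed in the 8065334 review above, assuming a simplified heap that tracks only its reserved and committed sizes in Kb. HeapSketch, expand_baseline and expand_clamped are hypothetical names used for illustration; this is not the CodeHeap::expand_by() code from the webrev. The numbers in the comments come from [1]: 32Kb remain uncommitted while CodeCacheExpansionSize is 64Kb, so a full-size request can only succeed if it is clamped to what is left.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for a code heap: only the sizes matter here.
struct HeapSketch {
  std::size_t reserved_kb;   // total reserved address space
  std::size_t committed_kb;  // part of it that is already committed
};

// Models the behavior described as the baseline: a request larger than
// the remaining uncommitted space fails and commits nothing.
bool expand_baseline(HeapSketch& h, std::size_t request_kb) {
  std::size_t uncommitted = h.reserved_kb - h.committed_kb;
  if (request_kb > uncommitted) {
    return false;
  }
  h.committed_kb += request_kb;
  return true;
}

// Models the proposed rule: if the request does not fit, commit whatever
// is left instead of failing. Fails only when nothing is left at all.
bool expand_clamped(HeapSketch& h, std::size_t request_kb) {
  std::size_t uncommitted = h.reserved_kb - h.committed_kb;
  if (uncommitted == 0) {
    return false;
  }
  h.committed_kb += std::min(request_kb, uncommitted);
  return true;
}
```

For a heap of 120032Kb with 120000Kb committed, expand_baseline(h, 64) fails and leaves the last 32Kb unused, while expand_clamped(h, 64) commits those 32Kb and only the following call fails.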
From goetz.lindenmaier at sap.com Thu Jan 21 11:47:19 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 21 Jan 2016 11:47:19 +0000 Subject: Baffling USE in x86_64.ad In-Reply-To: <56A0B30C.6080205@redhat.com> References: <56A0B155.3090802@redhat.com> <56A0B30C.6080205@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41F1651F@DEWDFEMB12A.global.corp.sap> Hi, adlc scans the match rule and derives USE/DEF effects from that. Then it adds in the effects from the effect() declaration, which are the same here. So that line is superfluous. (I once cleaned up the effects in the ppc port and removed all these. And we are expand-power-users :) ) Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Andrew Dinn > Sent: Donnerstag, 21. Januar 2016 11:30 > To: Andrew Haley ; hotspot compiler dev at openjdk.java.net> > Subject: Re: Baffling USE in x86_64.ad > > On 21/01/16 10:22, Andrew Haley wrote: > > In this pattern: > > > > instruct compI_rReg(rFlagsReg cr, rRegI op1, rRegI op2) > > %{ > > match(Set cr (CmpI op1 op2)); > > effect(DEF cr, USE op1, USE op2); > > > > format %{ "cmpl $op1, $op2" %} > > opcode(0x3B); /* Opcode 3B /r */ > > ins_encode(REX_reg_reg(op1, op2), OpcP, reg_reg(op1, op2)); > > ins_pipe(ialu_cr_reg_reg); > > %} > > > > why does the USE appear in the effect? And the DEF? The operands > > appear in the match expression in the normal way, so I would have > > thought the effect expression unnecessary. It's this pattern: others > > don't have the effect: > > I am not certain of this but I note that the above rule is used for > direct matching and also used for expansion. I believe the expansion > process requires an effect declaration even though that effect is also > implied by the match rule. There are many instruction definitions which > are only used for matching and they have an effect declaration but no > match declaration. 
> > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul > Argiry (US) From andreas.eriksson at oracle.com Thu Jan 21 12:51:44 2016 From: andreas.eriksson at oracle.com (Andreas Eriksson) Date: Thu, 21 Jan 2016 13:51:44 +0100 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56A0B398.4000408@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> Message-ID: <56A0D460.2070006@oracle.com> Looks good to me (not Reviewer). - Andreas On 2016-01-21 11:31, Nils Eliasson wrote: > This is how it looks: > > [...] > --------------- T H R E A D --------------- > > Current thread (0x00007f071046a000): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=20033, stack(0x00007f05d7afb000,0x00007f05d7bfc000)] > > Current CompileTask: > C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) > > Current compiler directive: > inline: - > Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false PrintInlining:false PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000 > > Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], sp=0x00007f05d7bfa5d0, free space=1021k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x182 > V [libjvm.so+0x12e829a] 
VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x4a > V [libjvm.so+0x908cca] report_vm_error(char const*, int, char const*, char const*, ...)+0xea > V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 > V [libjvm.so+0x88ec5a] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a > V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 > V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 > V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 > V [libjvm.so+0x10189aa] java_start(Thread*)+0xca > C [libpthread.so.0+0x8182] start_thread+0xc2 > [...] > > http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt > > Regards, > Nils > > On 2016-01-21 11:25, Nils Eliasson wrote: >> Hi, >> >> Please review this small change. The diff looks big but most of the >> change is just changing how the directive are passed to the >> compilers. Directives are set in the ciEnv and then passed to the >> compilers. The compilers can then choose to add it to any internal >> compilation object for convenience. The hs_err printing routine in >> vmError.cpp loads the directive from the ciEnv. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >> >> Regards, >> Nils > From goetz.lindenmaier at sap.com Thu Jan 21 14:53:23 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 21 Jan 2016 14:53:23 +0000 Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. Message-ID: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> Hi, SAP changed its name from SAP AG to SAP SE. We were asked to adapt our copyright messages accordingly. This change fixes all SAP copyrights in hotspot to follow the patterns "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." 
or "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." Please review this change. I need a sponsor, please. http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 Best regards, Goetz. From roland.schatz at oracle.com Thu Jan 21 15:41:29 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Thu, 21 Jan 2016 16:41:29 +0100 Subject: RFR(S): 8146244: compiler/jvmci/code/DataPatchTest.java crashes: SIGSEGV in (getConstClass)getConstClass Message-ID: <56A0FC29.7040506@oracle.com> Hi, Please review this small bugfix: webrev: http://cr.openjdk.java.net/~rschatz/JDK-8146244/webrev.00/ issue: https://bugs.openjdk.java.net/browse/JDK-8146244 Thanks, Roland From thomas.stuefe at gmail.com Thu Jan 21 15:48:14 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 21 Jan 2016 16:48:14 +0100 Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. In-Reply-To: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> Message-ID: Hi Goetz, http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/src/os/aix/vm/libodm_aix.cpp.frames.html Please remove Oracle copyright, this is SAP only. http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/src/os/aix/vm/libodm_aix.hpp.frames.html ditto. Otherwise looks fine. ... Thomas On Thu, Jan 21, 2016 at 3:53 PM, Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > Hi, > > > > SAP changed its name from SAP AG to SAP SE. We were asked to > > adapt our copyright messages accordingly. > > > > This change fixes all SAP copyrights in hotspot to follow the patterns > > "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or > > "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All > rights reserved." 
> > > > Please review this change. I need a sponsor, please. > > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 > > > > Best regards, > > Goetz. > > > From volker.simonis at gmail.com Thu Jan 21 17:29:29 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 21 Jan 2016 18:29:29 +0100 Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. In-Reply-To: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> Message-ID: Looks good! Thanks for doing this cleanup, Volker On Thu, Jan 21, 2016 at 3:53 PM, Lindenmaier, Goetz wrote: > Hi, > > > > SAP changed its name from SAP AG to SAP SE. We were asked to > > adapt our copyright messages accordingly. > > > > This change fixes all SAP copyrights in hotspot to follow the patterns > > "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or > > "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All rights > reserved." > > > > Please review this change. I need a sponsor, please. > > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 > > > > Best regards, > > Goetz. > > From vladimir.kozlov at oracle.com Thu Jan 21 18:55:07 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Jan 2016 10:55:07 -0800 Subject: Baffling USE in x86_64.ad In-Reply-To: <4295855A5C1DE049A61835A1887419CC41F1651F@DEWDFEMB12A.global.corp.sap> References: <56A0B155.3090802@redhat.com> <56A0B30C.6080205@redhat.com> <4295855A5C1DE049A61835A1887419CC41F1651F@DEWDFEMB12A.global.corp.sap> Message-ID: <56A1298B.8000804@oracle.com> I agree that it is not needed. We never cleaned this up before. I looked at the history and it is from 2000, when C2 was still in development. 
Regards, Vladimir On 1/21/16 3:47 AM, Lindenmaier, Goetz wrote: > Hi, > > adlc scans the match rule and derives USE/DEF effects from that. > Then it adds in the effects from the effect() declaration, which are > the same here. > > So that line is superfluous. > > (I once cleaned up the effects in the ppc > port and removed all these. And we are expand-power-users :) ) > > Best regards, > Goetz. > > >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- >> bounces at openjdk.java.net] On Behalf Of Andrew Dinn >> Sent: Donnerstag, 21. Januar 2016 11:30 >> To: Andrew Haley ; hotspot compiler > dev at openjdk.java.net> >> Subject: Re: Baffling USE in x86_64.ad >> >> On 21/01/16 10:22, Andrew Haley wrote: >>> In this pattern: >>> >>> instruct compI_rReg(rFlagsReg cr, rRegI op1, rRegI op2) >>> %{ >>> match(Set cr (CmpI op1 op2)); >>> effect(DEF cr, USE op1, USE op2); >>> >>> format %{ "cmpl $op1, $op2" %} >>> opcode(0x3B); /* Opcode 3B /r */ >>> ins_encode(REX_reg_reg(op1, op2), OpcP, reg_reg(op1, op2)); >>> ins_pipe(ialu_cr_reg_reg); >>> %} >>> >>> why does the USE appear in the effect? And the DEF? The operands >>> appear in the match expression in the normal way, so I would have >>> thought the effect expression unnecessary. It's this pattern: others >>> don't have the effect: >> >> I am not certain of this but I note that the above rule is used for >> direct matching and also used for expansion. I believe the expansion >> process requires an effect declaration even though that effect is also >> implied by the match rule. There are many instruction definitions which >> are only used for matching and they have an effect declaration but no >> match declaration. >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Senior Principal Software Engineer >> Red Hat UK Ltd >> Registered in UK and Wales under Company Registration No. 
3798903 >> Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul >> Argiry (US) From vladimir.kozlov at oracle.com Thu Jan 21 19:18:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Jan 2016 11:18:28 -0800 Subject: [9] RFR(S): 8065334: CodeHeap expansion fails although there is uncommitted memory In-Reply-To: <56A0B6F8.5040809@oracle.com> References: <56A0B6F8.5040809@oracle.com> Message-ID: <56A12F04.6080305@oracle.com> Good. Thanks, Vladimir On 1/21/16 2:46 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8065334 > http://cr.openjdk.java.net/~thartmann/8065334/webrev.00/ > > If ReservedCodeCacheSize (or the size of a single code heap) is not a multiple of CodeCacheExpansionSize, the last code heap expansion fails, leaving unused uncommitted memory. For example, see [1]. Both the profiled and the non-profiled segments are full but there is still 32Kb of uncommitted space that is never used (CodeCacheExpansionSize is 64Kb). > > CodeHeap::expand_by() should check for this condition and commit all the remaining space even if the requested expansion size is larger. Like this, we use all the available space [2]. 
> > Thanks, > Tobias > > > [1] Baseline: > CodeHeap 'non-nmethods': size=5696Kb used=2403Kb max_used=2433Kb free=3292Kb > bounds [0x00007f4d83b27000, 0x00007f4d83d97000, 0x00007f4d840b7000] > CodeHeap 'profiled nmethods': size=120032Kb used=119999Kb max_used=119999Kb free=32Kb > bounds [0x00007f4d840b7000, 0x00007f4d8b5e7000, 0x00007f4d8b5ef000] > CodeHeap 'non-profiled nmethods': size=120032Kb used=119999Kb max_used=119999Kb free=32Kb > bounds [0x00007f4d8b5ef000, 0x00007f4d92b1f000, 0x00007f4d92b27000] > total_blobs=248449 nmethods=52 adapters=570 > compilation: disabled (not enough contiguous free space left) > > [2] Fixed: > CodeHeap 'non-nmethods': size=5696Kb used=2404Kb max_used=2436Kb free=3291Kb > bounds [0x00007fe8cd000000, 0x00007fe8cd270000, 0x00007fe8cd590000] > CodeHeap 'profiled nmethods': size=120032Kb used=120031Kb max_used=120031Kb free=0Kb > bounds [0x00007fe8cd590000, 0x00007fe8d4ac8000, 0x00007fe8d4ac8000] > CodeHeap 'non-profiled nmethods': size=120032Kb used=120032Kb max_used=120032Kb free=0Kb > bounds [0x00007fe8d4ac8000, 0x00007fe8dc000000, 0x00007fe8dc000000] > total_blobs=10665 nmethods=127 adapters=570 > compilation: disabled (not enough contiguous free space left) > From vladimir.kozlov at oracle.com Thu Jan 21 19:28:54 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Jan 2016 11:28:54 -0800 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56A0B398.4000408@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> Message-ID: <56A13176.804@oracle.com> Passing directives through ciEnv is fine. My question is about output in hs_err file. How those directives were selected in your example? I found it strange to see mixed flags values and oracle commands. "Enable:true Exclude:false" - which these correspond to, for example? Should we not print directives/flags which are not set explicitly? 
Thanks, Vladimir On 1/21/16 2:31 AM, Nils Eliasson wrote: > This is how it looks: > > [...] > > --------------- T H R E A D --------------- > > Current thread (0x00007f071046a000): JavaThread "C1 CompilerThread10" daemon [_thread_in_native, id=20033, stack(0x00007f05d7afb000,0x00007f05d7bfc000)] > > Current CompileTask: > C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) > > Current compiler directive: > inline: - > Enable:true Exclude:false BreakAtExecute:false BreakAtCompile:false Log:false PrintAssembly:false PrintInlining:false PrintNMethods:false ReplayInline:false DumpReplay:false DumpInline:false CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: BlockLayoutByFrequency:true PrintOptoAssembly:false PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false TraceSpilling:false Vectorize:false VectorizeDebug:false CloneMapDebug:false DoReserveCopyInSuperWordDebug:false IGVPrintLevel:0 MaxNodeLimit:80000 > > Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], sp=0x00007f05d7bfa5d0, free space=1021k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x182 > V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x4a > V [libjvm.so+0x908cca] report_vm_error(char const*, int, char const*, char const*, ...)+0xea > V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 > V [libjvm.so+0x88ec5a] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a > V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 > V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 > V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 > V [libjvm.so+0x10189aa] java_start(Thread*)+0xca > C [libpthread.so.0+0x8182] 
start_thread+0xc2 > > [...] > > http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt > > Regards, > Nils > > On 2016-01-21 11:25, Nils Eliasson wrote: >> Hi, >> >> Please review this small change. The diff looks big but most of the change is just changing how the directive are >> passed to the compilers. Directives are set in the ciEnv and then passed to the compilers. The compilers can then >> choose to add it to any internal compilation object for convenience. The hs_err printing routine in vmError.cpp loads >> the directive from the ciEnv. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >> >> Regards, >> Nils > From christian.thalinger at oracle.com Thu Jan 21 23:03:47 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 21 Jan 2016 13:03:47 -1000 Subject: RFR: 8147599: [JVMCI] simplify code installation interface In-Reply-To: <569F8D19.4090305@oracle.com> References: <569F8D19.4090305@oracle.com> Message-ID: Looks good. > On Jan 20, 2016, at 3:35 AM, Roland Schatz wrote: > > Hi, > > Please review this change to the JVMCI code installation interface: > > webrev: http://cr.openjdk.java.net/~rschatz/JDK-8147599/webrev.00/ > jira: https://bugs.openjdk.java.net/browse/JDK-8147599 > > The new classes in the jdk.vm.ci.code.site package used to be inner classes in the removed CompilationResult class, no actual code changes there. > > Thanks, > Roland From goetz.lindenmaier at sap.com Fri Jan 22 07:41:42 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 22 Jan 2016 07:41:42 +0000 Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. In-Reply-To: References: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> Message-ID: <4295855A5C1DE049A61835A1887419CC41F16A11@DEWDFEMB12A.global.corp.sap> Hi Thomas, I only want to do syntactic changes to our copyright message. I don't want to change any content of them. 
So please let's leave this to another change. Thanks, Goetz. From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] Sent: Thursday, January 21, 2016 4:48 PM To: Lindenmaier, Goetz Cc: hotspot compiler Subject: Re: RFR(M): 8147937: Adapt SAP copyrights to new company name. Hi Goetz, http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/src/os/aix/vm/libodm_aix.cpp.frames.html Please remove Oracle copyright, this is SAP only. http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/src/os/aix/vm/libodm_aix.hpp.frames.html ditto. Otherwise looks fine. ... Thomas On Thu, Jan 21, 2016 at 3:53 PM, Lindenmaier, Goetz > wrote: Hi, SAP changed its name from SAP AG to SAP SE. We were asked to adapt our copyright messages accordingly. This change fixes all SAP copyrights in hotspot to follow the patterns "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." Please review this change. I also need a sponsor, please. http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 Best regards, Goetz. From tobias.hartmann at oracle.com Fri Jan 22 08:48:59 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 22 Jan 2016 09:48:59 +0100 Subject: [9] RFR(S): 8065334: CodeHeap expansion fails although there is uncommitted memory In-Reply-To: <56A12F04.6080305@oracle.com> References: <56A0B6F8.5040809@oracle.com> <56A12F04.6080305@oracle.com> Message-ID: <56A1ECFB.9000205@oracle.com> Thanks, Vladimir. Best, Tobias On 21.01.2016 20:18, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 1/21/16 2:46 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8065334 >> http://cr.openjdk.java.net/~thartmann/8065334/webrev.00/ >> >> If ReservedCodeCacheSize (or the size of a single code heap) is not a multiple of CodeCacheExpansionSize, the last code heap expansion fails, leaving unused uncommitted memory. For example, see [1]. Both the profiled and the non-profiled segments are full but there is still 32Kb of uncommitted space that is never used (CodeCacheExpansionSize is 64Kb). >> >> CodeHeap::expand_by() should check for this condition and commit all the remaining space even if the requested expansion size is larger. Like this, we use all the available space [2]. >> >> Thanks, >> Tobias >> >> >> [1] Baseline: >> CodeHeap 'non-nmethods': size=5696Kb used=2403Kb max_used=2433Kb free=3292Kb >> bounds [0x00007f4d83b27000, 0x00007f4d83d97000, 0x00007f4d840b7000] >> CodeHeap 'profiled nmethods': size=120032Kb used=119999Kb max_used=119999Kb free=32Kb >> bounds [0x00007f4d840b7000, 0x00007f4d8b5e7000, 0x00007f4d8b5ef000] >> CodeHeap 'non-profiled nmethods': size=120032Kb used=119999Kb max_used=119999Kb free=32Kb >> bounds [0x00007f4d8b5ef000, 0x00007f4d92b1f000, 0x00007f4d92b27000] >> total_blobs=248449 nmethods=52 adapters=570 >> compilation: disabled (not enough contiguous free space left) >> >> [2] Fixed: >> CodeHeap 'non-nmethods': size=5696Kb used=2404Kb max_used=2436Kb free=3291Kb >> bounds [0x00007fe8cd000000, 0x00007fe8cd270000, 0x00007fe8cd590000] >> CodeHeap 'profiled nmethods': size=120032Kb used=120031Kb max_used=120031Kb free=0Kb >> bounds [0x00007fe8cd590000, 0x00007fe8d4ac8000, 0x00007fe8d4ac8000] >> CodeHeap 'non-profiled nmethods': size=120032Kb used=120032Kb max_used=120032Kb free=0Kb >> bounds [0x00007fe8d4ac8000, 0x00007fe8dc000000, 0x00007fe8dc000000] >> total_blobs=10665 nmethods=127 adapters=570 >> compilation: disabled (not enough 
contiguous free space left) >> From aph at redhat.com Fri Jan 22 09:43:38 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 22 Jan 2016 09:43:38 +0000 Subject: Baffling USE in x86_64.ad In-Reply-To: <56A1298B.8000804@oracle.com> References: <56A0B155.3090802@redhat.com> <56A0B30C.6080205@redhat.com> <4295855A5C1DE049A61835A1887419CC41F1651F@DEWDFEMB12A.global.corp.sap> <56A1298B.8000804@oracle.com> Message-ID: <56A1F9CA.8030609@redhat.com> On 21/01/16 18:55, Vladimir Kozlov wrote: > I agree that is not needed. We never cleaned up this before. I > looked on history and it from 2000 when C2 was still in development. Thanks for looking at that. One of the most difficult problems a programmer ever encounters is the mysterious line of code: "I don't know what this is for, but I'm scared to remove it in case it's being used for some odd side-effect. I think I'll leave it in." Andrew. From mikael.gerdin at oracle.com Fri Jan 22 09:46:32 2016 From: mikael.gerdin at oracle.com (Mikael Gerdin) Date: Fri, 22 Jan 2016 10:46:32 +0100 Subject: RFR(M) 8147461: Use byte offsets for vtable start and vtable length offsets In-Reply-To: <56A04DCF.9090204@oracle.com> References: <569926B9.4070806@oracle.com> <569F7E22.3090905@oracle.com> <56A04DCF.9090204@oracle.com> Message-ID: <56A1FA78.3090608@oracle.com> Hi Chris, On 2016-01-21 04:17, Chris Plummer wrote: > Hi Mikael, > > The changes look good except I think you should get someone from the > compiler team to make sure the change in > HotSpotResolvedJavaMethodImpl.java and HotSpotVMConfig.java are ok. I'm > not sure why you chose to remove instanceKlassVtableStartOffset() rather > than just fix it. I'm cc:ing hotspot-compiler-dev and graal-dev to see if I can get someone to ok the JVMCI parts. 
The reason for removing the method is that the only reason for it being a method was to apply the wordSize scaling on the value and since I changed the offset to be a byte offset it does not need scaling and can be treated similar to the other constants in HotSpotVMConfig which are accessed without any accessor method. > > I think some of your changes may conflict with my changes for > JDK-8143608. Coleen is pushing JDK-8143608 for me once hs-rt opens up. > I'd appreciate it if you could wait until after then before doing your > push. Will do, would you mind pinging me when you've integrated 8143608? /Mikael > > thanks, > > Chris > > On 1/20/16 4:31 AM, Mikael Gerdin wrote: >> Hi again, >> >> I've rebased the on hs-rt and had to include some additional changes >> for JVMCI. >> I've also updated the copyright years. >> Unfortunately I can't generate an incremental webrev since i rebased >> the patch and there's no good way that I know of to make that work >> with webrev. >> >> New webrev at: http://cr.openjdk.java.net/~mgerdin/8147461/webrev.1/ >> >> Testing: JPRT again (which includes the JVMCI jtreg tests) >> >> /Mikael >> >> On 2016-01-15 18:04, Mikael Gerdin wrote: >>> Hi all, >>> >>> As per the previous discussion in mid-December[0] about moving the >>> _vtable_length field to class Klass, here's the first RFR and webrev, >>> according to my suggested plan[1]: >>> >>>> My current plan is to first modify the vtable_length_offset accessor to >>>> return a byte offset (which is what it's translated to by all callers). >>>> >>>> Then I'll tackle moving the _vtable_len field to Klass. >>>> >>>> Finally I'll try to consolidate the vtable related methods to Klass, >>>> where they belong. >>> >>> This change actually consists of three changes: >>> * modifying InstanceKlass::vtable_length_offset to become a byte offset >>> and use the ByteSize type to communicate the scaling. 
>>> * modifying InstanceKlass::vtable_start_offset to become a byte offset >>> and use the ByteSize type, for symmetry reasons mainly. >>> * adding a vtableEntry::size_in_bytes() since in many places the vtable >>> entry size is used in combination with the vtable start to compute a >>> byte offset for vtable lookups. >>> >>> I don't foresee any issues with the fact that the byte offset is >>> represented as an int, for two reasons: >>> 1) If the offset of any of these grows to over 2 gigabytes then we have >>> a huge footprint problem with InstanceKlass >>> 2) The offsets are converted to byte offsets and stored in ints already >>> in the cpu specific code I've modified. >>> >>> Bug link: https://bugs.openjdk.java.net/browse/JDK-8147461 >>> Webrev: http://cr.openjdk.java.net/~mgerdin/8147461/webrev.0/ >>> >>> Testing: JPRT on Oracle supported platforms, testing on AARCH64 and >>> PPC64 would be much appreciated, appropriate mailing lists have been >>> CC:ed to notify them of the request. >>> >>> >>> [0] >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-December/021152.html >>> >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-December/021224.html >>> >>> >>> >>> Thanks! >>> /Mikael >> > From nils.eliasson at oracle.com Fri Jan 22 13:38:31 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 22 Jan 2016 14:38:31 +0100 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56A13176.804@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> <56A13176.804@oracle.com> Message-ID: <56A230D7.9060606@oracle.com> Hi, Vladimir On 2016-01-21 20:28, Vladimir Kozlov wrote: > Passing directives through ciEnv is fine. > My question is about output in hs_err file. How those directives were > selected in your example? It only prints the directive that is used for the current compile task (that caused the crash). 
(That's why I put them together in the hs_err file) > I found it strange to see mixed flags values and oracle commands. > "Enable:true Exclude:false" - which these correspond to, for example? These are all options from the directive - and they are set with directives (highest priority), CompileCommand or VM flags (lowest priority). > > Should we not print directives/flags which are not set explicitly? I updated the print output to mark all options in the directive that are not default with a '*'. That makes it quicker to see if any special options were applied. It will also print if the directive is the unmodified default directive. Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/ Example output: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt Regards, Nils > > Thanks, > Vladimir > > On 1/21/16 2:31 AM, Nils Eliasson wrote: >> This is how it looks: >> >> [...] >> >> --------------- T H R E A D --------------- >> >> Current thread (0x00007f071046a000): JavaThread "C1 >> CompilerThread10" daemon [_thread_in_native, id=20033, >> stack(0x00007f05d7afb000,0x00007f05d7bfc000)] >> >> Current CompileTask: >> C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) >> >> Current compiler directive: >> inline: - >> Enable:true Exclude:false BreakAtExecute:false >> BreakAtCompile:false Log:false PrintAssembly:false >> PrintInlining:false PrintNMethods:false ReplayInline:false >> DumpReplay:false DumpInline:false >> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: >> BlockLayoutByFrequency:true PrintOptoAssembly:false >> PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false >> TraceSpilling:false Vectorize:false VectorizeDebug:false >> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false >> IGVPrintLevel:0 MaxNodeLimit:80000 >> >> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], >> sp=0x00007f05d7bfa5d0, free space=1021k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >> C=native code) >> V 
[libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, >> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >> char const*, int, unsigned long)+0x182 >> V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char >> const*, int, char const*, char const*, __va_list_tag*)+0x4a >> V [libjvm.so+0x908cca] report_vm_error(char const*, int, char >> const*, char const*, ...)+0xea >> V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, >> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 >> V [libjvm.so+0x88ec5a] >> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a >> V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 >> V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 >> V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 >> V [libjvm.so+0x10189aa] java_start(Thread*)+0xca >> C [libpthread.so.0+0x8182] start_thread+0xc2 >> >> [...] >> >> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt >> >> Regards, >> Nils >> >> On 2016-01-21 11:25, Nils Eliasson wrote: >>> Hi, >>> >>> Please review this small change. The diff looks big but most of the >>> change is just changing how the directive are >>> passed to the compilers. Directives are set in the ciEnv and then >>> passed to the compilers. The compilers can then >>> choose to add it to any internal compilation object for convenience. >>> The hs_err printing routine in vmError.cpp loads >>> the directive from the ciEnv. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >>> >>> Regards, >>> Nils >> From nils.eliasson at oracle.com Fri Jan 22 14:40:33 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 22 Jan 2016 15:40:33 +0100 Subject: RFR(S): 8063112: Compiler diagnostic commands should have locking instead of safepoint Message-ID: <56A23F61.9000201@oracle.com> Hi, Please review. 
Summary: Firstly this change removes the unnecessary vm-ops from three compiler diagnostic commands and adds locking instead. Secondly the Compiler.queue diagnostic command is improved with printing of any active compilations. I found this useful when diagnosing a rogue VM. Thirdly, as a bonus, I also add printing of active compilations in the thread section of the hs_err file. Very useful when investigating VMs terminated by a timeout. Testing: This does not pass all tests yet. A few tests depend on the output from the diagnostic command, and I want to be sure the reviewers are happy with the output format first. Bug: https://bugs.openjdk.java.net/browse/JDK-8063112 Webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.02/ Regards, Nils From rahul.v.raghavan at oracle.com Fri Jan 22 16:11:51 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 22 Jan 2016 08:11:51 -0800 (PST) Subject: FW: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler Message-ID: > -----Original Message----- > From: Tobias Hartmann > Sent: Monday, January 11, 2016 2:56 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net > > Hi Rahul, > > > http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ > > Why don't you use 'markOopDesc::hash_mask_in_place' for the 64 bit version? This should save some instructions and you also don't > need the 'hash' register if you compute everything in 'result'. Thank you for your comments Tobias. I could not get the implementation to work with the usage of 'markOopDesc::hash_mask_in_place' in x86_64 (similar to the support in x86_32). Usage of - __ andptr(result, markOopDesc::hash_mask_in_place); Results in build error - 'overflow in implicit constant conversion' I then understood from 'sharedRuntime_sparc.cpp', 'markOop.hpp' - that the usage of 'hash_mask_in_place' should be avoided for 64-bit because the values are too big! 
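To illustrate the overflow mentioned above, here is a small standalone sketch, not HotSpot code. The constants kHashBits and kHashShift are assumed values chosen to resemble a 64-bit mark word (a 31-bit hash field that starts 8 bits up); they are not copied from markOop.hpp, and the helper names are hypothetical. The point it tries to show is that the unshifted mask fits in a 32-bit immediate while the in-place mask does not, so shifting first and masking second avoids the large constant.

```cpp
#include <cstdint>

// Assumed, illustrative layout values (not taken from markOop.hpp).
constexpr int      kHashBits        = 31;
constexpr int      kHashShift       = 8;
constexpr uint64_t kHashMask        = (uint64_t{1} << kHashBits) - 1;
constexpr uint64_t kHashMaskInPlace = kHashMask << kHashShift;

// True if the value can be used as a signed 32-bit immediate.
constexpr bool fits_in_int32(uint64_t v) {
  return v <= 0x7fffffffULL;
}

// Shift first, then mask: only the small mask is needed as a constant.
constexpr uint64_t extract_hash(uint64_t mark) {
  return (mark >> kHashShift) & kHashMask;
}
```

Under these assumptions fits_in_int32(kHashMask) holds but fits_in_int32(kHashMaskInPlace) does not, which matches the reported 'overflow in implicit constant conversion' when the in-place mask is passed where a 32-bit immediate is expected.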
Similar comments in LibraryCallKit::inline_native_hashcode [hotspot/src/share/vm/opto/library_call.cpp] also. Could not find some other way to use hash_mask_in_place here for x86_64? So depending on markOopDesc::hash_mask, markOopDesc::hash_shift value instead (similar to done in sharedRuntime_sparc) Added missing comment regarding above in the revised webrev. Also yes I missed the optimized codegen. Tried revised patch removing usages of extra 'hash', 'mask' registers and computed all in 'result' itself. [sharedRuntime_x86_64.cpp] .................... + Register obj_reg = j_rarg0; + Register result = rax; ........ + // get hash + // Read the header and build a mask to get its hash field. + // Depend on hash_mask being at most 32 bits and avoid the use of hash_mask_in_place + // because it could be larger than 32 bits in a 64-bit vm. See markOop.hpp. + __ shrptr(result, markOopDesc::hash_shift); + __ andptr(result, markOopDesc::hash_mask); + // test if hashCode exists + __ jcc (Assembler::zero, slowCase); + __ ret(0); + __ bind (slowCase); ........ Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. Please send your comments. I can submit revised webrev if all okay. > > Best, > Tobias > > > On 08.01.2016 18:13, Rahul Raghavan wrote: > > Hello, > > > > Please review the following revised patch for JDK-6378256 - > > http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ > > > > This revised webrev got following changes - > > > > 1) A minor, better optimized code with return 0 at initial stage (instead of continuing to 'slowCase' path), for special/rare null > reference input! > > (as per documentation, test results confirmed it is safe to 'return 0' for null reference input, for System.identityHashCode) > > > > 2) Added similar Object.hashCode, System.identityHashCode optimization support in sharedRuntime_x86_64.cpp. > > > > Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. 
> > > > Thanks, > > Rahul > > > > > >> -----Original Message----- > >> From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan> Cc: hotspot-compiler- > dev at openjdk.java.net > >> > >>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . > >> > >> Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would > be > >> nice. > >> Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? > >> > >> Roland. > > > > > >> -----Original Message----- > >> From: Rahul Raghavan > Sent: Wednesday, December 09, 2015 2:43 PM > To: hotspot-compiler-dev at openjdk.java.net > >> > >> Hello, > >> > >> Please review the following patch for JDK-6378256. > >> > >> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . > >> > >> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . > >> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times > >> slower). > >> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options). > >> > >> sample unit test: > >> public class Jdk6378256Test > >> { > >> public static void main(String[] args) > >> { > >> Object obj = new Object(); > >> long time = System.nanoTime(); > >> for(int i = 0 ; i < 1000000 ; i++) > >> System.identityHashCode(obj); //compare to obj.hashCode(); > >> System.out.println ("Result = " + (System.nanoTime() - time)); > >> } > >> } > >> > >> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also. > >> (looks in the header for the hashCode before calling into the VM). > >> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver. > >> So also added required additional null check for System.identityHashCode case. 
> >> > >> Testing: > >> - successful JPRT run (-testset hotspot). > >> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System). > >> (with -client / -XX:TieredStopAtLevel=1 etc. options). > >> - Added 'noreg-perf' label for this performance bug. > >> Manual testing done and confirmed expected performance values for unit tests with fix. > >> > >> Thanks, > >> Rahul From roland.westrelin at oracle.com Fri Jan 22 16:38:45 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 22 Jan 2016 17:38:45 +0100 Subject: RFR(XS): 8147853: "assert(t->meet(t0) == t) failed: Not monotonic" with sun/util/calendar/zi/TestZoneInfo310.java Message-ID: During CCP, a Phi for the induction variable of a CountedLoop is processed repeatedly while the type of the backedge control is top, so only the loop entry input is considered for computing the Phi's type. The loop entry first has type int:1..3 so the Phi's type is int:1..3 then it has type int:1..4 so the Phi's type is int:1..4 then it has type int:1..5:www so the Phi's type is int:1..5:www then it has type int:1..6:www so the Phi's type is saturated to int:1..max-1:www The backedge control's type is changed to non-top and the type of the Phi is recomputed. This time the special code for counted loop in PhiNode::Value(): CountedLoopNode* l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL; if (l && l->can_be_counted_loop(phase) && ((const Node*)l->phi() == this)) { // Trip counted loop! 
// protect against init_trip() or limit() returning NULL const Node *init = l->init_trip(); const Node *limit = l->limit(); const Node* stride = l->stride(); if (init != NULL && limit != NULL && stride != NULL) { const TypeInt* lo = phase->type(init)->isa_int(); const TypeInt* hi = phase->type(limit)->isa_int(); const TypeInt* stride_t = phase->type(stride)->isa_int(); if (lo != NULL && hi != NULL && stride_t != NULL) { // Dying loops might have TOP here assert(stride_t->_hi >= stride_t->_lo, "bad stride type"); const Type* res = NULL; if (stride_t->_hi < 0) { // Down-counter loop swap(lo, hi); return TypeInt::make(MIN2(lo->_lo, hi->_lo) , hi->_hi, 3); } else if (stride_t->_lo >= 0) { return TypeInt::make(lo->_lo, MAX2(lo->_hi, hi->_hi), 3); } } } } kicks in and it computes a type of: int:1..8:www. The type of the Phi was narrowed and the assert fires. I suggest we fix this by saturating the type of the Phi only once the type of the loop's backedge is non-top. This way, the special code for counted loop above has a chance to run and that should be enough to keep the types during CCP monotonic. http://cr.openjdk.java.net/~roland/8147853/webrev.00/ Roland. From christian.thalinger at oracle.com Fri Jan 22 17:18:04 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 22 Jan 2016 07:18:04 -1000 Subject: RFR(S): 8146244: compiler/jvmci/code/DataPatchTest.java crashes: SIGSEGV in (getConstClass)getConstClass In-Reply-To: <56A0FC29.7040506@oracle.com> References: <56A0FC29.7040506@oracle.com> Message-ID: <0C33C507-194F-4867-B282-0DBB16FE4A0A@oracle.com> Looks good. 
> On Jan 21, 2016, at 5:41 AM, Roland Schatz wrote: > > Hi, > > Please review this small bugfix: > webrev: http://cr.openjdk.java.net/~rschatz/JDK-8146244/webrev.00/ > issue: https://bugs.openjdk.java.net/browse/JDK-8146244 > > Thanks, > Roland From vladimir.kozlov at oracle.com Fri Jan 22 18:36:30 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Jan 2016 10:36:30 -0800 Subject: RFR(XS): 8147853: "assert(t->meet(t0) == t) failed: Not monotonic" with sun/util/calendar/zi/TestZoneInfo310.java In-Reply-To: References: Message-ID: <56A276AE.8060408@oracle.com> Good fix. Can we simply return the type of the phi's in(EntryControl) input in that case (backedge is top), without filter_speculative() and the verification code under assert, which I think is useless in this case? We already have a check for the counted loop; we only need to separate out the can_be_counted_loop() condition. Thanks, Vladimir On 1/22/16 8:38 AM, Roland Westrelin wrote: > During CCP, a Phi for the induction variable of a CountedLoop is processed repeatedly while the type of the backedge control is top, so only the loop entry input is considered for computing the Phi's type. > > The loop entry first has type int:1..3 so the Phi's type is int:1..3 > then it has type int:1..4 so the Phi's type is int:1..4 > then it has type int:1..5:www so the Phi's type is int:1..5:www > then it has type int:1..6:www so the Phi's type is saturated to int:1..max-1:www > > The backedge control's type is changed to non-top and the type of the Phi is recomputed. This time the special code for counted loop in PhiNode::Value(): > > CountedLoopNode* l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL; > if (l && l->can_be_counted_loop(phase) && > ((const Node*)l->phi() == this)) { // Trip counted loop! 
> // protect against init_trip() or limit() returning NULL > const Node *init = l->init_trip(); > const Node *limit = l->limit(); > const Node* stride = l->stride(); > if (init != NULL && limit != NULL && stride != NULL) { > const TypeInt* lo = phase->type(init)->isa_int(); > const TypeInt* hi = phase->type(limit)->isa_int(); > const TypeInt* stride_t = phase->type(stride)->isa_int(); > if (lo != NULL && hi != NULL && stride_t != NULL) { // Dying loops might have TOP here > assert(stride_t->_hi >= stride_t->_lo, "bad stride type"); > const Type* res = NULL; > if (stride_t->_hi < 0) { // Down-counter loop > swap(lo, hi); > return TypeInt::make(MIN2(lo->_lo, hi->_lo) , hi->_hi, 3); > } else if (stride_t->_lo >= 0) { > return TypeInt::make(lo->_lo, MAX2(lo->_hi, hi->_hi), 3); > } > } > } > } > > > kicks in and it computes a type of: int:1..8:www. The type of the Phi was narrowed and the assert fires. > > I suggest we fix this by saturating the type of the Phi only once the type of the loop's backedge is non-top. This way, the special code for counted loop above has a chance to run and that should be enough to keep the types during CCP monotonic. > > http://cr.openjdk.java.net/~roland/8147853/webrev.00/ > > Roland. > From vladimir.kozlov at oracle.com Fri Jan 22 18:56:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Jan 2016 10:56:21 -0800 Subject: RFR(S): 8138756: Compiler Control: Print directives in hs_err In-Reply-To: <56A230D7.9060606@oracle.com> References: <56A0B237.9090008@oracle.com> <56A0B398.4000408@oracle.com> <56A13176.804@oracle.com> <56A230D7.9060606@oracle.com> Message-ID: <56A27B55.6050502@oracle.com> "no inline - compile commands may apply" is confusing to me (and to others who are not familiar with directives). What does it mean? :) Does it mean no 'inline' directives were used, or the opposite: the -XX:-Inline flag was specified (or a corresponding directive)? If it switches off inlining, then I think it should be "don't inline". 
So what "compile commands may apply" means? > I updated the print output to mark all options in the directive that are > not default with a '*'. That makes it quicker to see if any special Yes, it is better but I still did not get this. I see that command line has PrintInlining command and it is in the list: *PrintInlining:true. But I don't see PrintCompilation on the list but it is specified on command line. On other hand PrintIntrinsics:false is there. > It only prints the directive that is used for the current compile task > (that caused the crash). (Thats why I put them together in the hs_err file) What do you mean "is used"? "Print *which* directive (and options) were in use if compiler crash. Print *if* directives were used at some point if other crash?" Should we replace "in use"/"were used" with "were set"? Thanks, Vladimir On 1/22/16 5:38 AM, Nils Eliasson wrote: > Hi, Vladimir > > On 2016-01-21 20:28, Vladimir Kozlov wrote: >> Passing directives through ciEnv is fine. >> My question is about output in hs_err file. How those directives were >> selected in your example? > > It only prints the directive that is used for the current compile task > (that caused the crash). (Thats why I put them together in the hs_err file) > >> I found it strange to see mixed flags values and oracle commands. >> "Enable:true Exclude:false" - which these correspond to, for example? > > These are all options from the directive - and they are set with > directives (highest priority), compilecommmand or vmflags (lowest > priority). > >> >> Should we not print directives/flags which are not set explicitly? > > I updated the print output to mark all options in the directive that are > not default with a '*'. That makes it quicker to see if any special > options was applied. It will also print if the directive is the > unmodified default directive. 
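A small sketch of the '*' marking described above, as one possible reading of it and not the code from the webrev. The Option struct, its default_value field and format_options() are hypothetical names; the default values used in the example are assumed, and only the two option names and their printed values come from this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one directive option (not a HotSpot type).
struct Option {
  std::string name;
  std::string value;
  std::string default_value;  // assumed to be known for every option
};

// Prints "Name:value" pairs separated by spaces and prefixes every option
// whose value differs from its default with '*'.
inline std::string format_options(const std::vector<Option>& opts) {
  std::string out;
  for (const Option& o : opts) {
    if (!out.empty()) out += ' ';
    if (o.value != o.default_value) out += '*';
    out += o.name + ":" + o.value;
  }
  return out;
}

// Usage example: PrintInlining was changed, PrintIntrinsics was not.
inline void example() {
  std::vector<Option> opts = {
      {"PrintInlining", "true", "false"},
      {"PrintIntrinsics", "false", "false"}};
  assert(format_options(opts) == "*PrintInlining:true PrintIntrinsics:false");
}
```

Read this way, an option shows up without '*' whenever its effective value equals the default, whatever the source of that value was.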
> > Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/ > Example output: > http://cr.openjdk.java.net/~neliasso/8138756/webrev.03/hserr.txt > > Regards, > Nils > >> >> Thanks, >> Vladimir >> >> On 1/21/16 2:31 AM, Nils Eliasson wrote: >>> This is how it looks: >>> >>> [...] >>> >>> --------------- T H R E A D --------------- >>> >>> Current thread (0x00007f071046a000): JavaThread "C1 >>> CompilerThread10" daemon [_thread_in_native, id=20033, >>> stack(0x00007f05d7afb000,0x00007f05d7bfc000)] >>> >>> Current CompileTask: >>> C1: 225 1 3 java.lang.String::isLatin1 (19 bytes) >>> >>> Current compiler directive: >>> inline: - >>> Enable:true Exclude:false BreakAtExecute:false >>> BreakAtCompile:false Log:false PrintAssembly:false >>> PrintInlining:false PrintNMethods:false ReplayInline:false >>> DumpReplay:false DumpInline:false >>> CompilerDirectivesIgnoreCompileCommands:false DisableIntrinsic: >>> BlockLayoutByFrequency:true PrintOptoAssembly:false >>> PrintIntrinsics:false TraceOptoPipelining:false TraceOptoOutput:false >>> TraceSpilling:false Vectorize:false VectorizeDebug:false >>> CloneMapDebug:false DoReserveCopyInSuperWordDebug:false >>> IGVPrintLevel:0 MaxNodeLimit:80000 >>> >>> Stack: [0x00007f05d7afb000,0x00007f05d7bfc000], >>> sp=0x00007f05d7bfa5d0, free space=1021k >>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>> C=native code) >>> V [libjvm.so+0x12e7532] VMError::report_and_die(int, char const*, >>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >>> char const*, int, unsigned long)+0x182 >>> V [libjvm.so+0x12e829a] VMError::report_and_die(Thread*, char >>> const*, int, char const*, char const*, __va_list_tag*)+0x4a >>> V [libjvm.so+0x908cca] report_vm_error(char const*, int, char >>> const*, char const*, ...)+0xea >>> V [libjvm.so+0x88df81] CompileBroker::post_compile(CompilerThread*, >>> CompileTask*, EventCompilation&, bool, ciEnv*)+0x1b1 >>> V [libjvm.so+0x88ec5a] >>> 
CompileBroker::invoke_compiler_on_method(CompileTask*)+0x90a >>> V [libjvm.so+0x88f960] CompileBroker::compiler_thread_loop()+0x540 >>> V [libjvm.so+0x1264789] JavaThread::thread_main_inner()+0x1c9 >>> V [libjvm.so+0x1264ac6] JavaThread::run()+0x2a6 >>> V [libjvm.so+0x10189aa] java_start(Thread*)+0xca >>> C [libpthread.so.0+0x8182] start_thread+0xc2 >>> >>> [...] >>> >>> http://cr.openjdk.java.net/~neliasso/8138756/hserr.txt >>> >>> Regards, >>> Nils >>> >>> On 2016-01-21 11:25, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review this small change. The diff looks big but most of the >>>> change is just changing how the directive are >>>> passed to the compilers. Directives are set in the ciEnv and then >>>> passed to the compilers. The compilers can then >>>> choose to add it to any internal compilation object for convenience. >>>> The hs_err printing routine in vmError.cpp loads >>>> the directive from the ciEnv. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8138756 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8138756/webrev.01/ >>>> >>>> Regards, >>>> Nils >>> > From tom.rodriguez at oracle.com Fri Jan 22 19:17:25 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 22 Jan 2016 11:17:25 -0800 Subject: RFR(S): 8146424: runtime/ReservedStack/ReservedStackTest.java triggers: assert(thread->deopt_mark() == __null) failed: no stack overflow from deopt blob/uncommon trap Message-ID: <593E3AC7-E839-4969-ACDB-74B934DC3F14@oracle.com> http://cr.openjdk.java.net/~never/8146424/webrev/index.html JVMCI needs to provide access to Interpreter::size_activation so that JVMCI compilers can properly bang stacks based on their deoptimization requires. This simply adds a new entry point the compiler can use to compute the required size. It also exposes HotSpotVMConfig.vm_page_size instead of requiring the compiler to rely on Unsafe.pageSize which has unspecified relationship to that value. Tested with Graal and the jtreg stack banging tests. 
I was unable to reproduce the exact reported failure locally, though I confirmed that more stack banging was being done in the required places. tom From vladimir.kozlov at oracle.com Fri Jan 22 19:23:47 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 22 Jan 2016 11:23:47 -0800 Subject: RFR(S): 8063112: Compiler diagnostic commands should have locking instead of safepoint In-Reply-To: <56A23F61.9000201@oracle.com> References: <56A23F61.9000201@oracle.com> Message-ID: <56A281C3.6010408@oracle.com> Why do you need a new print method? Why can't you use the existing print()? Also, I would prefer to have the current compilation tasks printed on separate lines, not in the list of threads. Then you would not need the new print method? I am worried about using locks for printing because the print code also takes locks. Do we really have to have locks here? The output for these directives is a local bufferedStream. As I understand it, it is separate for each directive. So why do you need a lock? Or a VM operation as before? Thanks, Vladimir On 1/22/16 6:40 AM, Nils Eliasson wrote: > Hi, > > Please review. > > Summary: > Firstly this change removes the unnecessary vm-ops from three compiler > diagnostic commands and adds locking instead. > Secondly the Compiler.queue diagnostic command is improved with printing > of any active compilations. I found this useful when diagnosing a rogue VM. > Thirdly, as a bonus, I also add printing of active compilations in the > thread section of the hs_err file. Very useful when investigating VMs > terminated by a timeout. > > Testing: > This does not pass all tests yet. A few tests are dependent on the output > from the diagnostic command, and I want to be sure the reviewers are > happy with the output format first.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8063112 > Webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.02/ > > Regards, > Nils > From tom.rodriguez at oracle.com Fri Jan 22 22:23:05 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 22 Jan 2016 14:23:05 -0800 Subject: RFR(S): 8146424: runtime/ReservedStack/ReservedStackTest.java triggers: assert(thread->deopt_mark() == __null) failed: no stack overflow from deopt blob/uncommon trap In-Reply-To: <593E3AC7-E839-4969-ACDB-74B934DC3F14@oracle.com> References: <593E3AC7-E839-4969-ACDB-74B934DC3F14@oracle.com> Message-ID: <08AB95F3-5D22-4053-A870-5A6A9E59D5BD@oracle.com> I added a regression test and generated a new webrev http://cr.openjdk.java.net/~never/8146424.01/webrev/index.html tom > On Jan 22, 2016, at 11:17 AM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8146424/webrev/index.html > > JVMCI needs to provide access to Interpreter::size_activation so that JVMCI compilers can properly bang stacks based on their deoptimization requires. This simply adds a new entry point the compiler can use to compute the required size. It also exposes HotSpotVMConfig.vm_page_size instead of requiring the compiler to rely on Unsafe.pageSize which has unspecified relationship to that value. > > Tested with Graal and the jtreg stack banging tests. I was unable to reproduce the exact reported failure locally though I confirmed that more stack banging was being done in the required places. > > tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.rodriguez at oracle.com Fri Jan 22 22:25:28 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 22 Jan 2016 14:25:28 -0800 Subject: RFR(S): 8148101: [JVMCI] Make CallingConvention.Type extensible Message-ID: <165DD75E-8A1E-4C0D-991E-302964E2C0EF@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8148101 CallingConvention.Type currently fixes the set of types for all possible backend. 
It should be abstracted so that it can be more easily extended. The unused stackOnly parameter was removed at the same time. tom From christian.thalinger at oracle.com Fri Jan 22 22:41:06 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 22 Jan 2016 12:41:06 -1000 Subject: RFR(S): 8146424: runtime/ReservedStack/ReservedStackTest.java triggers: assert(thread->deopt_mark() == __null) failed: no stack overflow from deopt blob/uncommon trap In-Reply-To: <08AB95F3-5D22-4053-A870-5A6A9E59D5BD@oracle.com> References: <593E3AC7-E839-4969-ACDB-74B934DC3F14@oracle.com> <08AB95F3-5D22-4053-A870-5A6A9E59D5BD@oracle.com> Message-ID: Looks good. > On Jan 22, 2016, at 12:23 PM, Tom Rodriguez wrote: > > I added a regression test and generated a new webrev http://cr.openjdk.java.net/~never/8146424.01/webrev/index.html > > tom > >> On Jan 22, 2016, at 11:17 AM, Tom Rodriguez wrote: >> >> http://cr.openjdk.java.net/~never/8146424/webrev/index.html >> >> JVMCI needs to provide access to Interpreter::size_activation so that JVMCI compilers can properly bang stacks based on their deoptimization requirements. This simply adds a new entry point the compiler can use to compute the required size. It also exposes HotSpotVMConfig.vm_page_size instead of requiring the compiler to rely on Unsafe.pageSize, which has an unspecified relationship to that value. >> >> Tested with Graal and the jtreg stack banging tests. I was unable to reproduce the exact reported failure locally, though I confirmed that more stack banging was being done in the required places. >> >> tom > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From christian.thalinger at oracle.com Fri Jan 22 23:00:52 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 22 Jan 2016 13:00:52 -1000 Subject: RFR(S): 8148101: [JVMCI] Make CallingConvention.Type extensible In-Reply-To: <165DD75E-8A1E-4C0D-991E-302964E2C0EF@oracle.com> References: <165DD75E-8A1E-4C0D-991E-302964E2C0EF@oracle.com> Message-ID: <31EB408D-FAF7-44CF-BBF3-45CA09960FED@oracle.com> http://cr.openjdk.java.net/~never/8148101/webrev/ Looks good. > On Jan 22, 2016, at 12:25 PM, Tom Rodriguez wrote: > > https://bugs.openjdk.java.net/browse/JDK-8148101 > > > CallingConvention.Type currently fixes the set of types for all possible backend. It's should be abstracted so that it can be more easily extended. The unused stackOnly parameter was removed at the same time. > > tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Mon Jan 25 07:09:48 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 25 Jan 2016 08:09:48 +0100 Subject: FW: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: <56A5CA3C.6070401@oracle.com> Hi Rahul, On 22.01.2016 17:11, Rahul Raghavan wrote: > >> -----Original Message----- >> From: Tobias Hartmann > Sent: Monday, January 11, 2016 2:56 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net >> >> Hi Rahul, >> >>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ >> >> Why don't you use 'markOopDesc::hash_mask_in_place' for the 64 bit version? This should safe some instructions and you also don't >> need the 'hash' register if you compute everything in 'result'. > > Thank you for your comments Tobias. > > I could not get the implementation work with the usage of 'markOopDesc::hash_mask_in_place' in x86_64 (similar to support in x86_32). 
> Usage of - __ andptr(result, markOopDesc::hash_mask_in_place); > Results in build error - ' overflow in implicit constant conversion' > > Then understood from 'sharedRuntime_sparc.cpp', 'markOop.hpp' - that the usage of 'hash_mask_in_place' should be avoided for 64-bit because the values are too big! > Similar comments in LibraryCallKit::inline_native_hashcode [hotspot/src/share/vm/opto/library_call.cpp] also. > Could not find some other way to use hash_mask_in_place here for x86_64? You are right, I missed that. > So depending on markOopDesc::hash_mask, markOopDesc::hash_shift value instead (similar to done in sharedRuntime_sparc) > Added missing comment regarding above in the revised webrev. > > Also yes I missed the optimized codegen. > Tried revised patch removing usages of extra 'hash', 'mask' registers and computed all in 'result' itself. > > [sharedRuntime_x86_64.cpp] > .................... > + Register obj_reg = j_rarg0; > + Register result = rax; > ........ > + // get hash > + // Read the header and build a mask to get its hash field. > + // Depend on hash_mask being at most 32 bits and avoid the use of hash_mask_in_place > + // because it could be larger than 32 bits in a 64-bit vm. See markOop.hpp. > + __ shrptr(result, markOopDesc::hash_shift); > + __ andptr(result, markOopDesc::hash_mask); > + // test if hashCode exists > + __ jcc (Assembler::zero, slowCase); > + __ ret(0); > + __ bind (slowCase); > ........ > > Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. > > Please send your comments. I can submit revised webrev if all okay. Looks good. Please send a new webrev. 
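A minimal sketch of the shift-then-mask point discussed above, written in Java rather than as HotSpot assembler code. The constants HASH_SHIFT and HASH_BITS are assumptions chosen for illustration (the authoritative values are in markOop.hpp), and the class and method names are hypothetical. It only shows why a mask applied in place may not fit a 32-bit immediate while the shifted-down mask does.

```java
public class MarkWordHashSketch {
    // Illustrative values only; the real layout is defined in markOop.hpp.
    public static final int HASH_SHIFT = 8;
    public static final int HASH_BITS = 31;

    // Mask for a hash that has already been shifted down to bit 0.
    public static final long HASH_MASK = (1L << HASH_BITS) - 1;
    // The same mask positioned where the hash sits inside the header word.
    public static final long HASH_MASK_IN_PLACE = HASH_MASK << HASH_SHIFT;

    // True if the value survives a round trip through a signed 32-bit int,
    // i.e. it could be encoded as a sign-extended 32-bit immediate.
    public static boolean fitsInSigned32(long value) {
        return value == (int) value;
    }

    // Shift first, then mask: only the small mask is needed.
    public static long extractHash(long mark) {
        return (mark >>> HASH_SHIFT) & HASH_MASK;
    }

    // Mask in place, then shift: same result, but needs the wide mask.
    public static long extractHashInPlace(long mark) {
        return (mark & HASH_MASK_IN_PLACE) >>> HASH_SHIFT;
    }

    public static void main(String[] args) {
        // Hypothetical header word: a hash of 0x12345678 plus one low status bit.
        long mark = (0x12345678L << HASH_SHIFT) | 0x1L;

        System.out.println("HASH_MASK fits in 32 bits:          " + fitsInSigned32(HASH_MASK));
        System.out.println("HASH_MASK_IN_PLACE fits in 32 bits: " + fitsInSigned32(HASH_MASK_IN_PLACE));
        System.out.println("hash = 0x" + Long.toHexString(extractHash(mark)));
        // A result of zero would mean no hash has been stored yet (the slow path).
    }
}
```

Under these assumed constants both extraction orders give the same hash; the difference is only in the size of the constant that has to be encoded.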
Best, Tobias > >> >> Best, >> Tobias >> >> >> On 08.01.2016 18:13, Rahul Raghavan wrote: >>> Hello, >>> >>> Please review the following revised patch for JDK-6378256 - >>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ >>> >>> This revised webrev got following changes - >>> >>> 1) A minor, better optimized code with return 0 at initial stage (instead of continuing to 'slowCase' path), for special/rare null >> reference input! >>> (as per documentation, test results confirmed it is safe to 'return 0' for null reference input, for System.identityHashCode) >>> >>> 2) Added similar Object.hashCode, System.identityHashCode optimization support in sharedRuntime_x86_64.cpp. >>> >>> Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. >>> >>> Thanks, >>> Rahul >>> >>> >>>> -----Original Message----- >>>> From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan> Cc: hotspot-compiler- >> dev at openjdk.java.net >>>> >>>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . >>>> >>>> Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would >> be >>>> nice. >>>> Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? >>>> >>>> Roland. >>> >>> >>>> -----Original Message----- >>>> From: Rahul Raghavan > Sent: Wednesday, December 09, 2015 2:43 PM > To: hotspot-compiler-dev at openjdk.java.net >>>> >>>> Hello, >>>> >>>> Please review the following patch for JDK-6378256. >>>> >>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . >>>> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times >>>> slower). >>>> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options). 
>>>> >>>> sample unit test: >>>> public class Jdk6378256Test >>>> { >>>> public static void main(String[] args) >>>> { >>>> Object obj = new Object(); >>>> long time = System.nanoTime(); >>>> for(int i = 0 ; i < 1000000 ; i++) >>>> System.identityHashCode(obj); //compare to obj.hashCode(); >>>> System.out.println ("Result = " + (System.nanoTime() - time)); >>>> } >>>> } >>>> >>>> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also. >>>> (looks in the header for the hashCode before calling into the VM). >>>> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver. >>>> So also added required additional null check for System.identityHashCode case. >>>> >>>> Testing: >>>> - successful JPRT run (-testset hotspot). >>>> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System). >>>> (with -client / -XX:TieredStopAtLevel=1 etc. options). >>>> - Added 'noreg-perf' label for this performance bug. >>>> Manual testing done and confirmed expected performance values for unit tests with fix. >>>> >>>> Thanks, >>>> Rahul From tobias.hartmann at oracle.com Mon Jan 25 09:48:28 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 25 Jan 2016 10:48:28 +0100 Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array Message-ID: <56A5EF6C.4090603@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8147876 http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/ ciTypeFlow::is_dominated_by() write outside the 'dominated' array because it's size is too small. The problem is that the number of ciBlocks is not equal to the Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks(). 
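A rough model of the sizing problem described above, in Java. This is not the ciTypeFlow code; the class, method, and block numbering are hypothetical and only illustrate what happens when an array indexed by flow-block id is sized by the smaller count from a 1:n relation. Java reports the bad index as an exception, whereas the C++ code would write past the end of the array.

```java
public class DominatedArraySketch {
    // Marks every flow block id in an array of the given size.
    // Throws ArrayIndexOutOfBoundsException if an id does not fit.
    public static boolean[] markAll(int arraySize, int[] flowBlockIds) {
        boolean[] dominated = new boolean[arraySize];
        for (int id : flowBlockIds) {
            dominated[id] = true;
        }
        return dominated;
    }

    public static void main(String[] args) {
        // Hypothetical method with 3 bytecode-level blocks, two of which were
        // cloned during flow analysis, giving 5 flow blocks (a 1:n relation).
        int methodBlockCount = 3;
        int[] flowBlockIds = {0, 1, 2, 3, 4};
        int flowBlockCount = flowBlockIds.length;

        try {
            markAll(methodBlockCount, flowBlockIds);
            System.out.println("sized by method blocks: ok");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("sized by method blocks: out of bounds");
        }

        boolean[] dominated = markAll(flowBlockCount, flowBlockIds);
        System.out.println("sized by flow blocks: ok, length " + dominated.length);
    }
}
```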
 Thanks, Tobias From thomas.stuefe at gmail.com Mon Jan 25 11:02:53 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 25 Jan 2016 12:02:53 +0100 Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. In-Reply-To: <4295855A5C1DE049A61835A1887419CC41F16A11@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41F1663A@DEWDFEMB12A.global.corp.sap> <4295855A5C1DE049A61835A1887419CC41F16A11@DEWDFEMB12A.global.corp.sap> Message-ID: Ok Goetz! Looks all fine to me. On Fri, Jan 22, 2016 at 8:41 AM, Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > Hi Thomas, > > > > I only want to do syntactic changes to our copyright message. I don't > > want to change any content of them. > > > > So please let's leave this to another change. > > > > Thanks, > > Goetz. > > > > *From:* Thomas Stüfe [mailto:thomas.stuefe at gmail.com] > *Sent:* Thursday, January 21, 2016 4:48 PM > *To:* Lindenmaier, Goetz > *Cc:* hotspot compiler > *Subject:* Re: RFR(M): 8147937: Adapt SAP copyrights to new company name. > > > > Hi Goetz, > > > > > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/src/os/aix/vm/libodm_aix.cpp.frames.html > > > > Please remove Oracle copyright, this is SAP only. > > > > > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/src/os/aix/vm/libodm_aix.hpp.frames.html > > > > ditto. > > > > Otherwise looks fine. > > > > ... > > Thomas > > > > On Thu, Jan 21, 2016 at 3:53 PM, Lindenmaier, Goetz < > goetz.lindenmaier at sap.com> wrote: > > Hi, > > > > SAP changed its name from SAP AG to SAP SE. We were asked to > > adapt our copyright messages accordingly. > > > > This change fixes all SAP copyrights in hotspot to follow the patterns > > "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or > > "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All > rights reserved." > > > > Please review this change. I need a sponsor, please.
> > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 > > > > Best regards, > > Goetz. > > > > > From goetz.lindenmaier at sap.com Mon Jan 25 11:26:34 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 25 Jan 2016 11:26:34 +0000 Subject: sponsor?: RFR(M): 8147937: Adapt SAP copyrights to new company name. Message-ID: <4295855A5C1DE049A61835A1887419CC41F17026@DEWDFEMB12A.global.corp.sap> Hi, could somebody please sponsor this change? I updated the webrev to apply to the latest hs-comp repo. http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/ Thanks! Goetz. > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Thursday, 21 January 2016 15:53 > To: hotspot compiler > Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. > > Hi, > > > > SAP changed its name from SAP AG to SAP SE. We were asked to > > adapt our copyright messages accordingly. > > > > This change fixes all SAP copyrights in hotspot to follow the patterns > > "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or > > "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All rights > reserved." > > > > Please review this change. I need a sponsor, please. > > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 > > > > Best regards, > > Goetz. > > From zoltan.majo at oracle.com Mon Jan 25 11:40:16 2016 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 25 Jan 2016 12:40:16 +0100 Subject: sponsor?: RFR(M): 8147937: Adapt SAP copyrights to new company name. In-Reply-To: <4295855A5C1DE049A61835A1887419CC41F17026@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC41F17026@DEWDFEMB12A.global.corp.sap> Message-ID: <56A609A0.9060109@oracle.com> Hi Goetz, On 01/25/2016 12:26 PM, Lindenmaier, Goetz wrote: > Hi, > > could somebody please sponsor this change?
I'll sponsor the change. Thank you and best regards, Zoltan > I updated the webrev to apply to the latest hs-comp repo. > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/ > > Thanks! > Goetz. > >> -----Original Message----- >> From: Lindenmaier, Goetz >> Sent: Thursday, 21 January 2016 15:53 >> To: hotspot compiler >> Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. >> >> Hi, >> >> >> >> SAP changed its name from SAP AG to SAP SE. We were asked to >> >> adapt our copyright messages accordingly. >> >> >> >> This change fixes all SAP copyrights in hotspot to follow the patterns >> >> "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or >> >> "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All rights >> reserved." >> >> >> >> Please review this change. I need a sponsor, please. >> >> http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 >> >> >> >> Best regards, >> >> Goetz. >> >> From goetz.lindenmaier at sap.com Mon Jan 25 11:41:27 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 25 Jan 2016 11:41:27 +0000 Subject: sponsor?: RFR(M): 8147937: Adapt SAP copyrights to new company name. In-Reply-To: <56A609A0.9060109@oracle.com> References: <4295855A5C1DE049A61835A1887419CC41F17026@DEWDFEMB12A.global.corp.sap> <56A609A0.9060109@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC41F1705E@DEWDFEMB12A.global.corp.sap> That's great! Thanks! Best regards, Goetz. > -----Original Message----- > From: Zoltán Majó [mailto:zoltan.majo at oracle.com] > Sent: Monday, 25 January 2016 12:40 > To: Lindenmaier, Goetz; hotspot compiler > Subject: Re: sponsor?: RFR(M): 8147937: Adapt SAP copyrights to new > company name. > > Hi Goetz, > > > On 01/25/2016 12:26 PM, Lindenmaier, Goetz wrote: > > Hi, > > > > could somebody please sponsor this change? > > I'll sponsor the change.
> > Thank you and best regards, > > > Zoltan > > > I updated the webrev to apply to the latest hs-comp repo. > > http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01/ > > > > Thanks! > > Goetz. > > > >> -----Original Message----- > >> From: Lindenmaier, Goetz > >> Sent: Thursday, 21 January 2016 15:53 > >> To: hotspot compiler > >> Subject: RFR(M): 8147937: Adapt SAP copyrights to new company name. > >> > >> Hi, > >> > >> > >> > >> SAP changed its name from SAP AG to SAP SE. We were asked to > >> > >> adapt our copyright messages accordingly. > >> > >> > >> > >> This change fixes all SAP copyrights in hotspot to follow the patterns > >> > >> "Copyright (c) [1,2][9,0][0-9][0-9] SAP SE. All rights reserved." or > >> > >> "Copyright (c) [1,2][9,0][0-9][0-9], [1,2][9,0][0-9][0-9] SAP SE. All rights > >> reserved." > >> > >> > >> > >> Please review this change. I need a sponsor, please. > >> > >> http://cr.openjdk.java.net/~goetz/wr16/8147937-copyright/webrev.01 > >> > >> > >> > >> Best regards, > >> > >> Goetz. > >> > >> From nils.eliasson at oracle.com Mon Jan 25 14:12:11 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 25 Jan 2016 15:12:11 +0100 Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array In-Reply-To: <56A5EF6C.4090603@oracle.com> References: <56A5EF6C.4090603@oracle.com> Message-ID: <56A62D3B.6070805@oracle.com> Looks good. Best regards, Nils (Not a reviewer) On 2016-01-25 10:48, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8147876 > http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/ > > ciTypeFlow::is_dominated_by() writes outside the 'dominated' array because its size is too small. The problem is that the number of ciBlocks is not equal to the number of Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks().
> > Thanks, > Tobias From tobias.hartmann at oracle.com Mon Jan 25 14:27:40 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 25 Jan 2016 15:27:40 +0100 Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array In-Reply-To: <56A62D3B.6070805@oracle.com> References: <56A5EF6C.4090603@oracle.com> <56A62D3B.6070805@oracle.com> Message-ID: <56A630DC.9020406@oracle.com> Thanks, Nils! Best, Tobias On 25.01.2016 15:12, Nils Eliasson wrote: > Looks good. > > Best regards, > Nils > (Not a reviewer) > > On 2016-01-25 10:48, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8147876 >> http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/ >> >> ciTypeFlow::is_dominated_by() write outside the 'dominated' array because it's size is too small. The problem is that the number of ciBlocks is not equal to the Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks(). >> >> Thanks, >> Tobias > From doug.simon at oracle.com Mon Jan 25 16:14:20 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 25 Jan 2016 17:14:20 +0100 Subject: RFR: 8147470: update JVMCI mx extensions Message-ID: Please review these changes to the mx extensions for JVMCI to account for recent HotSpot and Graal changes. https://bugs.openjdk.java.net/browse/JDK-8147470 http://cr.openjdk.java.net/~dnsimon/8147470/ -Doug From aleksey.shipilev at oracle.com Mon Jan 25 16:32:52 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 25 Jan 2016 19:32:52 +0300 Subject: RFR (M) 8148146: Integrate new internal Unsafe entry points, and basic intrinsic support for VarHandles Message-ID: <56A64E34.8020305@oracle.com> Hi, I would like to solicit reviews for the slab of VM changes to support JEP 193 (VarHandles). This portion covers new Unsafe methods. 
Webrev: http://cr.openjdk.java.net/~shade/8148146/webrev.jdk.00/ http://cr.openjdk.java.net/~shade/8148146/webrev.hs.00/ The patches "almost" pass JPRT, with some failures in closed code, triggered by adding a large number of new intrinsics. Those failures are to be addressed separately -- and because of that, this change is not yet pushable. A preliminary review would be appreciated meanwhile. A brief summary of changes: a) jdk.internal.misc.Unsafe has new methods. Since we now have split s.m.Unsafe and j.i.m.Unsafe, this change "safely" extends the private Unsafe, leaving the other one untouched. b) hotspot/test/compiler/unsafe tests are extended for newly added methods. c) unsafe.cpp gets the basic native method implementations. Most new operations are folded to their volatile (the strongest) counterparts, hoping that compilers would intrinsify them into more performant versions. d) C2 intrinsics for x86: * Most intrinsics code is covered by platform-independent LibraryCallKit changes, which means non-x86 architectures are also partially covered. * There are two classes of ops left for platform-dependent code: WeakCAS and CompareAndExchange nodes. Both seem simple enough to do, but there are details to be sorted out on each platform -- let's do those separately. * Both LibraryCallKit::inline_unsafe_access and LCK::inline_unsafe_load_store were modified to accept new access modes, and generally brushed up to accept the changes. * putOrdered intrinsic methods are purged in favor of put*Release operations. We still keep Unsafe.putOrdered for testability and compatibility reasons. Eyeballing the generated code on x86 yields no obvious problems. Sanity microbenchmark runs do not show performance regressions on old methods, and show the expected performance on new methods: http://cr.openjdk.java.net/~shade/8148146/notes.txt Cheers, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From igor.ignatyev at oracle.com Mon Jan 25 16:47:14 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 25 Jan 2016 19:47:14 +0300 Subject: RFR(XXS) : 8148161 : quarantine compiler/loopopts/UseCountedLoopSafepoints.java Message-ID: http://cr.openjdk.java.net/~iignatyev/8148161/webrev.00/ > 1 line changed: 1 ins; 0 del; 0 mod; Hi all, could you please review this tiny fix which quarantines the 'compiler/loopopts/UseCountedLoopSafepoints.java' test until 8146096 is fixed? Thanks, Igor From vladimir.kozlov at oracle.com Mon Jan 25 16:54:43 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 25 Jan 2016 08:54:43 -0800 Subject: RFR(XXS) : 8148161 : quarantine compiler/loopopts/UseCountedLoopSafepoints.java In-Reply-To: References: Message-ID: <56A65353.3010200@oracle.com> Good. Thanks, Vladimir On 1/25/16 8:47 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8148161/webrev.00/ >> 1 line changed: 1 ins; 0 del; 0 mod; > > Hi all, > > could you please review this tiny fix which quarantines the 'compiler/loopopts/UseCountedLoopSafepoints.java' test until 8146096 is fixed? > > Thanks, > Igor > From rahul.v.raghavan at oracle.com Mon Jan 25 17:02:00 2016 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Mon, 25 Jan 2016 09:02:00 -0800 (PST) Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler Message-ID: Hello, With reference to the email thread below, please send review comments for the revised patch for JDK-6378256.
http://cr.openjdk.java.net/~thartmann/6378256/webrev.02/ Thanks, Rahul > -----Original Message----- > From: Tobias Hartmann > Sent: Monday, January 25, 2016 12:40 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net > > Hi Rahul, > > On 22.01.2016 17:11, Rahul Raghavan wrote: > > > >> -----Original Message----- > >> From: Tobias Hartmann > Sent: Monday, January 11, 2016 2:56 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net > >> > >> Hi Rahul, > >> > >>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ > >> > >> Why don't you use 'markOopDesc::hash_mask_in_place' for the 64 bit version? This should safe some instructions and you also > don't > >> need the 'hash' register if you compute everything in 'result'. > > > > Thank you for your comments Tobias. > > > > I could not get the implementation work with the usage of 'markOopDesc::hash_mask_in_place' in x86_64 (similar to support in > x86_32). > > Usage of - __ andptr(result, markOopDesc::hash_mask_in_place); > > Results in build error - ' overflow in implicit constant conversion' > > > > Then understood from 'sharedRuntime_sparc.cpp', 'markOop.hpp' - that the usage of 'hash_mask_in_place' should be avoided for > 64-bit because the values are too big! > > Similar comments in LibraryCallKit::inline_native_hashcode [hotspot/src/share/vm/opto/library_call.cpp] also. > > Could not find some other way to use hash_mask_in_place here for x86_64? > > You are right, I missed that. > > > So depending on markOopDesc::hash_mask, markOopDesc::hash_shift value instead (similar to done in sharedRuntime_sparc) > > Added missing comment regarding above in the revised webrev. > > > > Also yes I missed the optimized codegen. > > Tried revised patch removing usages of extra 'hash', 'mask' registers and computed all in 'result' itself. > > > > [sharedRuntime_x86_64.cpp] > > .................... > > + Register obj_reg = j_rarg0; > > + Register result = rax; > > ........ 
> > + // get hash > > + // Read the header and build a mask to get its hash field. > > + // Depend on hash_mask being at most 32 bits and avoid the use of hash_mask_in_place > > + // because it could be larger than 32 bits in a 64-bit vm. See markOop.hpp. > > + __ shrptr(result, markOopDesc::hash_shift); > > + __ andptr(result, markOopDesc::hash_mask); > > + // test if hashCode exists > > + __ jcc (Assembler::zero, slowCase); > > + __ ret(0); > > + __ bind (slowCase); > > ........ > > > > Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. > > > > Please send your comments. I can submit revised webrev if all okay. > > Looks good. Please send a new webrev. > > Best, > Tobias > > > > >> > >> Best, > >> Tobias > >> > >> > >> On 08.01.2016 18:13, Rahul Raghavan wrote: > >>> Hello, > >>> > >>> Please review the following revised patch for JDK-6378256 - > >>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ > >>> > >>> This revised webrev got following changes - > >>> > >>> 1) A minor, better optimized code with return 0 at initial stage (instead of continuing to 'slowCase' path), for special/rare null > >> reference input! > >>> (as per documentation, test results confirmed it is safe to 'return 0' for null reference input, for System.identityHashCode) > >>> > >>> 2) Added similar Object.hashCode, System.identityHashCode optimization support in sharedRuntime_x86_64.cpp. > >>> > >>> Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. > >>> > >>> Thanks, > >>> Rahul > >>> > >>> > >>>> -----Original Message----- > >>>> From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan> Cc: hotspot-compiler- > >> dev at openjdk.java.net > >>>> > >>>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . 
> >>>>
> >>>> Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would be
> >>>> nice.
> >>>> Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp?
> >>>>
> >>>> Roland.
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Rahul Raghavan
> >>>> Sent: Wednesday, December 09, 2015 2:43 PM
> >>>> To: hotspot-compiler-dev at openjdk.java.net
> >>>>
> >>>> Hello,
> >>>>
> >>>> Please review the following patch for JDK-6378256.
> >>>>
> >>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ .
> >>>>
> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 .
> >>>> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower).
> >>>> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).
> >>>>
> >>>> sample unit test:
> >>>> public class Jdk6378256Test
> >>>> {
> >>>>     public static void main(String[] args)
> >>>>     {
> >>>>         Object obj = new Object();
> >>>>         long time = System.nanoTime();
> >>>>         for(int i = 0 ; i < 1000000 ; i++)
> >>>>             System.identityHashCode(obj); //compare to obj.hashCode();
> >>>>         System.out.println ("Result = " + (System.nanoTime() - time));
> >>>>     }
> >>>> }
> >>>>
> >>>> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
> >>>> (looks in the header for the hashCode before calling into the VM).
> >>>> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
> >>>> So also added required additional null check for System.identityHashCode case.
> >>>>
> >>>> Testing:
> >>>> - successful JPRT run (-testset hotspot).
> >>>> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
> >>>> (with -client / -XX:TieredStopAtLevel=1 etc. options).
> >>>> - Added 'noreg-perf' label for this performance bug.
> >>>> Manual testing done and confirmed expected performance values for unit tests with fix.
> >>>>
> >>>> Thanks,
> >>>> Rahul

From vladimir.x.ivanov at oracle.com Mon Jan 25 17:02:56 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 25 Jan 2016 20:02:56 +0300
Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array
In-Reply-To: <56A5EF6C.4090603@oracle.com>
References: <56A5EF6C.4090603@oracle.com>
Message-ID: <56A65540.6010105@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 1/25/16 12:48 PM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8147876
> http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/
>
> ciTypeFlow::is_dominated_by() write outside the 'dominated' array because it's size is too small. The problem is that the number of ciBlocks is not equal to the Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks().
>
> Thanks,
> Tobias
>

From roland.westrelin at oracle.com Mon Jan 25 17:05:54 2016
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 25 Jan 2016 18:05:54 +0100
Subject: RFR(XS): 8147853: "assert(t->meet(t0) == t) failed: Not monotonic" with sun/util/calendar/zi/TestZoneInfo310.java
In-Reply-To: <56A276AE.8060408@oracle.com>
References: <56A276AE.8060408@oracle.com>
Message-ID: <8686820A-60A2-4CEC-AEAC-DD3A74369323@oracle.com>

Hi Vladimir,

Thanks for looking at this.

> Can we simple return type of in(EntryControl) phi's input in such case (backedge is top) without filter_speculative() and verification code under assert which is useless in this case, I think. We already have check for the counted loop, we only need to separate can_be_counted_loop() condition.

Does this look better?

http://cr.openjdk.java.net/~roland/8147853/webrev.01/

Roland.
>
> Thanks,
> Vladimir
>
> On 1/22/16 8:38 AM, Roland Westrelin wrote:
>> During CCP, a Phi for the induction variable of a CountedLoop is processed repeatedly while the type of the backedge control is top so only the loop entry input is considered for computing the Phi's type.
>>
>> The loop entry first has type int:1..3 so the Phi's type is int:1..3
>> then it has type int:1..4 so the Phi's type is int:1..4
>> then it has type int:1..5:www so the Phi's type is int:1..5:www
>> then it has type int:1..6:www so the Phi's type is saturated to int:1..max-1:www
>>
>> The backedge control's type is changed to non-top and the type of the Phi is recomputed. This time the special code for counted loop in PhiNode::Value():
>>
>> CountedLoopNode* l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL;
>> if (l && l->can_be_counted_loop(phase) &&
>>     ((const Node*)l->phi() == this)) { // Trip counted loop!
>>   // protect against init_trip() or limit() returning NULL
>>   const Node *init = l->init_trip();
>>   const Node *limit = l->limit();
>>   const Node* stride = l->stride();
>>   if (init != NULL && limit != NULL && stride != NULL) {
>>     const TypeInt* lo = phase->type(init)->isa_int();
>>     const TypeInt* hi = phase->type(limit)->isa_int();
>>     const TypeInt* stride_t = phase->type(stride)->isa_int();
>>     if (lo != NULL && hi != NULL && stride_t != NULL) { // Dying loops might have TOP here
>>       assert(stride_t->_hi >= stride_t->_lo, "bad stride type");
>>       const Type* res = NULL;
>>       if (stride_t->_hi < 0) { // Down-counter loop
>>         swap(lo, hi);
>>         return TypeInt::make(MIN2(lo->_lo, hi->_lo) , hi->_hi, 3);
>>       } else if (stride_t->_lo >= 0) {
>>         return TypeInt::make(lo->_lo, MAX2(lo->_hi, hi->_hi), 3);
>>       }
>>     }
>>   }
>> }
>>
>>
>> kicks in and it computes a type of: int:1..8:www. The type of the Phi was narrowed and the assert fires.
>>
>> I suggest we fix this by saturating the type of the Phi only once the type of the loop's backedge is non top.
>> This way, the special code for counted loop above has a chance to run and that should be enough to keep the types during CCP monotonic.
>>
>> http://cr.openjdk.java.net/~roland/8147853/webrev.00/
>>
>> Roland.
>>

From vladimir.kozlov at oracle.com Mon Jan 25 17:12:06 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 25 Jan 2016 09:12:06 -0800
Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array
In-Reply-To: <56A5EF6C.4090603@oracle.com>
References: <56A5EF6C.4090603@oracle.com>
Message-ID: <56A65766.6030307@oracle.com>

Looks good.

Note to all. When you find problem in recent changes, please, add link in bug report to original changes (JDK-8140574 in this case). In a future it will help, for example, if we want to backport a original changes.

Thanks,
Vladimir

On 1/25/16 1:48 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
>
> https://bugs.openjdk.java.net/browse/JDK-8147876
> http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/
>
> ciTypeFlow::is_dominated_by() write outside the 'dominated' array because it's size is too small. The problem is that the number of ciBlocks is not equal to the Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks().
>
> Thanks,
> Tobias
>

From igor.ignatyev at oracle.com Mon Jan 25 17:17:29 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 25 Jan 2016 20:17:29 +0300
Subject: RFR(S) : 8148136 : compile control tests have incorrect @build directives
Message-ID:

http://cr.openjdk.java.net/~iignatyev/8148136/webrev.00/
> 49 lines changed: 24 ins; 0 del; 25 mod

Hi all,

could you please review the patch which fixes build directives for compile control tests?

Compile control tests intermittently fail in concurrent jtreg execution w/ NoClassDefFoundError or ClassNotFoundException for different classes.
The tests have @build directives which refers to classname w/o package, but they should refer to FQN. So there are no @build/compile actions which guarantees that all needed classes would be compiled. The patch replaces classname w/ fully qualified name.

testing: run all hotspot/test/compiler
jbs: https://bugs.openjdk.java.net/browse/JDK-8148136

Thanks,
-- Igor

From vladimir.kozlov at oracle.com Mon Jan 25 17:52:33 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 25 Jan 2016 09:52:33 -0800
Subject: RFR(XS): 8147853: "assert(t->meet(t0) == t) failed: Not monotonic" with sun/util/calendar/zi/TestZoneInfo310.java
In-Reply-To: <8686820A-60A2-4CEC-AEAC-DD3A74369323@oracle.com>
References: <56A276AE.8060408@oracle.com> <8686820A-60A2-4CEC-AEAC-DD3A74369323@oracle.com>
Message-ID: <56A660E1.5020300@oracle.com>

Yes! Thank you for making changes.

Vladimir

On 1/25/16 9:05 AM, Roland Westrelin wrote:
> Hi Vladimir,
>
> Thanks for looking at this.
>
>> Can we simple return type of in(EntryControl) phi's input in such case (backedge is top) without filter_speculative() and verification code under assert which is useless in this case, I think. We already have check for the counted loop, we only need to separate can_be_counted_loop() condition.
>
> Does this look better?
>
> http://cr.openjdk.java.net/~roland/8147853/webrev.01/
>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 1/22/16 8:38 AM, Roland Westrelin wrote:
>>> During CCP, a Phi for the induction variable of a CountedLoop is processed repeatedly while the type of the backedge control is top so only the loop entry input is considered for computing the Phi's type.
>>>
>>> The loop entry first has type int:1..3 so the Phi's type is int:1..3
>>> then it has type int:1..4 so the Phi's type is int:1..4
>>> then it has type int:1..5:www so the Phi's type is int:1..5:www
>>> then it has type int:1..6:www so the Phi's type is saturated to int:1..max-1:www
>>>
>>> The backedge control's type is changed to non-top and the type of the Phi is recomputed. This time the special code for counted loop in PhiNode::Value():
>>>
>>> CountedLoopNode* l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL;
>>> if (l && l->can_be_counted_loop(phase) &&
>>>     ((const Node*)l->phi() == this)) { // Trip counted loop!
>>>   // protect against init_trip() or limit() returning NULL
>>>   const Node *init = l->init_trip();
>>>   const Node *limit = l->limit();
>>>   const Node* stride = l->stride();
>>>   if (init != NULL && limit != NULL && stride != NULL) {
>>>     const TypeInt* lo = phase->type(init)->isa_int();
>>>     const TypeInt* hi = phase->type(limit)->isa_int();
>>>     const TypeInt* stride_t = phase->type(stride)->isa_int();
>>>     if (lo != NULL && hi != NULL && stride_t != NULL) { // Dying loops might have TOP here
>>>       assert(stride_t->_hi >= stride_t->_lo, "bad stride type");
>>>       const Type* res = NULL;
>>>       if (stride_t->_hi < 0) { // Down-counter loop
>>>         swap(lo, hi);
>>>         return TypeInt::make(MIN2(lo->_lo, hi->_lo) , hi->_hi, 3);
>>>       } else if (stride_t->_lo >= 0) {
>>>         return TypeInt::make(lo->_lo, MAX2(lo->_hi, hi->_hi), 3);
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>> kicks in and it computes a type of: int:1..8:www. The type of the Phi was narrowed and the assert fires.
>>>
>>> I suggest we fix this by saturating the type of the Phi only once the type of the loop's backedge is non top. This way, the special code for counted loop above has a chance to run and that should be enough to keep the types during CCP monotonic.
>>>
>>> http://cr.openjdk.java.net/~roland/8147853/webrev.00/
>>>
>>> Roland.
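
The sequence of types in the quoted report can be replayed with a toy model. This is only a sketch under simplifying assumptions: a type is reduced to a closed int range, meet() is taken to be the smallest range covering both operands, and the widen bits (":www") are ignored. The names MonotonicCheckSketch, IntRange and isMonotonic are invented for this illustration and are not HotSpot code.

```java
public class MonotonicCheckSketch {

    /** Hypothetical stand-in for C2's TypeInt: a closed range [lo, hi], widen bits ignored. */
    static final class IntRange {
        final int lo;
        final int hi;

        IntRange(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        /** Meet of two ranges: the smallest range that covers both. */
        IntRange meet(IntRange other) {
            return new IntRange(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }

        boolean sameAs(IntRange other) {
            return lo == other.lo && hi == other.hi;
        }

        @Override
        public String toString() {
            return "int:" + lo + ".." + hi;
        }
    }

    /** Same shape as assert(t->meet(t0) == t): the new type t must cover the old type t0. */
    static boolean isMonotonic(IntRange oldType, IntRange newType) {
        return newType.meet(oldType).sameAs(newType);
    }

    public static void main(String[] args) {
        IntRange[] steps = {
            new IntRange(1, 3),
            new IntRange(1, 4),
            new IntRange(1, 5),
            new IntRange(1, Integer.MAX_VALUE - 1), // saturated, int:1..max-1
            new IntRange(1, 8)                      // result of the counted-loop special case
        };
        // Print each transition and whether it only widens the type.
        for (int i = 1; i < steps.length; i++) {
            System.out.println(steps[i - 1] + " -> " + steps[i]
                    + " monotonic=" + isMonotonic(steps[i - 1], steps[i]));
        }
    }
}
```

In this model every step up to the saturated range passes the check, and only the final step from the saturated range to int:1..8 fails it, which matches the narrowing described in the report.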
>>>
>

From vladimir.kozlov at oracle.com Mon Jan 25 17:53:49 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 25 Jan 2016 09:53:49 -0800
Subject: RFR(S) : 8148136 : compile control tests have incorrect @build directives
In-Reply-To:
References:
Message-ID: <56A6612D.8010706@oracle.com>

Good.

Thanks,
Vladimir

On 1/25/16 9:17 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8148136/webrev.00/
>> 49 lines changed: 24 ins; 0 del; 25 mod
>
> Hi all,
>
> could you please review the patch which fixes build directives for compile control tests?
>
> Compile control tests intermittently fail in concurrent jtreg execution w/ NoClassDefFoundError or ClassNotFoundException for different classes. The tests have @build directives which refers to classname w/o package, but they should refer to FQN. So there are no @build/compile actions which guarantees that all needed classes would be compiled. The patch replaces classname w/ fully qualified name.
>
> testing: run all hotspot/test/compiler
> jbs: https://bugs.openjdk.java.net/browse/JDK-8148136
>
> Thanks,
> -- Igor
>

From dmitry.fazunenko at oracle.com Mon Jan 25 20:07:54 2016
From: dmitry.fazunenko at oracle.com (Dmitry Fazunenko)
Date: Mon, 25 Jan 2016 23:07:54 +0300
Subject: RFR(M) 8147461: Use byte offsets for vtable start and vtable length offsets
In-Reply-To: <56A1FA78.3090608@oracle.com>
References: <569926B9.4070806@oracle.com> <569F7E22.3090905@oracle.com> <56A04DCF.9090204@oracle.com> <56A1FA78.3090608@oracle.com>
Message-ID: <56A6809A.40104@oracle.com>

Hi Igor,

The GC part of change looks good.

Thanks,
Dima

On 22.01.2016 12:46, Mikael Gerdin wrote:
> Hi Chris,
>
> On 2016-01-21 04:17, Chris Plummer wrote:
>> Hi Mikael,
>>
>> The changes look good except I think you should get someone from the
>> compiler team to make sure the change in
>> HotSpotResolvedJavaMethodImpl.java and HotSpotVMConfig.java are ok. I'm
>> not sure why you chose to remove instanceKlassVtableStartOffset() rather
>> than just fix it.
>
> I'm cc:ing hotspot-compiler-dev and graal-dev to see if I can get
> someone to ok the JVMCI parts.
>
> The reason for removing the method is that the only reason for it
> being a method was to apply the wordSize scaling on the value and
> since I changed the offset to be a byte offset it does not need
> scaling and can be treated similar to the other constants in
> HotSpotVMConfig which are accessed without any accessor method.
>
>>
>> I think some of your changes may conflict with my changes for
>> JDK-8143608. Coleen is pushing JDK-8143608 for me once hs-rt opens up.
>> I'd appreciate it if you could wait until after then before doing your
>> push.
>
> Will do, would you mind pinging me when you've integrated 8143608?
>
> /Mikael
>
>>
>> thanks,
>>
>> Chris
>>
>> On 1/20/16 4:31 AM, Mikael Gerdin wrote:
>>> Hi again,
>>>
>>> I've rebased the on hs-rt and had to include some additional changes
>>> for JVMCI.
>>> I've also updated the copyright years.
>>> Unfortunately I can't generate an incremental webrev since i rebased
>>> the patch and there's no good way that I know of to make that work
>>> with webrev.
>>>
>>> New webrev at: http://cr.openjdk.java.net/~mgerdin/8147461/webrev.1/
>>>
>>> Testing: JPRT again (which includes the JVMCI jtreg tests)
>>>
>>> /Mikael
>>>
>>> On 2016-01-15 18:04, Mikael Gerdin wrote:
>>>> Hi all,
>>>>
>>>> As per the previous discussion in mid-December[0] about moving the
>>>> _vtable_length field to class Klass, here's the first RFR and webrev,
>>>> according to my suggested plan[1]:
>>>>
>>>>> My current plan is to first modify the vtable_length_offset
>>>>> accessor to
>>>>> return a byte offset (which is what it's translated to by all
>>>>> callers).
>>>>>
>>>>> Then I'll tackle moving the _vtable_len field to Klass.
>>>>>
>>>>> Finally I'll try to consolidate the vtable related methods to Klass,
>>>>> where they belong.
>>>>
>>>> This change actually consists of three changes:
>>>> * modifying InstanceKlass::vtable_length_offset to become a byte offset
>>>>   and use the ByteSize type to communicate the scaling.
>>>> * modifying InstanceKlass::vtable_start_offset to become a byte offset
>>>>   and use the ByteSize type, for symmetry reasons mainly.
>>>> * adding a vtableEntry::size_in_bytes() since in many places the vtable
>>>>   entry size is used in combination with the vtable start to compute a
>>>>   byte offset for vtable lookups.
>>>>
>>>> I don't foresee any issues with the fact that the byte offset is
>>>> represented as an int, for two reasons:
>>>> 1) If the offset of any of these grows to over 2 gigabytes then we have
>>>>    a huge footprint problem with InstanceKlass
>>>> 2) The offsets are converted to byte offsets and stored in ints already
>>>>    in the cpu specific code I've modified.
>>>>
>>>> Bug link: https://bugs.openjdk.java.net/browse/JDK-8147461
>>>> Webrev: http://cr.openjdk.java.net/~mgerdin/8147461/webrev.0/
>>>>
>>>> Testing: JPRT on Oracle supported platforms, testing on AARCH64 and
>>>> PPC64 would be much appreciated, appropriate mailing lists have been
>>>> CC:ed to notify them of the request.
>>>>
>>>>
>>>> [0]
>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-December/021152.html
>>>>
>>>> [1]
>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-December/021224.html
>>>>
>>>>
>>>> Thanks!
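
The difference between a word offset and a byte offset in the quoted proposal can be sketched with plain arithmetic. The values and names used here (WORD_SIZE, a vtable start of 50 words, one-word entries, the class VtableOffsetSketch) are made up for illustration and are not taken from InstanceKlass; the only point is that a byte offset is usable as-is, while a word offset has to be scaled by the word size wherever it is consumed.

```java
public class VtableOffsetSketch {
    /** Assumption for this sketch: 8-byte machine words (64-bit VM). */
    static final int WORD_SIZE = 8;

    /** Word-based style: start offset and entry size are in words and scaled at the point of use. */
    static int entryOffsetFromWords(int vtableStartInWords, int entrySizeInWords, int index) {
        return (vtableStartInWords + index * entrySizeInWords) * WORD_SIZE;
    }

    /** Byte-based style: start offset and entry size are already in bytes, so no scaling is needed. */
    static int entryOffsetFromBytes(int vtableStartInBytes, int entrySizeInBytes, int index) {
        return vtableStartInBytes + index * entrySizeInBytes;
    }

    public static void main(String[] args) {
        int startWords = 50; // hypothetical vtable start, in words
        int entryWords = 1;  // hypothetical vtable entry size, in words
        int index = 3;
        int viaWords = entryOffsetFromWords(startWords, entryWords, index);
        int viaBytes = entryOffsetFromBytes(startWords * WORD_SIZE, entryWords * WORD_SIZE, index);
        // Both computations address the same vtable slot.
        System.out.println(viaWords + " " + viaBytes);
    }
}
```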
>>>> /Mikael
>>>
>>
>

From christian.thalinger at oracle.com Thu Jan 7 16:45:46 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 07 Jan 2016 16:45:46 -0000
Subject: RFR (S): 8146246: JVMCICompiler::abort_on_pending_exception: assert(!thread->owns_locks()) failed: must release all locks when leaving VM
In-Reply-To: <0C62FED5-F3F8-44CE-B1DB-095F9170370B@oracle.com>
References: <568D6C63.5000403@oracle.com> <0C62FED5-F3F8-44CE-B1DB-095F9170370B@oracle.com>
Message-ID: <80FAECCD-94BA-479C-B042-0A50D9121C8F@oracle.com>

[Changing lists because it should have been on hotspot-dev.]

Coleen, in case 2) below I could replace java_lang_Throwable::print_stack_trace with java_lang_Throwable::java_printStackTrace.

> On Jan 6, 2016, at 12:57 PM, Christian Thalinger wrote:
>
>
>> On Jan 6, 2016, at 9:34 AM, Vladimir Kozlov wrote:
>>
>> I would go with "Java code do the printing".
>
> Yeah, it might be better.
>
>> You left ttyLocker in case 2) in src/share/vm/runtime/java.cpp
>
> Right. Thanks for pointing that out.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 1/6/16 11:19 AM, Christian Thalinger wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8146246
>>>
>>> The problem is that https://bugs.openjdk.java.net/browse/JDK-8145435 introduced ttyLocker to synchronize the exception output but java_lang_Throwable::print_stack_trace can call out to Java to get the cause.
>>>
>>> There are two solutions:
>>>
>>> 1) Remove ttyLocker and deal with some possible scrambling in the rare case of an exception:
>>>
>>> diff -r df8d635f2296 -r e87e187552fb src/share/vm/jvmci/jvmciCompiler.cpp
>>> --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 11:24:01 2015 -0800
>>> +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Thu Dec 31 09:20:16 2015 -0800
>>> @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const
>>>      Handle exception(THREAD, PENDING_EXCEPTION);
>>>      CLEAR_PENDING_EXCEPTION;
>>>
>>> -    {
>>> -      ttyLocker ttyl;
>>> -      java_lang_Throwable::print_stack_trace(exception, tty);
>>> -    }
>>> +    java_lang_Throwable::print_stack_trace(exception, tty);
>>>
>>>      // Something went wrong so disable compilation at this level
>>>      method->set_not_compilable(CompLevel_full_optimization);
>>> @@ -181,11 +178,8 @@ void JVMCICompiler::abort_on_pending_exc
>>>      Thread* THREAD = Thread::current();
>>>      CLEAR_PENDING_EXCEPTION;
>>>
>>> -    {
>>> -      ttyLocker ttyl;
>>> -      tty->print_raw_cr(message);
>>> -      java_lang_Throwable::print_stack_trace(exception, tty);
>>> -    }
>>> +    tty->print_raw_cr(message);
>>> +    java_lang_Throwable::print_stack_trace(exception, tty);
>>>
>>>      // Give other aborting threads to also print their stack traces.
>>>      // This can be very useful when debugging class initialization
>>> diff -r df8d635f2296 -r e87e187552fb src/share/vm/runtime/java.cpp
>>> --- a/src/share/vm/runtime/java.cpp Tue Dec 29 11:24:01 2015 -0800
>>> +++ b/src/share/vm/runtime/java.cpp Thu Dec 31 09:20:16 2015 -0800
>>> @@ -432,7 +432,6 @@ void before_exit(JavaThread* thread) {
>>>      if (HAS_PENDING_EXCEPTION) {
>>>        Handle exception(THREAD, PENDING_EXCEPTION);
>>>        CLEAR_PENDING_EXCEPTION;
>>> -      ttyLocker ttyl;
>>>        java_lang_Throwable::print_stack_trace(exception, tty);
>>>      }
>>>  #endif
>>>
>>> or
>>>
>>> 2) Call out to Java and let the Java code do the printing:
>>>
>>> diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.cpp
>>> --- a/src/share/vm/classfile/javaClasses.cpp Tue Dec 29 18:30:51 2015 +0100
>>> +++ b/src/share/vm/classfile/javaClasses.cpp Wed Jan 06 09:12:00 2016 -1000
>>> @@ -1784,6 +1784,20 @@ void java_lang_Throwable::print_stack_tr
>>>    }
>>>  }
>>>
>>> +/**
>>> + * Print the throwable stack trace by calling the Java method java.lang.Throwable.printStackTrace().
>>> + */
>>> +void java_lang_Throwable::java_printStackTrace(Handle throwable, TRAPS) {
>>> +  assert(throwable->is_a(SystemDictionary::Throwable_klass()), "Throwable instance expected");
>>> +  JavaValue result(T_VOID);
>>> +  JavaCalls::call_virtual(&result,
>>> +                          throwable,
>>> +                          KlassHandle(THREAD, SystemDictionary::Throwable_klass()),
>>> +                          vmSymbols::printStackTrace_name(),
>>> +                          vmSymbols::void_method_signature(),
>>> +                          THREAD);
>>> +}
>>> +
>>>  void java_lang_Throwable::fill_in_stack_trace(Handle throwable, const methodHandle& method, TRAPS) {
>>>    if (!StackTraceInThrowable) return;
>>>    ResourceMark rm(THREAD);
>>> diff -r 0fcfe4b07f7e src/share/vm/classfile/javaClasses.hpp
>>> --- a/src/share/vm/classfile/javaClasses.hpp Tue Dec 29 18:30:51 2015 +0100
>>> +++ b/src/share/vm/classfile/javaClasses.hpp Wed Jan 06 09:12:00 2016 -1000
>>> @@ -554,6 +554,7 @@ class java_lang_Throwable: AllStatic {
>>>    // Printing
>>>    static void print(Handle throwable, outputStream* st);
>>>    static void print_stack_trace(Handle throwable, outputStream* st);
>>> +  static void java_printStackTrace(Handle throwable, TRAPS);
>>>    // Debugging
>>>    friend class JavaClasses;
>>>  };
>>> diff -r 0fcfe4b07f7e src/share/vm/jvmci/jvmciCompiler.cpp
>>> --- a/src/share/vm/jvmci/jvmciCompiler.cpp Tue Dec 29 18:30:51 2015 +0100
>>> +++ b/src/share/vm/jvmci/jvmciCompiler.cpp Wed Jan 06 09:12:00 2016 -1000
>>> @@ -162,10 +162,7 @@ void JVMCICompiler::compile_method(const
>>>      Handle exception(THREAD, PENDING_EXCEPTION);
>>>      CLEAR_PENDING_EXCEPTION;
>>>
>>> -    {
>>> -      ttyLocker ttyl;
>>> -      java_lang_Throwable::print_stack_trace(exception, tty);
>>> -    }
>>> +    java_lang_Throwable::java_printStackTrace(exception, THREAD);
>>>
>>>      // Something went wrong so disable compilation at this level
>>>      method->set_not_compilable(CompLevel_full_optimization);
>>> @@ -181,11 +178,7 @@ void JVMCICompiler::abort_on_pending_exc
>>>      Thread* THREAD = Thread::current();
>>>      CLEAR_PENDING_EXCEPTION;
>>>
>>> -    {
>>> -      ttyLocker ttyl;
>>> -      tty->print_raw_cr(message);
>>> -      java_lang_Throwable::print_stack_trace(exception, tty);
>>> -    }
>>> +    java_lang_Throwable::java_printStackTrace(exception, THREAD);
>>>
>>>      // Give other aborting threads to also print their stack traces.
>>>      // This can be very useful when debugging class initialization
>>> diff -r 0fcfe4b07f7e src/share/vm/runtime/java.cpp
>>> --- a/src/share/vm/runtime/java.cpp Tue Dec 29 18:30:51 2015 +0100
>>> +++ b/src/share/vm/runtime/java.cpp Wed Jan 06 09:12:00 2016 -1000
>>> @@ -433,7 +433,7 @@ void before_exit(JavaThread* thread) {
>>>        Handle exception(THREAD, PENDING_EXCEPTION);
>>>        CLEAR_PENDING_EXCEPTION;
>>>        ttyLocker ttyl;
>>> -      java_lang_Throwable::print_stack_trace(exception, tty);
>>> +      java_lang_Throwable::java_printStackTrace(exception, THREAD);
>>>      }
>>>  #endif
>>>
>

From christian.thalinger at oracle.com Thu Jan 14 21:48:32 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 14 Jan 2016 11:48:32 -1000
Subject: RFR (S): 8146820: JVMCI properties should use HotSpotJVMCIRuntime.getBooleanProperty mechanism
In-Reply-To:
References: <83D3AB99-8164-4326-B847-06BFF27280C7@oracle.com> <56940779.8070804@oracle.com> <490C48FD-48A2-459F-BF0A-56D33966CC60@oracle.com> <9EC9F964-26EE-43B6-BF7E-43F40D192C1E@oracle.com> <41621484-0886-401C-A8AD-36D534DDE591@oracle.com> <7C1CBFFE-9A7C-4195-A8EA-BD7B94092E4F@oracle.com>
Message-ID: <97A6E072-7B52-4083-86F1-0DFC8AD287C7@oracle.com>

> On Jan 14, 2016, at 2:44 AM, Doug Simon wrote:
>
>>
>> On 14 Jan 2016, at 06:58, Christian Thalinger wrote:
>>
>>>
>>> On Jan 12, 2016, at 12:39 PM, Christian Thalinger wrote:
>>>
>>>>
>>>> On Jan 12, 2016, at 12:14 PM, Christian Thalinger wrote:
>>>>
>>>>>
>>>>> On Jan 12, 2016, at 12:03 PM, Doug Simon wrote:
>>>>>
>>>>>>
>>>>>> On 12 Jan 2016, at 22:39, Christian Thalinger wrote:
>>>>>>
>>>>>>>
>>>>>>> On Jan 12, 2016, at 10:14 AM, Doug Simon wrote:
>>>>>>>
>>>>>>> If we're going with an enum, you could put
>>>>>>> accessors directly in the enum:
>>>>>>>
>>>>>>> private static final boolean TrustFinalDefaultFields = Option.TrustFinalDefaultFields.getBoolean(true);
>>>>>>>
>>>>>>> private static final String TraceMethodDataFilter = Option.TraceMethodDataFilter.getString(null);
>>>>>>>
>>>>>>> You could then type the value of the options and check the right accessor is used:
>>>>>>>
>>>>>>> public enum Option {
>>>>>>>     ImplicitStableValues(boolean.class),
>>>>>>>     InitTimer, // Note: Not used because of visibility issues (see InitTimer.ENABLED).
>>>>>>>     PrintConfig(boolean.class),
>>>>>>>     PrintFlags(boolean.class),
>>>>>>>     ShowFlags(boolean.class),
>>>>>>>     TraceMethodDataFilter(String.class),
>>>>>>>     TrustFinalDefaultFields(String.class);
>>>>>>>
>>>>>>> Even ignoring these suggestions, the discipline imposed by the enum if a good idea.
>>>>>>
>>>>>> Excellent idea! I was also thinking about adding the default value to the enum.
>>>>>
>>>>> Can you do that without having to box the default value?
>>>>
>>>> No, we have to box but we can initialize all flags in the constructor:
>>>>
>>>> http://cr.openjdk.java.net/~twisti/8146820/webrev.02/
>>
>> Do we agree on the change?
>
> I would prefer it if the value was lazy initialized (for non-AOT runtimes):

It's not different in AOT-land because these cannot be constants.

>
>
> /**
>  * Supported JVMCI options.
>  */
> public enum Option {
>     ImplicitStableValues(boolean.class, true),
>     InitTimer(boolean.class, false), // Note: Not used (see InitTimer.ENABLED).
>     PrintConfig(boolean.class, false),
>     PrintFlags(boolean.class, false),
>     ShowFlags(boolean.class, false),
>     TraceMethodDataFilter(String.class, null),
>     TrustFinalDefaultFields(String.class, true);
>
>     /**
>      * The prefix for system properties that are JVMCI options.
>      */
>     private static final String JVMCI_OPTION_PROPERTY_PREFIX = "jvmci.";
>
>     private final Class type;
>     private Object value;
>     private final Object defaultValue;
>     private boolean isDefault;
>
>     private Option(Class type, Object defaultValue) {
>         assert Character.isUpperCase(name().charAt(0)) : "Option name must start with upper-case letter: " + name();
>         this.type = type;
>         this.value = "UNINITIALIZED";
>         this.defaultValue = defaultValue;
>     }
>
>     private Object getValue() {
>         if (value == "UNINITIALIZED") {
>             String propertyValue = VM.getSavedProperty(JVMCI_OPTION_PROPERTY_PREFIX + name());
>             if (propertyValue == null) {
>                 this.value = defaultValue;
>                 this.isDefault = true;
>             } else {
>                 if (type == boolean.class) {
>                     this.value = Boolean.parseBoolean(propertyValue);
>                 } else if (type == String.class) {
>                     this.value = propertyValue;
>                 } else {
>                     throw new JVMCIError("Unexpected option type " + type);
>                 }
>                 this.isDefault = false;
>             }
>             // Saved properties should not be interned - let's be sure
>             assert value != "UNINITIALIZED";
>         }
>         return value;
>     }
>
>     /**
>      * Returns the option's value as boolean.
>      *
>      * @return option's value
>      */
>     public boolean getBoolean() {
>         return (boolean) getValue();
>     }
>
>     /**
>      * Returns the option's value as String.
>      *
>      * @return option's value
>      */
>     public String getString() {
>         return (String) getValue();
>     }
>
>     /**
>      * Prints all option flags to {@code out}.
>      *
>      * @param out stream to print to
>      */
>     public static void printFlags(PrintStream out) {
>         out.println("[List of JVMCI options]");
>         for (Option option : values()) {
>             Object value = option.getValue();
>             String assign = option.isDefault ? ":=" : " =";
>             out.printf("%9s %-40s %s %-14s%n", option.type.getSimpleName(), option, assign, value);
>         }
>     }
> }
>
>
> Also, you can remove all the static fields that just cache a (possibly unboxed) option value and use the option directly.
> For example:
>
> diff -r 1034ff44c5d0 src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java
> --- a/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Tue Jan 12 15:04:27 2016 +0100
> +++ b/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaFieldImpl.java Thu Jan 14 13:40:28 2016 +0100
> @@ -29,6 +29,7 @@
>  import java.lang.reflect.Field;
>
>  import jdk.vm.ci.common.JVMCIError;
> +import jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.Option;
>  import jdk.vm.ci.meta.JavaType;
>  import jdk.vm.ci.meta.LocationIdentity;
>  import jdk.vm.ci.meta.MetaAccessProvider;
> @@ -41,11 +42,6 @@
>   */
>  class HotSpotResolvedJavaFieldImpl implements HotSpotResolvedJavaField, HotSpotProxified {
>
> -    /**
> -     * Mark well-known stable fields as such.
> -     */
> -    private static final boolean ImplicitStableValues = HotSpotJVMCIRuntime.getBooleanProperty("jvmci.ImplicitStableValues", true);
> -
>      private final HotSpotResolvedObjectTypeImpl holder;
>      private final String name;
>      private JavaType type;
> @@ -198,7 +194,7 @@
>              return true;
>          }
>          assert getAnnotation(Stable.class) == null;
> -        if (ImplicitStableValues && isImplicitStableField()) {
> +        if (Option.ImplicitStableValues.getBoolean() && isImplicitStableField()) {
>              return true;
>          }
>          return false;
>
> None of the current options are used in tight loops where the cost of the unboxing (if any) would matter.

Right.

>
> Lastly, since you've added PrintFlags and ShowFlags, why not add a help message to each option. For example:
>
> ImplicitStableValues(boolean.class, true, "Mark well-known stable fields as such."),

We should.
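
A minimal sketch of an option enum that carries a help string per option, along the lines discussed above. This is not the code from any of the webrevs: the class name JvmciOptionSketch, the parse helper, the getHelp accessor, the explicit initialized flag, and the use of System.getProperty in place of VM.getSavedProperty are all assumptions made so that the example runs on a plain JDK. Only two of the options are shown, and the lazy initialization is not thread-safe.

```java
import java.io.PrintStream;

public class JvmciOptionSketch {

    public enum Option {
        ImplicitStableValues(boolean.class, true, "Mark well-known stable fields as such."),
        TraceMethodDataFilter(String.class, null, "");

        /** Prefix for system properties that are treated as options in this sketch. */
        private static final String PREFIX = "jvmci.";

        private final Class<?> type;
        private final Object defaultValue;
        private final String help;
        private Object value;
        private boolean initialized;
        private boolean isDefault;

        Option(Class<?> type, Object defaultValue, String help) {
            this.type = type;
            this.defaultValue = defaultValue;
            this.help = help;
        }

        /** Reads the system property on first use and caches the (boxed) result. */
        private Object getValue() {
            if (!initialized) {
                String raw = System.getProperty(PREFIX + name());
                isDefault = (raw == null);
                value = isDefault ? defaultValue : parse(type, raw);
                initialized = true;
            }
            return value;
        }

        public boolean getBoolean() {
            return (boolean) getValue();
        }

        public String getString() {
            return (String) getValue();
        }

        public String getHelp() {
            return help;
        }
    }

    /** Converts a raw property string to the declared option type. */
    static Object parse(Class<?> type, String raw) {
        if (type == boolean.class) {
            return Boolean.parseBoolean(raw);
        }
        if (type == String.class) {
            return raw;
        }
        throw new IllegalArgumentException("Unexpected option type " + type);
    }

    /** Prints every option with its type, value, default marker and help text. */
    public static void printFlags(PrintStream out) {
        out.println("[List of JVMCI options]");
        for (Option option : Option.values()) {
            Object value = option.getValue();
            String assign = option.isDefault ? ":=" : " =";
            out.printf("%9s %-40s %s %-14s %s%n",
                    option.type.getSimpleName(), option, assign, value, option.help);
        }
    }
}
```

Calling JvmciOptionSketch.printFlags(System.out) lists both options; passing -Djvmci.ImplicitStableValues=false on the command line switches the marker for that option from ":=" (default) to " =" (explicitly set).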
http://cr.openjdk.java.net/~twisti/8146820/webrev.03/

$ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.ShowFlags=true InitGraal
[List of JVMCI options]
  boolean ImplicitStableValues     := true   Mark well-known stable fields as such.
  boolean InitTimer                := false  Specifies if initialization timing is enabled.
  boolean PrintConfig              := false  Prints all HotSpotVMConfig fields.
  boolean PrintFlags               := false  Prints all JVMCI flags and exits.
  boolean ShowFlags                 = true   Prints all JVMCI flags and continues.
   String TraceMethodDataFilter    := null
  boolean TrustFinalDefaultFields  := true   Determines whether to treat final fields with default values as constant.

>
> -Doug
>
>>
>>>>
>>>> We will not have many flags so this should be alright. A PrintFlags looks like this:
>>>>
>>>> $ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -Djvmci.PrintFlags=true InitGraal
>>>> [List of JVMCI options]
>>>>   boolean ImplicitStableValues     := true
>>>>   boolean InitTimer                := false
>>>>   boolean PrintConfig              := false
>>>>   boolean PrintFlags                = true
>>>>   boolean ShowFlags                := false
>>>>    String TraceMethodDataFilter    := null
>>>>    String TrustFinalDefaultFields  := true
>>>
>>> ...and this is a bug, of course :-)
>>>
>>>>
>>>> I'm almost tempted to move InitTimer to another package, like jdk.vm.ci.common ...
>>>>
>>>>>
>>>>> -Doug

From jaroslav.bachorik at oracle.com Fri Jan 8 13:12:02 2016
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Fri, 08 Jan 2016 13:12:02 -0000
Subject: RFR 8146620: CodelistTest.java fails with "Test failed on: jdk.internal.misc.Unsafe.getUnsafe()Ljdk/internal/misc/Unsafe;"
In-Reply-To: <8AA1795C-7E67-4CE6-8E07-490564C91D2A@oracle.com>
References: <568F7A0F.50006@oracle.com> <8AA1795C-7E67-4CE6-8E07-490564C91D2A@oracle.com>
Message-ID: <568FB59D.4050902@oracle.com>

On 8.1.2016 10:06, Staffan Larsen wrote:
> Looks good!

Thanks!

>
> Thanks,
> /Staffan
>
>> On 8 jan. 2016, at 09:57, Jaroslav Bachorik wrote:
>>
>> Please, review the following simple test fix
>>
>> Issue : https://bugs.openjdk.java.net/browse/JDK-8146620
>> Webrev: http://cr.openjdk.java.net/~jbachorik/8146620/webrev.00
>>
>> The test is treating the 'sun.misc.Unsafe.getUnsafe()' entry from the code list in a specific way - but since now it is possible to meet also the 'jdk.internal.misc.Unsafe.getUnsafe()' entry it is necessary to modify the test to expect this eventuality.
>>
>> Thanks,
>>
>> -JB-
>

From christian.thalinger at oracle.com Mon Jan 25 23:11:41 2016
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 25 Jan 2016 13:11:41 -1000
Subject: RFR: 8147470: update JVMCI mx extensions
In-Reply-To:
References:
Message-ID: <420CEE06-87D7-4A6C-BB2A-FF76DD303756@oracle.com>

This all looks good.

> On Jan 25, 2016, at 6:14 AM, Doug Simon wrote:
>
> Please review these changes to the mx extensions for JVMCI to account for recent HotSpot and Graal changes.
> > https://bugs.openjdk.java.net/browse/JDK-8147470 > http://cr.openjdk.java.net/~dnsimon/8147470/ > > -Doug From tobias.hartmann at oracle.com Tue Jan 26 08:13:10 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 26 Jan 2016 09:13:10 +0100 Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array In-Reply-To: <56A65540.6010105@oracle.com> References: <56A5EF6C.4090603@oracle.com> <56A65540.6010105@oracle.com> Message-ID: <56A72A96.8080306@oracle.com> Thanks, Vladimir. Best, Tobias On 25.01.2016 18:02, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 1/25/16 12:48 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8147876 >> http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/ >> >> ciTypeFlow::is_dominated_by() write outside the 'dominated' array because it's size is too small. The problem is that the number of ciBlocks is not equal to the Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks(). >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Tue Jan 26 08:17:41 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 26 Jan 2016 09:17:41 +0100 Subject: [9] RFR(XS): 8147876: ciTypeFlow::is_dominated_by() writes outside dominated array In-Reply-To: <56A65766.6030307@oracle.com> References: <56A5EF6C.4090603@oracle.com> <56A65766.6030307@oracle.com> Message-ID: <56A72BA5.1020700@oracle.com> Thanks, Vladimir. On 25.01.2016 18:12, Vladimir Kozlov wrote: > Looks good. > > Note to all. When you find problem in recent changes, please, add link in bug report to original changes (JDK-8140574 in this case). In a future it will help, for example, if we want to backport a original changes. Right, I'll keep that in mind. 
Best, Tobias > > Thanks, > Vladimir > > On 1/25/16 1:48 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> >> https://bugs.openjdk.java.net/browse/JDK-8147876 >> http://cr.openjdk.java.net/~thartmann/8147876/webrev.00/ >> >> ciTypeFlow::is_dominated_by() write outside the 'dominated' array because it's size is too small. The problem is that the number of ciBlocks is not equal to the Blocks used by ciTypeFlow (there is a 1:n relation). Therefore, we should use block_count() instead of _methodBlocks->num_blocks(). >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Tue Jan 26 09:10:14 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 26 Jan 2016 10:10:14 +0100 Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: <56A737F6.6030909@oracle.com> Hi Rahul, looks good to me (not a Reviewer). The code in sharedRuntime_x86_64.cpp is much better now! Best, Tobias On 25.01.2016 18:02, Rahul Raghavan wrote: > Hello, > > With reference to below email thread, please send review comments for the revised patch for JDK-6378256. > http://cr.openjdk.java.net/~thartmann/6378256/webrev.02/ > > Thanks, > Rahul > >> -----Original Message----- >> From: Tobias Hartmann > Sent: Monday, January 25, 2016 12:40 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net >> >> Hi Rahul, >> >> On 22.01.2016 17:11, Rahul Raghavan wrote: >>> >>>> -----Original Message----- >>>> From: Tobias Hartmann > Sent: Monday, January 11, 2016 2:56 PM > To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net >>>> >>>> Hi Rahul, >>>> >>>>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ >>>> >>>> Why don't you use 'markOopDesc::hash_mask_in_place' for the 64 bit version? This should safe some instructions and you also >> don't >>>> need the 'hash' register if you compute everything in 'result'. >>> >>> Thank you for your comments Tobias. 
>>> >>> I could not get the implementation work with the usage of 'markOopDesc::hash_mask_in_place' in x86_64 (similar to support in >> x86_32). >>> Usage of - __ andptr(result, markOopDesc::hash_mask_in_place); >>> Results in build error - ' overflow in implicit constant conversion' >>> >>> Then understood from 'sharedRuntime_sparc.cpp', 'markOop.hpp' - that the usage of 'hash_mask_in_place' should be avoided for >> 64-bit because the values are too big! >>> Similar comments in LibraryCallKit::inline_native_hashcode [hotspot/src/share/vm/opto/library_call.cpp] also. >>> Could not find some other way to use hash_mask_in_place here for x86_64? >> >> You are right, I missed that. >> >>> So depending on markOopDesc::hash_mask, markOopDesc::hash_shift value instead (similar to done in sharedRuntime_sparc) >>> Added missing comment regarding above in the revised webrev. >>> >>> Also yes I missed the optimized codegen. >>> Tried revised patch removing usages of extra 'hash', 'mask' registers and computed all in 'result' itself. >>> >>> [sharedRuntime_x86_64.cpp] >>> .................... >>> + Register obj_reg = j_rarg0; >>> + Register result = rax; >>> ........ >>> + // get hash >>> + // Read the header and build a mask to get its hash field. >>> + // Depend on hash_mask being at most 32 bits and avoid the use of hash_mask_in_place >>> + // because it could be larger than 32 bits in a 64-bit vm. See markOop.hpp. >>> + __ shrptr(result, markOopDesc::hash_shift); >>> + __ andptr(result, markOopDesc::hash_mask); >>> + // test if hashCode exists >>> + __ jcc (Assembler::zero, slowCase); >>> + __ ret(0); >>> + __ bind (slowCase); >>> ........ >>> >>> Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. >>> >>> Please send your comments. I can submit revised webrev if all okay. >> >> Looks good. Please send a new webrev. 
>> >> Best, >> Tobias >> >>> >>>> >>>> Best, >>>> Tobias >>>> >>>> >>>> On 08.01.2016 18:13, Rahul Raghavan wrote: >>>>> Hello, >>>>> >>>>> Please review the following revised patch for JDK-6378256 - >>>>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/ >>>>> >>>>> This revised webrev got following changes - >>>>> >>>>> 1) A minor, better optimized code with return 0 at initial stage (instead of continuing to 'slowCase' path), for special/rare null >>>> reference input! >>>>> (as per documentation, test results confirmed it is safe to 'return 0' for null reference input, for System.identityHashCode) >>>>> >>>>> 2) Added similar Object.hashCode, System.identityHashCode optimization support in sharedRuntime_x86_64.cpp. >>>>> >>>>> Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests. >>>>> >>>>> Thanks, >>>>> Rahul >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Roland Westrelin > Sent: Wednesday, December 09, 2015 8:03 PM > To: Rahul Raghavan> Cc: hotspot-compiler- >>>> dev at openjdk.java.net >>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . >>>>>> >>>>>> Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again >> would >>>> be >>>>>> nice. >>>>>> Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp? >>>>>> >>>>>> Roland. >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Rahul Raghavan > Sent: Wednesday, December 09, 2015 2:43 PM > To: hotspot-compiler-dev at openjdk.java.net >>>>>> >>>>>> Hello, >>>>>> >>>>>> Please review the following patch for JDK-6378256. >>>>>> >>>>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ . >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 . >>>>>> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times >>>>>> slower). 
>>>>>> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).
>>>>>>
>>>>>> sample unit test:
>>>>>> public class Jdk6378256Test
>>>>>> {
>>>>>>     public static void main(String[] args)
>>>>>>     {
>>>>>>         Object obj = new Object();
>>>>>>         long time = System.nanoTime();
>>>>>>         for(int i = 0 ; i < 1000000 ; i++)
>>>>>>             System.identityHashCode(obj); //compare to obj.hashCode();
>>>>>>         System.out.println ("Result = " + (System.nanoTime() - time));
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
>>>>>> (looks in the header for the hashCode before calling into the VM).
>>>>>> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
>>>>>> So also added required additional null check for System.identityHashCode case.
>>>>>>
>>>>>> Testing:
>>>>>> - successful JPRT run (-testset hotspot).
>>>>>> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
>>>>>> (with -client / -XX:TieredStopAtLevel=1 etc. options).
>>>>>> - Added 'noreg-perf' label for this performance bug.
>>>>>> Manual testing done and confirmed expected performance values for unit tests with fix.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul

From roland.westrelin at oracle.com Tue Jan 26 09:18:22 2016
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 26 Jan 2016 10:18:22 +0100
Subject: RFR(XS): 8147853: "assert(t->meet(t0) == t) failed: Not monotonic" with sun/util/calendar/zi/TestZoneInfo310.java
In-Reply-To: <56A660E1.5020300@oracle.com>
References: <56A276AE.8060408@oracle.com> <8686820A-60A2-4CEC-AEAC-DD3A74369323@oracle.com> <56A660E1.5020300@oracle.com>
Message-ID:

Thanks for the review, Vladimir.

Roland.
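
A minimal Java sketch of the shift-then-mask point from the JDK-6378256 thread above. The constants are assumptions for illustration only (a 31-bit hash field starting at bit 8 of a 64-bit mark word); the authoritative values are the ones in markOop.hpp. Under those assumptions, shifting first and then applying the narrow mask yields the same field as masking in place, while the in-place mask itself does not fit in a signed 32-bit immediate, which is consistent with the "overflow in implicit constant conversion" build error reported in the thread.

```java
public class MarkWordHashSketch {
    // Assumed layout, for illustration only: 31 hash bits starting at bit 8.
    static final int HASH_SHIFT = 8;
    static final long HASH_MASK = (1L << 31) - 1;                    // 0x7FFFFFFF
    static final long HASH_MASK_IN_PLACE = HASH_MASK << HASH_SHIFT;  // 0x7FFFFFFF00

    // Shift first, then apply the narrow mask (the order used in the revised patch).
    static long hashShiftThenMask(long markWord) {
        return (markWord >>> HASH_SHIFT) & HASH_MASK;
    }

    // Apply the wide in-place mask first, then shift.
    static long hashMaskInPlace(long markWord) {
        return (markWord & HASH_MASK_IN_PLACE) >>> HASH_SHIFT;
    }

    // True if the value survives a round trip through a signed 32-bit int,
    // i.e. it could be encoded as a sign-extended 32-bit immediate.
    static boolean fitsInInt32(long value) {
        return value == (int) value;
    }

    public static void main(String[] args) {
        long mark = 0x0000001234567801L; // made-up header value
        System.out.println("shift then mask: 0x" + Long.toHexString(hashShiftThenMask(mark)));
        System.out.println("mask in place:   0x" + Long.toHexString(hashMaskInPlace(mark)));
        System.out.println("HASH_MASK fits in int32:          " + fitsInInt32(HASH_MASK));
        System.out.println("HASH_MASK_IN_PLACE fits in int32: " + fitsInInt32(HASH_MASK_IN_PLACE));
    }
}
```

In the stub discussed above, a zero result after the mask means that no hash has been installed in the header yet, so the code falls through to the slow path.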
From doug.simon at oracle.com Tue Jan 26 09:28:16 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 26 Jan 2016 10:28:16 +0100 Subject: RFR: 8148202: move lookup of Java class and hub from ResolvedJavaType to ConstantReflectionProvider Message-ID: Most access to VM constants in JVMCI goes through the ConstantReflectionProvider interface meaning the VM implementation for constant handling is in one place. For historic reasons, some small amount of reflection on VM constants was located in ResolvedJavaType. This issue consolidates these methods to ConstantReflectionProvider. https://bugs.openjdk.java.net/browse/JDK-8148202 http://cr.openjdk.java.net/~dnsimon/8148202 -Doug From nils.eliasson at oracle.com Tue Jan 26 10:40:06 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 26 Jan 2016 11:40:06 +0100 Subject: RFR(S): 8063112: Compiler diagnostic commands should have locking instead of safepoint In-Reply-To: <56A281C3.6010408@oracle.com> References: <56A23F61.9000201@oracle.com> <56A281C3.6010408@oracle.com> Message-ID: <56A74D06.7030408@oracle.com> Hi Vladimir, On 2016-01-22 20:23, Vladimir Kozlov wrote: > Why you need new print method? Why you can't use existing print()? > Also I prefer to get current compilation tasks print in separate lines > - not in the list of threads. Then you don't need to use new print? Works for me. I moved it directly after the existing thread printing: --------------- P R O C E S S --------------- Java Threads: ( => current thread ) 0x00007f4cfc485000 JavaThread "Service Thread" daemon [_thread_blocked, id=22409, stack(0x00007f4bf1c5e000,0x00007f4bf1d5f000)] 0x00007f4cfc476000 JavaThread "Sweeper thread" daemon [_thread_blocked, id=22408, stack(0x00007f4bf1d5f000,0x00007f4bf1e60000)] ... 
stack(0x00007f4bf35db000,0x00007f4bf36dc000)] 0x00007f4cfc018800 JavaThread "main" [_thread_in_vm, id=22332, stack(0x00007f4d05c78000,0x00007f4d05d79000)] Other Threads: 0x00007f4cfc3ea000 VMThread [stack: 0x00007f4bf36dc000,0x00007f4bf37dd000] [id=22388] 0x00007f4cfc486800 WatcherThread [stack: 0x00007f4bf1b5d000,0x00007f4bf1c5e000] [id=22410] Threads with active compile tasks: 0x00007f4cfc46a800 id=22403 Compiling: 244 1 3 java.lang.String::isLatin1 (19 bytes) > > I am worry about using locks for printing because print code also has > locks. Do we really have to have locks here? The output for these > directives is local bufferedStream. As I understand it is separate for > each directive. So why you need lock? Or VM operation as before? I think you are mixing my two RFRs together - this change doesn't print directives. I am removing vm_ops from three diagnostic commands that uses code that expects safepoint or lock. Some of the commands are really quick, and requesting a safepoint is overkill when it can be done concurrently. Only new lock taken is the thread lock when iterating the compiler threads from the Compiler.queue jcmd. The thread lock is ranked so it can not be reordered with the compile.queue lock. I cleaned it up a bit further and removed the unused print_compiler_threads_on(...) from compileBroker. It is printed in JavaThread::print_on(..) where all the other thread info is located. Hs_err-file looks like the example above. 
jcmd Thread.print looks like this for compiling threads:

"C1 CompilerThread13" #19 daemon prio=9 os_prio=0 tid=0x00007f8748471800 nid=0x7732 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
Thread: 0x00007f8748471800 [0x7732] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0
   JavaThread state: _thread_in_native
   Compiling: 716 b 2 java.util.regex.Pattern::compile (406 bytes)

And Compiler.queue looks like this:

"Current compiles:
C1 CompilerThread14 435 b 2 java.net.URLStreamHandler::parseURL (1166 bytes)

C1 compile queue:
Empty

C2 compile queue:
Empty"

New webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.04/

Regards,
Nils

>
> Thanks,
> Vladimir
>
> On 1/22/16 6:40 AM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review.
>>
>> Summary:
>> Firstly this change removes the unnecessary vm-ops from three compiler
>> diagnostic commands and adds locking instead.
>> Secondly the Compiler.queue diagnostic command is improved with printing
>> of any active compilations. I found this useful when diagnosing a
>> rogue VM.
>> Thirdly, as a bonus, I also add printing of active compilations in the
>> thread section of the hs_err file. Very useful when investigating VMs
>> terminated by a timeout.
>>
>> Testing:
>> This does not pass all tests yet. A few tests are dependent on the output
>> from the diagnostic command, and I want to be sure the reviewers are
>> happy with the output format first.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8063112
>> Webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.02/
>>
>> Regards,
>> Nils
>>
-------------- next part --------------
An HTML attachment was scrubbed...

From ivan at azulsystems.com Tue Jan 26 10:59:58 2016
From: ivan at azulsystems.com (Ivan Krylov)
Date: Tue, 26 Jan 2016 13:59:58 +0300
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
Message-ID: <56A751AE.9090203@azulsystems.com>

Hello,

Some of you may have seen a few e-mails on the core-libs alias about a proposed "spin wait hint". The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for Java 9 shortly. The upcoming API changes can be seen at the webrev: http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/

At this time I would like to ask for a review of the hs-comp changes. The plan is to push the changes into the class libraries and hotspot synchronously, but that may happen after the JEP gets targeted.

Bug: https://bugs.openjdk.java.net/browse/JDK-8147844
Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/

The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:+/-UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is verification code that makes sure the flag is off; the VM will just execute the empty method java.lang.Runtime.onSpinWait(), which is effectively a no-op.

According to [1], the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, so there seems to be no need to add guarding code for older generations of Intel CPUs.

The proposed patch includes a simple regression test that makes sure that the method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230, but I am uncertain about the process.
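
A minimal sketch of the kind of busy-wait loop the hint is aimed at. The message above proposes java.lang.Runtime.onSpinWait(); this sketch calls Thread.onSpinWait() instead, which is where the method was placed when the API shipped in JDK 9, so it assumes JDK 9 or later. The class, method and variable names are made up for illustration, and the sketch says nothing about whether the call is actually intrinsified on a given VM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitSketch {
    // Spins until a producer thread publishes a value, hinting the CPU on each iteration.
    static int waitForValue() {
        AtomicBoolean ready = new AtomicBoolean(false);
        int[] payload = new int[1];

        Thread producer = new Thread(() -> {
            payload[0] = 42;   // plain write, published by the volatile write below
            ready.set(true);
        });
        producer.start();

        while (!ready.get()) {
            // Hint that this thread is busy-waiting; a plain no-op where no intrinsic applies.
            Thread.onSpinWait();
        }
        return payload[0];     // visible because ready.get() observed the producer's set(true)
    }

    public static void main(String[] args) {
        System.out.println(waitForValue());
    }
}
```

The loop is correct with or without the hint; the hint only affects how the spinning core behaves while it waits.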

Thanks,
Ivan

[1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops

From dmitry.fazunenko at oracle.com Tue Jan 26 12:02:29 2016
From: dmitry.fazunenko at oracle.com (Dmitry Fazunenko)
Date: Tue, 26 Jan 2016 15:02:29 +0300
Subject: RFR(S) : 8148012 : get rid of slash-dot-dot in @library directives
In-Reply-To:
References:
Message-ID: <56A76055.5070709@oracle.com>

Hi Igor,

the GC part of the fix looks good!
Thank you for taking care of the GC tests

-- Dima

On 25.01.2016 22:36, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8148012/webrev.00/
>> 31 lines changed: 0 ins; 0 del; 31 mod;
> Having 'external.lib.roots=/../../' in TEST.ROOT made it possible not to use '/..' in tests. The latest change in JTreg (https://bugs.openjdk.java.net/browse/CODETOOLS-7901585) made it illegal to have references to directories outside a test suite, so it's required to fix all such entries before switching to the next jtreg version. The fix basically replaces all '/../../test/lib' with '/test/lib' in @library directives.
>
> testing: run the affected tests
> jbs: https://bugs.openjdk.java.net/browse/JDK-8148012
>
> Thanks,
> Igor

From edward.nevill at gmail.com Tue Jan 26 14:25:08 2016
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 26 Jan 2016 14:25:08 +0000
Subject: RFR: 8148240: random infrequent null pointer exceptions in javac
Message-ID: <1453818308.16279.8.camel@mylittlepony.linaroharston>

Hi,

Please review the following webrev

http://cr.openjdk.java.net/~enevill/8148240/webrev/

Jira issue: https://bugs.openjdk.java.net/browse/JDK-8148240

The patch simply works around the issue by disabling FP as an allocatable register.

Thanks,
Ed.
From aph at redhat.com Tue Jan 26 14:34:58 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 26 Jan 2016 14:34:58 +0000 Subject: [aarch64-port-dev ] RFR: 8148240: random infrequent null pointer exceptions in javac In-Reply-To: <1453818308.16279.8.camel@mylittlepony.linaroharston> References: <1453818308.16279.8.camel@mylittlepony.linaroharston> Message-ID: <56A78412.9060203@redhat.com> On 01/26/2016 02:25 PM, Edward Nevill wrote: > Hi, > > Please review the following webrev > > http://cr.openjdk.java.net/~enevill/8148240/webrev/ > > Jira issue: https://bugs.openjdk.java.net/browse/JDK-8148240 > > The patch simply works around the issue by disabling FP as an allocatable register. Yes, thanks. OK for JDK9 and backports to AArch64 JDK8 and 7. Andrew. From zoltan.majo at oracle.com Tue Jan 26 16:43:15 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 26 Jan 2016 17:43:15 +0100 Subject: [9] RFR (S): 8146478: Node limit exceeded with -XX:AllocateInstancePrefetchLines=1073741823 Message-ID: <56A7A223.9050403@oracle.com> Hi, please review the patch for 8146478. https://bugs.openjdk.java.net/browse/JDK-8146478 Problem: Setting a high value for AllocateInstancePrefetchLines can trigger an assert in the C2 compiler The reasons is that the number of live nodes exceeds the maximum node limit. The same problem can happen if AllocateInstanceLines is given a high value. Solution: Limit the range for AllocateInstancePrefetchLines/AllocateInstanceLines to 8. I picked the value 8 because - (1) the maximum possible value for theses flags is 4/2, so having a slightly higher value than 4/2 still allows for some experiments; - (2) the node_check() in PhaseMacroExpand::expand_macro_nodes() assumes that each macro node expansion will generate <75 new nodes. The number of nodes generated by expand_allocate_array()/expand_allocate() for 8 prefetched lines closely fits into that margin (experimentally verified). 

In addition, I removed some code that is now unnecessary because of the range checks we have in place.

Webrev:
http://cr.openjdk.java.net/~zmajo/8146478/webrev.00/

Testing:
- JPRT: All JTREG hotspot tests, incl. TestOptionsWithRanges.java

Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com Tue Jan 26 19:01:15 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 26 Jan 2016 11:01:15 -0800
Subject: [9] RFR (S): 8146478: Node limit exceeded with -XX:AllocateInstancePrefetchLines=1073741823
In-Reply-To: <56A7A223.9050403@oracle.com>
References: <56A7A223.9050403@oracle.com>
Message-ID: <56A7C27B.8050004@oracle.com>

Where does the 4/2 number come from? Some spec runs used a higher number: -XX:AllocatePrefetchLines=16

http://spec.org/jbb2005/results/res2009q1/jbb2005-20081203-00563.html

I would suggest something like 64 - I have never seen such a number used.

Also, please limit the AllocatePrefetchStepSize range. It corresponds to the cache line size. I would say 512, to be future-proof, with an assert that checks that its setting in vm_Version_.cpp is in this range.

For the case AllocatePrefetchStyle == 2, the number of lines is calculated as:

  uint lines = AllocatePrefetchDistance / AllocatePrefetchStepSize;

Since the AllocatePrefetchDistance limit is big, you can get a lot of nodes again. Maybe also set the limit - AllocatePrefetchLines*AllocatePrefetchStepSize, 64*32 = 2048.

Thanks,
Vladimir

On 1/26/16 8:43 AM, Zoltán Majó wrote:
> Hi,
>
>
> please review the patch for 8146478.
>
> https://bugs.openjdk.java.net/browse/JDK-8146478
>
> Problem: Setting a high value for AllocateInstancePrefetchLines can
> trigger an assert in the C2 compiler. The reason is that the number of
> live nodes exceeds the maximum node limit. The same problem can happen
> if AllocateInstanceLines is given a high value.
>
> Solution:
> Limit the range for AllocateInstancePrefetchLines/AllocateInstanceLines
> to 8.
I picked the value 8 because > - (1) the maximum possible value for theses flags is 4/2, so having a > slightly higher value than 4/2 still allows for some experiments; > - (2) the node_check() in PhaseMacroExpand::expand_macro_nodes() assumes > that each macro node expansion will generate <75 new nodes. The number > of nodes generated by expand_allocate_array()/expand_allocate() for 8 > prefetched lines closely fits into that margin (experimentally verified). > > In addition, I removed some code that is that is now unnecessary because > of the range checks we have in place. > > > Webrev: > http://cr.openjdk.java.net/~zmajo/8146478/webrev.00/ > > Testing: > - JPRT: All JTREG hotspot tests, incl. TestOptionsWithRanges.java > > Thank you and best regards, > > > Zoltan > From christian.thalinger at oracle.com Tue Jan 26 20:16:05 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 26 Jan 2016 10:16:05 -1000 Subject: RFR: 8148202: move lookup of Java class and hub from ResolvedJavaType to ConstantReflectionProvider In-Reply-To: References: Message-ID: Looks good. For the record, this was contributed by Christian Wimmer. > On Jan 25, 2016, at 11:28 PM, Doug Simon wrote: > > Most access to VM constants in JVMCI goes through the ConstantReflectionProvider interface meaning the VM implementation for constant handling is in one place. For historic reasons, some small amount of reflection on VM constants was located in ResolvedJavaType. This issue consolidates these methods to ConstantReflectionProvider. 
> > https://bugs.openjdk.java.net/browse/JDK-8148202 > http://cr.openjdk.java.net/~dnsimon/8148202 > > -Doug From christian.thalinger at oracle.com Tue Jan 26 21:05:01 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 26 Jan 2016 11:05:01 -1000 Subject: RFR: 8148202: move lookup of Java class and hub from ResolvedJavaType to ConstantReflectionProvider In-Reply-To: References: Message-ID: <68B93E91-BBFA-4C14-A427-18FB7A6AC842@oracle.com> Correction, two tests fail: FAILED: compiler/jvmci/code/DataPatchTest.java FAILED: compiler/jvmci/code/SimpleDebugInfoTest.java > On Jan 26, 2016, at 10:16 AM, Christian Thalinger wrote: > > Looks good. For the record, this was contributed by Christian Wimmer. > >> On Jan 25, 2016, at 11:28 PM, Doug Simon wrote: >> >> Most access to VM constants in JVMCI goes through the ConstantReflectionProvider interface meaning the VM implementation for constant handling is in one place. For historic reasons, some small amount of reflection on VM constants was located in ResolvedJavaType. This issue consolidates these methods to ConstantReflectionProvider. >> >> https://bugs.openjdk.java.net/browse/JDK-8148202 >> http://cr.openjdk.java.net/~dnsimon/8148202 >> >> -Doug > From doug.simon at oracle.com Tue Jan 26 21:23:19 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 26 Jan 2016 22:23:19 +0100 Subject: RFR: 8148202: move lookup of Java class and hub from ResolvedJavaType to ConstantReflectionProvider In-Reply-To: <68B93E91-BBFA-4C14-A427-18FB7A6AC842@oracle.com> References: <68B93E91-BBFA-4C14-A427-18FB7A6AC842@oracle.com> Message-ID: <986242A4-21E8-4EB7-9486-EB1A49D0E365@oracle.com> I fixed these tests and updated the webrev in situ. > On 26 Jan 2016, at 22:05, Christian Thalinger wrote: > > Correction, two tests fail: > > FAILED: compiler/jvmci/code/DataPatchTest.java > FAILED: compiler/jvmci/code/SimpleDebugInfoTest.java > >> On Jan 26, 2016, at 10:16 AM, Christian Thalinger wrote: >> >> Looks good. 
For the record, this was contributed by Christian Wimmer. >> >>> On Jan 25, 2016, at 11:28 PM, Doug Simon wrote: >>> >>> Most access to VM constants in JVMCI goes through the ConstantReflectionProvider interface meaning the VM implementation for constant handling is in one place. For historic reasons, some small amount of reflection on VM constants was located in ResolvedJavaType. This issue consolidates these methods to ConstantReflectionProvider. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8148202 >>> http://cr.openjdk.java.net/~dnsimon/8148202 >>> >>> -Doug >> > From christian.thalinger at oracle.com Tue Jan 26 21:28:41 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 26 Jan 2016 11:28:41 -1000 Subject: RFR: 8148202: move lookup of Java class and hub from ResolvedJavaType to ConstantReflectionProvider In-Reply-To: <986242A4-21E8-4EB7-9486-EB1A49D0E365@oracle.com> References: <68B93E91-BBFA-4C14-A427-18FB7A6AC842@oracle.com> <986242A4-21E8-4EB7-9486-EB1A49D0E365@oracle.com> Message-ID: <4CA58378-4063-466A-832F-F225EA39EA71@oracle.com> Now we are good: Test results: passed: 61; error: 4 > On Jan 26, 2016, at 11:23 AM, Doug Simon wrote: > > I fixed these tests and updated the webrev in situ. > >> On 26 Jan 2016, at 22:05, Christian Thalinger wrote: >> >> Correction, two tests fail: >> >> FAILED: compiler/jvmci/code/DataPatchTest.java >> FAILED: compiler/jvmci/code/SimpleDebugInfoTest.java >> >>> On Jan 26, 2016, at 10:16 AM, Christian Thalinger wrote: >>> >>> Looks good. For the record, this was contributed by Christian Wimmer. >>> >>>> On Jan 25, 2016, at 11:28 PM, Doug Simon wrote: >>>> >>>> Most access to VM constants in JVMCI goes through the ConstantReflectionProvider interface meaning the VM implementation for constant handling is in one place. For historic reasons, some small amount of reflection on VM constants was located in ResolvedJavaType. This issue consolidates these methods to ConstantReflectionProvider. 
>>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8148202 >>>> http://cr.openjdk.java.net/~dnsimon/8148202 >>>> >>>> -Doug >>> >> > From john.r.rose at oracle.com Tue Jan 26 22:48:22 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 26 Jan 2016 14:48:22 -0800 Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode In-Reply-To: <569F7558.1030800@oracle.com> References: <569CE098.4030807@oracle.com> <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> <569F7558.1030800@oracle.com> Message-ID: What I would like to see is for users to feel free to use CallSites with any amount of mutability, and have the JVM pick a good strategy for speculating and optimizing through CS target bindings. By "good" I mean that, if the CS is not megamutable, you get the performance comparable to an "invokestatic". But if the CS *is* megamutable (unstable), it is not "good" (IMO) to issue a storm of recompilations, especially if (as is usually the case) the megamutable CS is one of 1000s of other call sites in the same code blob, all of which must be recompiled because one CS had a problem. Instead, the megamutable CS should be downgraded to an indirect call through a normal (or volatile) variable. So does this leave some performance on the floor? Of course; perhaps the CS finally settles down long enough for the JVM to venture a profitable recompilation, and for the cost of recompilation to be paid off by further stability and efficient execution of the CS. My main point here is that reoptimization of megamutables is a misuse of speculation. I'm not saying that the JIT should have a tantrum and refuse to compile the call site (which is a bug), but it should stop speculating that it is stable when in fact it is not. There are lots of ways to improve the performance of megamutables, but unconditional recompilation is not one of those ways. It uses a wrecking ball to swat a fly. 
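
For readers less familiar with the API under discussion, here is a small user-level sketch of a call site whose target keeps changing. It only uses java.lang.invoke.MutableCallSite as documented; the class and method names are made up for illustration, and the sketch makes no claim about how HotSpot compiles or recompiles such a site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class RelinkSketch {
    static int one() { return 1; }
    static int two() { return 2; }

    // Relinks a single MutableCallSite on every iteration and sums the results.
    static int relinkAndSum(int rounds) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(int.class);
            MethodHandle one = lookup.findStatic(RelinkSketch.class, "one", type);
            MethodHandle two = lookup.findStatic(RelinkSketch.class, "two", type);

            MutableCallSite site = new MutableCallSite(one);
            // The dynamic invoker always calls whatever target the site currently has.
            MethodHandle invoker = site.dynamicInvoker();

            int sum = 0;
            for (int i = 0; i < rounds; i++) {
                // Each setTarget is a relinkage of the call site.
                site.setTarget((i % 2 == 0) ? one : two);
                sum += (int) invoker.invokeExact();
            }
            return sum;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(relinkAndSum(4));
    }
}
```

A site that alternates between two targets like this is the "bi-stable" case mentioned in the thread; a site that keeps receiving new, distinct targets is the megamutable case.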

Handling megamutables is very much like handling megamorphics. You want to hang on to the hope that there are really just a few branches (common case) and optimize those, and call out-of-line for the rest. If that hope fails, you call out-of-line always. And you want to detect if the statistics change, where the entropy of the CS target goes down to a small number, so you can venture another recompile with up-to-date speculation. We should apply these techniques to both megamorphics and megamutables.

So there's an ambiguity in the contract: Is CS speculation just a best-efforts kind of thing, or is the JVM contracted to mechanically recompile on every CS change? I think the reasonable reading of the javadoc (etc.) is the first, not the second.

How would a user communicate that his CS is a special one, whose invalidation should *always* trigger reoptimization? I don't know, maybe an integer-valued callback that is triggered during setTarget calls, and returns the amount of (virtual) time before the next reoptimization should be attempted. The callback would be passed the number of previous reoptimizations (at this site or in the whole method or both), as a warning of how resource-intensive this CS is becoming. Returning constant zero means the current behavior. I think you can see lots of problems with such an API. And, I think that sort of thing isn't notably better than simple JVM heuristics.

Here's how I think we should fix the megamutable problem:

1. Speculate at first that a CS is immutable.

2. If that fails, speculate that it is stable, as:
     if (cs.t == expected) inline expected(); else outline cs.t();
   Collect a profile count along the outline path.

3. Every once in a while, if a code blob is accumulating outline counts, queue it for reoptimization. Crucially, do this in such a way that the JIT does not become a foreground consumer of CPU cycles.

4. When recompiling a stable call site, always inline the current target ("this time fer sure!").
Maybe if this is a *really* bad actor (but how can you tell?) forget the speculation part. 5. Maybe, speculate on the LF of the target, not the target itself, to allow some degree of harmless variation by targets. (For some codes that will help, although it interacts with MH customization in tricky ways.) 6. Maybe fiddle with collecting previous hot targets, or (better) empower the JDK code to manage that stuff. PIC logic should be handled at the JDK level, not in the JIT. Anyway, if the above gets addressed eventually, or if the rest of the MLVM crew proves that I don't know what I'm talking about, I'm OK with this fix. "Reviewed", assuming future improvements. ? John On Jan 20, 2016, at 3:54 AM, Vladimir Ivanov wrote: > > John, Chris, thanks for the feedback. > > I don't think it is only about microbenchmarks. Long-running large applications with lots of mutable call sites should also benefit for this change. Current JVM behavior counts invalidations on root method, so nmethods with multiple mutable call sites (from root & all inlined callees) are more likely to hit the limit, even if there's no mega-mutable sites. It just sums up and PerMethodRecompilationCutoff (= 400, by default) doesn't look like a huge number. > > Also, LambdaForm sharing somewhat worsen the situation. When LambdaForms were mostly customized, different method handle chains were compiled into a single nmethod. Right now, it means that not only the root method is always interpreted, but all bound method handle chains are broken into numerous per-LF nmethods (see JDK-8069591 for some details). > > MLVM folks, I'd like to hear your opinion about what kind of behavior do you expect from JVM w.r.t. mutable call sites. > > There are valid use-cases when JVM shouldn't throttle the recompilation (e.g., long-running application with indy-based dynamic tracing). Maybe there's a place for a new CallSite flavor to clearly communicate application expectations to the JVM? 
> Either always recompile (thus eventually reaching peak performance) or give up and generate less efficient machine code, but save on possible recompilations.
>
> Best regards,
> Vladimir Ivanov
>
> On 1/20/16 2:37 AM, John Rose wrote:
>> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov wrote:
>>>
>>> The fix is to avoid updating recompilation count when corresponding
>>> nmethod is invalidated due to a call site target change.
>>
>> Although I'm not vetoing it (since it seems it will help customers in
>> the short term), I'm uncomfortable with this fix because it doesn't
>> scale to large dyn. lang. applications with many unstable call sites.
>> Put another way, it feels like we are duct-taping down a failsafe
>> switch (against infinite recompilation) in order to spam a
>> micro-benchmark: a small number of mega-mutable call sites for which we
>> are willing to spend (potentially) all of the JIT resources, including
>> those usually allocated to application performance in the steady state.
>> Put a third way: I am not comfortable with unthrottled infinite
>> recompilation as a performance strategy.
>>
>> I've commented on the new RFE (JDK-8147550) where to go next, including
>> the following sentiments:
>>
>>> There is a serious design tension here, though: Some users apparently
>>> are willing to endure an infinite series of recompilations as part of
>>> the cost of doing business; JDK-7177745 addresses this need by turning
>>> off the fail-safe against (accidental, buggy) infinite recompilation
>>> for unstable CSs. Other users might find that having a percentage of
>>> machine time devoted to recompilation is a problem. (This has been the
>>> case in the past with non-dynamic languages, at least.) The code shape
>>> proposed in this bug report would cover all simple unstable call
>>> sites (bi-stable, for example, would compile to a bi-morphic call),
>>> but, in pathological cases (infinite sequence of distinct CS targets)
>>> would "settle down" into a code shape that would be sub-optimal for
>>> any single target, but (as an indirect MH call) reasonable for all the
>>> targets together.
>>>
>>> In the absence of clear direction from the user or the profile, the
>>> JVM has to choose infinite recompilation or a good-enough final
>>> compilation. The latter choice is safer. And the
>>> infinite recompilation is less safe because there is no intrinsic
>>> bound on the amount of machine cycles that could be diverted to
>>> recompilation, given a dynamic language application with
>>> enough mega-mutable CSs. Settling down to a network of indirect calls
>>> has a bounded cost.
>>>
>>> Yes, one size-fits-all tactics never please everybody. But the JVM
>>> should not choose tactics with unlimited downsides.
>>
>> -- John

From john.r.rose at oracle.com Tue Jan 26 23:18:53 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 26 Jan 2016 15:18:53 -0800
Subject: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
In-Reply-To: <2036838501.1079316.1453292009390.JavaMail.zimbra@u-pem.fr>
References: <569CE098.4030807@oracle.com> <894B7E15-D940-4EC5-8E4B-CF48B557A86D@oracle.com> <2036838501.1079316.1453292009390.JavaMail.zimbra@u-pem.fr>
Message-ID: <525DDA2D-9676-43CF-8D75-C5ED52031E73@oracle.com>

On Jan 20, 2016, at 4:13 AM, Remi Forax wrote:
>
> I understand that having the VM that may always recompile may be seen as a bug,
> but having the VM bail out and stop recompiling, or more generally change the compilation strategy, is a bug too.

As you can guess from my previous message, I agree with this, except for "change the compilation strategy".
The JVM earns its way in the world by routinely changing compilation strategy. The reason most people don't notice is that the strategy changes are profile-driven and self-correcting. Nothing in the 292 world promises a particular strategy, just a best effort to create and execute great code, assuming stable application behavior.

When an optimization breaks, the JVM's strategy may also fail to adjust correctly. One symptom of that is infinite recompilation, usually because one line of code is being handled badly, but which creates huge bloat in the code cache for thousands of lines of code that happen to be inlined nearby. We try hard to avoid this. We also try hard to detect this problem. That is the true meaning of those strange cutoffs.

Nobody thinks falling into the interpreter is a good idea, except that it, on balance, is a better idea than (a) throwing an assertion error, or (b) filling the CPU with JIT jobs and the code cache with discards. The third choice, (c) run the offending method in the interpreter, at least preserves a degree of forward progress, while allowing the outraged user to report a bug.

The correct fix to the bug, IMO, is never to jump from (c) to (a) or (b). It is to find and fix the problem with the compilation strategy, and the profile-driven gating logic for it.

If your car's transmission gets a bug (now that they are computers, they can), what would you prefer? (a) stop the car immediately, (b) run the car in first gear at full speed, or (c) slow the car to a defined speed limit (25mph). Detroit prefers, and the JVM implements, option (c).

> The problem here is that there is no way from the point of view of a dyn lang runtime to know what will be the behavior of the VM for a callsite if the VM decides to stop recompiling, decides not to inline, decides to inline some part of the tree, etc.

Yes. And it usually doesn't matter; the issue doesn't come up until something breaks, or we find a performance pothole. The current problem is (in my mind) a break, not a performance pothole that needs tuning. If we fix the break, people shouldn't need to worry about this stuff, usually.

> Said differently, using an invokedynamic allows to create code shapes that will change dynamically, if the VM behavior also changes dynamically, it's like building a wall on moving parts, the result is strange dynamic behaviors that are hard to diagnose and reproduce.

JVMs have always been like that, because of dynamic class loading, but with indy it is more so, since it's much easier to "override" some previously fixed behavior.

> The recompilation behavior of the VM should be kept simple and predictable; basically, the VM should always recompile the CS with no failsafe switch.

We agree that the failsafe should not trip. Just like we agree that the circuit breakers in our building should not trip. We disagree, perhaps, about what to do when they trip. I don't want to duct-tape them back into the "on" position; do you?

> If dyn lang runtime devs have trouble with that, they can already use an exactInvoker to simulate an indirect mh call and we can even provide new method handle combiners to gracefully handle multi-stable CS.

That's all true. The new combiners might have some sort of handshake with the JVM to self-adjust their code shape. But I claim the baseline behavior that I have called for is the most generally useful, since it is able to amortize recompilation resources over multiple CS misses, put global limits on total recompilation effort, and preserve reasonable forward progress executing good-enough code.

(Having a CS change force a reoptimization is tantamount to adding a JIT control API, such as Compiler.recompile(cs), analogous to System.gc(). But just for CS-bearing methods. We are a long way from understanding how to work such an API.)
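[Editorial illustration, not part of the original message.] The exactInvoker idea quoted above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the class IndirectCallSketch, the holder TargetBox, and the helpers twice, negate, find and callIndirect are hypothetical names made up for the example; only MethodHandles.exactInvoker, MethodHandles.lookup, findStatic and invokeExact are real java.lang.invoke calls. Because the changing target is passed as an ordinary argument to the invoker, swapping it does not go through CallSite.setTarget. How the JIT actually compiles such a call is not guaranteed by this sketch.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class IndirectCallSketch {
    // Signature shared by all targets in this example: (int)int.
    static final MethodType TYPE = MethodType.methodType(int.class, int.class);

    // exactInvoker(TYPE) has type (MethodHandle,int)int: the callee is just
    // the leading argument, so it can vary from call to call.
    static final MethodHandle INVOKER = MethodHandles.exactInvoker(TYPE);

    static final MethodHandle TWICE = find("twice");
    static final MethodHandle NEGATE = find("negate");

    // Hypothetical holder for a target that the runtime changes often.
    static final class TargetBox {
        volatile MethodHandle target;
        TargetBox(MethodHandle initial) { target = initial; }
    }

    static int twice(int x) { return 2 * x; }
    static int negate(int x) { return -x; }

    // Looks up one of the static example targets above by name.
    static MethodHandle find(String name) {
        try {
            return MethodHandles.lookup().findStatic(IndirectCallSketch.class, name, TYPE);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Calls whatever target the box currently holds, via the generic invoker.
    static int callIndirect(TargetBox box, int arg) {
        try {
            return (int) INVOKER.invokeExact(box.target, arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        TargetBox box = new TargetBox(TWICE);
        int first = callIndirect(box, 21);   // calls twice(21)
        box.target = NEGATE;                 // retarget without any CallSite involved
        int second = callIndirect(box, 21);  // calls negate(21)
        System.out.println(first + " " + second);
    }
}
```

The trade-off is the one discussed in the thread: the call is never specialized to one target, but retargeting costs only a field write.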
Idea: Perhaps CS's should have a callback which says, "Hey, CS, the JIT has mispredicted you a bunch of times; would you like to nominate an alternative representation?" The call would be made asynchronously, outside the JIT. The default behavior would be to say "nope" with the results given above, but the CS could also return a MH (perhaps the CS.dynamicInvoker, or perhaps some more elaborate logic), which the JVM would slide into place over the top of the CS. Despite the fact that CS bindings are final, the new binding would take its place. And it would be the user's choice whether that binding pointed to the old CS or a new CS or some combination of both.

-- John

From igor.veresov at oracle.com Wed Jan 27 00:44:53 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 26 Jan 2016 16:44:53 -0800
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To: <56A751AE.9090203@azulsystems.com>
References: <56A751AE.9090203@azulsystems.com>
Message-ID:

So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that's intentional, I wonder why?

igor

> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote:
>
> Hello,
>
> Some of you may have seen a few e-mails on the core-libs alias about a proposed "spin wait hint". The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832 . There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for Java 9 shortly. The upcoming API changes can be seen at the webrev:
> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/
>
> At this time I would like to ask for a review of the hs-comp changes. The plan is to push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844
> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/
>
> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:+/-UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is verification code that makes sure the flag is off; the VM will just execute the empty method java.lang.Runtime.onSpinWait(), effectively a no-op. According to [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, so there seems to be no need to add guarding code for older generations of Intel CPUs.
>
> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process.
>
> Thanks,
>
> Ivan
>
> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops

From vitalyd at gmail.com Wed Jan 27 02:15:25 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 26 Jan 2016 21:15:25 -0500
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To:
References: <56A751AE.9090203@azulsystems.com>
Message-ID:

Subsequent loads at this point will likely be polls of the same memory location that just failed a test, and the author inserted a pause. It's unlikely that the memory changed that quickly, and scheduling the next load before the pause is essentially equivalent to two loads back to back, which wouldn't make sense given the intended usage.
There's also the risk that the compiler would move enough of those load+test pairs before the pause and fill up the speculative pipeline with them; that pipeline will need to be flushed once the spin exits since those load instructions likely speculated incorrectly. And here we're basically describing the reason for putting pause there in the first place :). On Tuesday, January 26, 2016, Igor Veresov wrote: > So, why does the new node have a memory effect? That would seem to prevent > any movement of the subsequent loads in your loop, right? If that?s > intentional I wonder why is that? > > igor > > On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: > > Hello, > > Some of you may have a seen a few e-mails on the core-libs alias about a > proposed ?spin wait hint?. The JEP is forming up nicely at > https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a > consensus on the API side. It is now in a draft state and I hope this JEP > will get targeted for java 9 shortly. The upcoming API changes can be seen > at the webrev: > http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ > > At this time I would like to ask for a review of the hs-comp changes. The > plan is push changes into class libraries and hotspot synchronously but > that may happen after the JEP gets targeted. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 > Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ > > The idea of the fix is pretty simple: hotspot replaces a call to > java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a > 'pause' instruction on x86. This intrinsic is guarded by the > -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a > verification code that makes sure the flag is off, VM will just execute at > empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. 
> According the [1] the 'pause' instruction is functional since SSE2, but > even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence > harmless, there seems to be no need to add guarding code for older > generations of Intel CPUs. > > The proposed patch includes a simple regression test that simply makes > sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There > are several other producer-consumer-like performance tests ready that the > authors of this JEP would be happy to make available under JEP-230 but I am > uncertain about the process. > > Thanks, > > Ivan > > [1] - > https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops > > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Wed Jan 27 03:56:52 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 26 Jan 2016 19:56:52 -0800 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: References: <56A751AE.9090203@azulsystems.com> Message-ID: <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com> Wouldn?t you use a volatile load for the memory location you?re polling? igor > On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich wrote: > > Subsequent loads at this point will likely be polls of same memory location that just failed a test, and the author inserted a pause. It's unlikely that the memory changed that quickly and scheduling the next load before the pause is equivalent to two loads back to back essentially, which wouldn't make sense given the intended usage. There's also the risk that the compiler would move enough of those load+test pairs before the pause and fill up the speculative pipeline with them; that pipeline will need to be flushed once the spin exits since those load instructions likely speculated incorrectly. 
And here we're basically describing the reason for putting pause there in the first place :). > > On Tuesday, January 26, 2016, Igor Veresov > wrote: > So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that?s intentional I wonder why is that? > > igor > >> On Jan 26, 2016, at 2:59 AM, Ivan Krylov > wrote: >> >> Hello, >> >> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832 . There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >> >> At this time I would like to ask for a review of the hs-comp changes. The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >> >> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs. >> >> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. 
There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >> >> Thanks, >> >> Ivan >> >> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops > > > -- > Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Wed Jan 27 04:08:00 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 26 Jan 2016 23:08:00 -0500 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com> References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com> Message-ID: You would but subsequent volatile load could move before the pause. If you unroll the loop, you could (theoretically) end up with all loads moved before the pause but all appearing ordered with respect to each other, eg: cmp addr, 0 // from iteration 1 je label cmp addr, 0 // from iteration 2 je label ... pause What prevents that if pause is not a compiler member? On Tuesday, January 26, 2016, Igor Veresov wrote: > Wouldn?t you use a volatile load for the memory location you?re polling? > > igor > > On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich > wrote: > > Subsequent loads at this point will likely be polls of same memory > location that just failed a test, and the author inserted a pause. It's > unlikely that the memory changed that quickly and scheduling the next load > before the pause is equivalent to two loads back to back essentially, which > wouldn't make sense given the intended usage. 
There's also the risk that > the compiler would move enough of those load+test pairs before the pause > and fill up the speculative pipeline with them; that pipeline will need to > be flushed once the spin exits since those load instructions likely > speculated incorrectly. And here we're basically describing the reason for > putting pause there in the first place :). > > On Tuesday, January 26, 2016, Igor Veresov > wrote: > >> So, why does the new node have a memory effect? That would seem to >> prevent any movement of the subsequent loads in your loop, right? If that?s >> intentional I wonder why is that? >> >> igor >> >> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >> >> Hello, >> >> Some of you may have a seen a few e-mails on the core-libs alias about a >> proposed ?spin wait hint?. The JEP is forming up nicely at >> https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a >> consensus on the API side. It is now in a draft state and I hope this JEP >> will get targeted for java 9 shortly. The upcoming API changes can be seen >> at the webrev: >> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >> >> At this time I would like to ask for a review of the hs-comp changes. The >> plan is push changes into class libraries and hotspot synchronously but >> that may happen after the JEP gets targeted. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >> >> The idea of the fix is pretty simple: hotspot replaces a call to >> java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a >> 'pause' instruction on x86. This intrinsic is guarded by the >> -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a >> verification code that makes sure the flag is off, VM will just execute at >> empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. 
>> According to [1] the 'pause' instruction is functional since SSE2, but
>> even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence
>> harmless, there seems to be no need to add guarding code for older
>> generations of Intel CPUs.
>>
>> The proposed patch includes a simple regression test that simply makes
>> sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There
>> are several other producer-consumer-like performance tests ready that the
>> authors of this JEP would be happy to make available under JEP-230 but I am
>> uncertain about the process.
>>
>> Thanks,
>>
>> Ivan
>>
>> [1] -
>> https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops
>>
>
> --
> Sent from my phone

--
Sent from my phone

From igor.veresov at oracle.com Wed Jan 27 04:47:43 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 26 Jan 2016 20:47:43 -0800
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To:
References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com>
Message-ID:

> On Jan 26, 2016, at 8:08 PM, Vitaly Davidovich wrote:
>
> You would but subsequent volatile load could move before the pause. If you unroll the loop, you could (theoretically) end up with all loads moved before the pause but all appearing ordered with respect to each other, eg:
>
> cmp addr, 0 // from iteration 1
> je label
> cmp addr, 0 // from iteration 2
> je label
> ...
> pause
>
> What prevents that if pause is not a compiler member?
>

I think volatile loads explicitly depend on control. If the pause node consumes and produces control it all should be in a rigid control chain. Other regular loads (that don't have control dependencies) would still be free to move around.
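[Editorial illustration, not part of the original message.] For readers following the exchange, the kind of loop under discussion looks roughly like the sketch below: a volatile flag is polled and the spin-wait hint is issued between polls. The class SpinWaitSketch, the field ready and the method spinUntilReady are hypothetical names chosen for the example. The thread reviews a proposed java.lang.Runtime.onSpinWait(); the sketch instead calls Thread.onSpinWait(), the form that shipped in JDK 9 under JEP 285, so it needs Java 9 or later. Whether the call compiles to 'pause' depends on the platform and JIT, which is exactly what the review is about.

```java
class SpinWaitSketch {
    // Volatile so that every loop iteration performs a fresh load.
    static volatile boolean ready = false;

    // Spins until the flag is observed as set; returns how many times
    // the spin-wait hint was issued.
    static long spinUntilReady() {
        long spins = 0;
        while (!ready) {          // volatile load of the polled location
            Thread.onSpinWait();  // hint: we are in a busy-wait loop
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(10); // give the consumer some time to spin
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready = true;         // volatile store that ends the spin
        });
        producer.start();
        long spins = spinUntilReady();
        producer.join();
        System.out.println("flag observed after " + spins + " spin-wait hints");
    }
}
```

The number of spins printed varies from run to run; only the fact that the loop terminates once the flag is set is deterministic.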
igor > On Tuesday, January 26, 2016, Igor Veresov > wrote: > Wouldn?t you use a volatile load for the memory location you?re polling? > > igor > >> On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich > wrote: >> >> Subsequent loads at this point will likely be polls of same memory location that just failed a test, and the author inserted a pause. It's unlikely that the memory changed that quickly and scheduling the next load before the pause is equivalent to two loads back to back essentially, which wouldn't make sense given the intended usage. There's also the risk that the compiler would move enough of those load+test pairs before the pause and fill up the speculative pipeline with them; that pipeline will need to be flushed once the spin exits since those load instructions likely speculated incorrectly. And here we're basically describing the reason for putting pause there in the first place :). >> >> On Tuesday, January 26, 2016, Igor Veresov > wrote: >> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that?s intentional I wonder why is that? >> >> igor >> >>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov > wrote: >>> >>> Hello, >>> >>> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832 . There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>> >>> At this time I would like to ask for a review of the hs-comp changes. The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>> >>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs. >>> >>> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >>> >>> Thanks, >>> >>> Ivan >>> >>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops >> >> >> -- >> Sent from my phone > > > > -- > Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Wed Jan 27 05:35:37 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 27 Jan 2016 00:35:37 -0500 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com> Message-ID: On Tuesday, January 26, 2016, Igor Veresov wrote: > > On Jan 26, 2016, at 8:08 PM, Vitaly Davidovich > wrote: > > You would but subsequent volatile load could move before the pause. 
If > you unroll the loop, you could (theoretically) end up with all loads moved > before the pause but all appearing ordered with respect to each other, eg: > > cmp addr, 0 // from iteration 1 > je label > cmp addr, 0 // from iteration 2 > je label > ... > pause > > What prevents that if pause is not a compiler member? > > > I think volatile loads explicitly depend on control. If the pause node > consumes and produces control it all should be in a rigid control chain. > Other regular loads (that don?t have control dependencies) would still be > free to move around. > Is this to avoid out of thin air values? That is, suppose you have: if (some condition) read volatile (or regular) Regular load can be scheduled before the if and result used if control reaches there. For volatile, load cannot be scheduled above the if since value can be bogus at that point? Is it safe for compiler to assume that something else anchors loads around the pause? That aside, given the intended usage, I'm not sure what other regular loads would be there. The usage is a tight spin loop waiting for exit condition to be met. Although I suppose if compiler sees regular loads after the loop exits successfully, perhaps scheduling them before the loop can be beneficial. Is that what you have in mind? > igor > > On Tuesday, January 26, 2016, Igor Veresov > wrote: > >> Wouldn?t you use a volatile load for the memory location you?re polling? >> >> igor >> >> On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich wrote: >> >> Subsequent loads at this point will likely be polls of same memory >> location that just failed a test, and the author inserted a pause. It's >> unlikely that the memory changed that quickly and scheduling the next load >> before the pause is equivalent to two loads back to back essentially, which >> wouldn't make sense given the intended usage. 
There's also the risk that >> the compiler would move enough of those load+test pairs before the pause >> and fill up the speculative pipeline with them; that pipeline will need to >> be flushed once the spin exits since those load instructions likely >> speculated incorrectly. And here we're basically describing the reason for >> putting pause there in the first place :). >> >> On Tuesday, January 26, 2016, Igor Veresov >> wrote: >> >>> So, why does the new node have a memory effect? That would seem to >>> prevent any movement of the subsequent loads in your loop, right? If that?s >>> intentional I wonder why is that? >>> >>> igor >>> >>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>> >>> Hello, >>> >>> Some of you may have a seen a few e-mails on the core-libs alias about a >>> proposed ?spin wait hint?. The JEP is forming up nicely at >>> https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a >>> consensus on the API side. It is now in a draft state and I hope this JEP >>> will get targeted for java 9 shortly. The upcoming API changes can be seen >>> at the webrev: >>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>> >>> At this time I would like to ask for a review of the hs-comp changes. >>> The plan is push changes into class libraries and hotspot synchronously but >>> that may happen after the JEP gets targeted. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>> >>> The idea of the fix is pretty simple: hotspot replaces a call to >>> java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a >>> 'pause' instruction on x86. This intrinsic is guarded by the >>> -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a >>> verification code that makes sure the flag is off, VM will just execute at >>> empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. 
>>> According the [1] the 'pause' instruction is functional since SSE2, but >>> even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence >>> harmless, there seems to be no need to add guarding code for older >>> generations of Intel CPUs. >>> >>> The proposed patch includes a simple regression test that simply makes >>> sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There >>> are several other producer-consumer-like performance tests ready that the >>> authors of this JEP would be happy to make available under JEP-230 but I am >>> uncertain about the process. >>> >>> Thanks, >>> >>> Ivan >>> >>> [1] - >>> https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops >>> >>> >>> >> >> -- >> Sent from my phone >> >> >> > > -- > Sent from my phone > > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Wed Jan 27 06:03:13 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 26 Jan 2016 22:03:13 -0800 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com> Message-ID: > On Jan 26, 2016, at 9:35 PM, Vitaly Davidovich wrote: > > > > On Tuesday, January 26, 2016, Igor Veresov > wrote: > >> On Jan 26, 2016, at 8:08 PM, Vitaly Davidovich > wrote: >> >> You would but subsequent volatile load could move before the pause. If you unroll the loop, you could (theoretically) end up with all loads moved before the pause but all appearing ordered with respect to each other, eg: >> >> cmp addr, 0 // from iteration 1 >> je label >> cmp addr, 0 // from iteration 2 >> je label >> ... >> pause >> >> What prevents that if pause is not a compiler member? >> > > I think volatile loads explicitly depend on control. 
> If the pause node consumes and produces control it all should be in a rigid control chain.
> Other regular loads (that don't have control dependencies) would still be free to move around.
>
> Is this to avoid out of thin air values? That is, suppose you have:
>
> if (some condition)
>     read volatile (or regular)
>
> Regular load can be scheduled before the if and result used if control reaches there. For volatile, load cannot be scheduled above the if since value can be bogus at that point?

Right. Regular reads can move up anywhere to the preceding memory effect, that modified that alias index.

> Is it safe for compiler to assume that something else anchors loads around the pause?
>
> That aside, given the intended usage, I'm not sure what other regular loads would be there. The usage is a tight spin loop waiting for exit condition to be met. Although I suppose if compiler sees regular loads after the loop exits successfully, perhaps scheduling them before the loop can be beneficial. Is that what you have in mind?

No just simple stuff like:

while (...) {
  a = x.f;
  pause();
  b = x.f;
}

If pause() is a wide memory kill, regular field loads around it obviously won't fold. So in the example above those field loads are both going to be there. I realize it's probably not a big deal in reality for the wait loops, but I was just wondering why make it a wide mem kill if membar nodes for volatiles (that will have to be in the loop) already have wide kill semantics.

igor

From igor.veresov at oracle.com Wed Jan 27 06:12:06 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 26 Jan 2016 22:12:06 -0800
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To:
References: <56A751AE.9090203@azulsystems.com>
Message-ID: <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com>

I realize it's not a big deal. I was just wondering if there was any specific reason control alone is not enough. Anyways, looks ok for the first cut.

igor

> On Jan 26, 2016, at 9:24 PM, Gil Tene wrote:
>
> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop.
> There are probably other ways to achieve this, but this one doesn't really have a performance downside?
>
> - Gil.
>
>> On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote:
>>
>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that's intentional I wonder why is that?
>>
>> igor

From igor.veresov at oracle.com Wed Jan 27 06:30:39 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 26 Jan 2016 22:30:39 -0800
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To:
References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com>
Message-ID: <412CE10D-193F-48D7-9E94-7D0C4DD9D6FD@oracle.com>

Or to put it another way: the memory effect of the pause prevents ordinary loads from floating up. However, its control effect alone should be enough to prevent the _volatile_ loads from floating up, since they are control-dependent. Hence the original thought that the memory effect of the pause might be unnecessarily restrictive if it's used with volatile loads. But maybe I'm missing something.

igor

From rahul.v.raghavan at oracle.com Wed Jan 27 11:14:03 2016
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Wed, 27 Jan 2016 03:14:03 -0800 (PST)
Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler
In-Reply-To: <56A737F6.6030909@oracle.com>
References: <56A737F6.6030909@oracle.com>
Message-ID: <00a3e7ca-2212-4699-b591-52ceaa9c909b@default>

> -----Original Message-----
> From: Tobias Hartmann
> Sent: Tuesday, January 26, 2016 2:40 PM
> To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net
>
> Hi Rahul,
>
> looks good to me (not a Reviewer). The code in sharedRuntime_x86_64.cpp is much better now!

Thank you Tobias.

>
> Best,
> Tobias
>
> On 25.01.2016 18:02, Rahul Raghavan wrote:
> > Hello,
> >
> > With reference to below email thread, please send review comments for the revised patch for JDK-6378256.
> > http://cr.openjdk.java.net/~thartmann/6378256/webrev.02/
> >
> > Thanks,
> > Rahul
> >
> >> -----Original Message-----
> >> From: Tobias Hartmann
> >> Sent: Monday, January 25, 2016 12:40 PM
> >> To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net
> >>
> >> Hi Rahul,
> >>
> >> On 22.01.2016 17:11, Rahul Raghavan wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Tobias Hartmann
> >>>> Sent: Monday, January 11, 2016 2:56 PM
> >>>> To: Rahul Raghavan; hotspot-compiler-dev at openjdk.java.net
> >>>>
> >>>> Hi Rahul,
> >>>>
> >>>>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/
> >>>>
> >>>> Why don't you use 'markOopDesc::hash_mask_in_place' for the 64 bit version? This should save some instructions and you also don't need the 'hash' register if you compute everything in 'result'.
> >>>
> >>> Thank you for your comments Tobias.
> >>>
> >>> I could not get the implementation to work with the usage of 'markOopDesc::hash_mask_in_place' in x86_64 (similar to the support in x86_32).
> >>> Usage of - __ andptr(result, markOopDesc::hash_mask_in_place);
> >>> Results in build error - 'overflow in implicit constant conversion'
> >>>
> >>> Then understood from 'sharedRuntime_sparc.cpp', 'markOop.hpp' that the usage of 'hash_mask_in_place' should be avoided for 64-bit because the values are too big!
> >>> Similar comments in LibraryCallKit::inline_native_hashcode [hotspot/src/share/vm/opto/library_call.cpp] also.
> >>> Could not find some other way to use hash_mask_in_place here for x86_64?
> >>
> >> You are right, I missed that.
> >>
> >>> So depending on the markOopDesc::hash_mask and markOopDesc::hash_shift values instead (similar to what is done in sharedRuntime_sparc).
> >>> Added the missing comment regarding the above in the revised webrev.
> >>>
> >>> Also, yes, I missed the optimized codegen.
> >>> Tried a revised patch removing usages of the extra 'hash' and 'mask' registers and computed everything in 'result' itself.
> >>>
> >>> [sharedRuntime_x86_64.cpp]
> >>> ....................
> >>> + Register obj_reg = j_rarg0;
> >>> + Register result = rax;
> >>> ........
> >>> + // get hash
> >>> + // Read the header and build a mask to get its hash field.
> >>> + // Depend on hash_mask being at most 32 bits and avoid the use of hash_mask_in_place
> >>> + // because it could be larger than 32 bits in a 64-bit vm. See markOop.hpp.
> >>> + __ shrptr(result, markOopDesc::hash_shift);
> >>> + __ andptr(result, markOopDesc::hash_mask);
> >>> + // test if hashCode exists
> >>> + __ jcc (Assembler::zero, slowCase);
> >>> + __ ret(0);
> >>> + __ bind (slowCase);
> >>> ........
> >>>
> >>> Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests.
> >>>
> >>> Please send your comments. I can submit revised webrev if all okay.
> >>
> >> Looks good. Please send a new webrev.
> >>
> >> Best,
> >> Tobias
> >>
> >>>>
> >>>> Best,
> >>>> Tobias
> >>>>
> >>>> On 08.01.2016 18:13, Rahul Raghavan wrote:
> >>>>> Hello,
> >>>>>
> >>>>> Please review the following revised patch for JDK-6378256 -
> >>>>> http://cr.openjdk.java.net/~thartmann/6378256/webrev.01/
> >>>>>
> >>>>> This revised webrev got following changes -
> >>>>>
> >>>>> 1) A minor, better optimized code with return 0 at initial stage (instead of continuing to 'slowCase' path), for special/rare null reference input!
> >>>>> (as per documentation, test results confirmed it is safe to 'return 0' for null reference input, for System.identityHashCode)
> >>>>>
> >>>>> 2) Added similar Object.hashCode, System.identityHashCode optimization support in sharedRuntime_x86_64.cpp.
> >>>>>
> >>>>> Confirmed no issues with jprt testing (-testset hotspot) and expected results for unit tests.
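[Editorial sketch, not part of the original thread.] The shift-then-mask extraction in the sharedRuntime_x86_64.cpp excerpt quoted above can be illustrated in plain Java. This is a minimal sketch under stated assumptions: the constants HASH_SHIFT = 8 and a 31-bit HASH_MASK are assumed for illustration only (markOop.hpp holds the authoritative values), and the class and method names are hypothetical. It shows why the unshifted mask fits a 32-bit immediate while the in-place mask does not, which is presumably what the reported 'overflow in implicit constant conversion' build error reflects.

```java
class MarkWordHashSketch {
    // Assumed layout constants, for illustration only; see markOop.hpp for the real ones.
    static final int HASH_SHIFT = 8;
    static final long HASH_MASK = 0x7FFFFFFFL;                       // 31 bits
    static final long HASH_MASK_IN_PLACE = HASH_MASK << HASH_SHIFT;  // needs 39 bits

    // True if the value survives a round trip through a signed 32-bit int,
    // i.e. it could be encoded as a sign-extended 32-bit immediate operand.
    static boolean fitsInSigned32(long v) {
        return v == (int) v;
    }

    // Shift first, then mask with the small constant (mirrors shrptr + andptr).
    // A result of 0 means "no hash stored yet", so a caller would take the slow path.
    static int extractHash(long markWord) {
        return (int) ((markWord >>> HASH_SHIFT) & HASH_MASK);
    }

    public static void main(String[] args) {
        long mark = (0x12345678L << HASH_SHIFT) | 0x01L; // hypothetical header value
        System.out.println("hash_mask fits imm32:          " + fitsInSigned32(HASH_MASK));
        System.out.println("hash_mask_in_place fits imm32: " + fitsInSigned32(HASH_MASK_IN_PLACE));
        System.out.println("extracted hash: 0x" + Integer.toHexString(extractHash(mark)));
    }
}
```

Shifting first costs one extra instruction compared with masking in place, but it keeps the mask constant small enough for a single immediate operand on a 64-bit VM.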
> >>>>>
> >>>>> Thanks,
> >>>>> Rahul
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Roland Westrelin
> >>>>>> Sent: Wednesday, December 09, 2015 8:03 PM
> >>>>>> To: Rahul Raghavan
> >>>>>> Cc: hotspot-compiler-dev at openjdk.java.net
> >>>>>>
> >>>>>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ .
> >>>>>>
> >>>>>> Justifying the comment lines 2019-2022 in sharedRuntime_sparc.cpp (lines 1743-1746 in sharedRuntime_x86_32.cpp) again would be nice.
> >>>>>> Shouldn't we use this as an opportunity to add the same optimization to sharedRuntime_x86_64.cpp?
> >>>>>>
> >>>>>> Roland.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Rahul Raghavan
> >>>>>> Sent: Wednesday, December 09, 2015 2:43 PM
> >>>>>> To: hotspot-compiler-dev at openjdk.java.net
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Please review the following patch for JDK-6378256.
> >>>>>>
> >>>>>> webrev: http://cr.openjdk.java.net/~thartmann/6378256/webrev.00/ .
> >>>>>>
> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-6378256 .
> >>>>>> Performance problem with System.identityHashCode, compared to Object.hashCode, with client compiler (at least seven times slower).
> >>>>>> Issue reproducible for x86_32, SPARC (with -client / -XX:TieredStopAtLevel=1 , 2, 3 options).
> >>>>>>
> >>>>>> sample unit test:
> >>>>>> public class Jdk6378256Test
> >>>>>> {
> >>>>>>     public static void main(String[] args)
> >>>>>>     {
> >>>>>>         Object obj = new Object();
> >>>>>>         long time = System.nanoTime();
> >>>>>>         for(int i = 0 ; i < 1000000 ; i++)
> >>>>>>             System.identityHashCode(obj); //compare to obj.hashCode();
> >>>>>>         System.out.println ("Result = " + (System.nanoTime() - time));
> >>>>>>     }
> >>>>>> }
> >>>>>>
> >>>>>> Fix: Enabled the C1 optimization which was done only for Object.hashCode, now for System.identityHashCode() also.
> >>>>>> (looks in the header for the hashCode before calling into the VM).
> >>>>>> Unlike for Object.hashCode, System.identityHashCode is static method and gets object as argument instead of the receiver.
> >>>>>> So also added required additional null check for System.identityHashCode case.
> >>>>>>
> >>>>>> Testing:
> >>>>>> - successful JPRT run (-testset hotspot).
> >>>>>> - JTREG testing (hotspot/test, jdk/test - java/util, java/io, java/lang/System).
> >>>>>> (with -client / -XX:TieredStopAtLevel=1 etc. options).
> >>>>>> - Added 'noreg-perf' label for this performance bug.
> >>>>>> Manual testing done and confirmed expected performance values for unit tests with fix.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Rahul

From vitalyd at gmail.com Wed Jan 27 11:28:22 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 27 Jan 2016 06:28:22 -0500
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To: <412CE10D-193F-48D7-9E94-7D0C4DD9D6FD@oracle.com>
References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com> <412CE10D-193F-48D7-9E94-7D0C4DD9D6FD@oracle.com>
Message-ID:

Although those same volatile loads will prevent much of regular load movement on their own; my statement earlier about scheduling a regular load that is after the loop to be before the loop won't work anyway due to the volatile loads in the loop. So all in all, it seems it wouldn't matter in practice.

On Wednesday, January 27, 2016, Igor Veresov wrote:

> Or to put it another way. Memory effect of the pause prevents ordinary loads to float up. However, the control effect of it alone should be enough to prevent the _volatile_ loads to float up, since they are control-dependent. Hence the original thought that the memory effect of the pause might be unnecessarily restrictive if it's used with volatile loads. But may be I'm missing something.
>
> igor

--
Sent from my phone

From vitalyd at gmail.com Wed Jan 27 12:22:15 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 27 Jan 2016 07:22:15 -0500
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To:
References: <56A751AE.9090203@azulsystems.com> <0DCFF214-7A0D-48CF-A9CD-6DD32922701D@oracle.com>
Message-ID:

On Wednesday, January 27, 2016, Igor Veresov wrote:

> On Jan 26, 2016, at 9:35 PM, Vitaly Davidovich wrote:
>
> On Tuesday, January 26, 2016, Igor Veresov wrote:
>
>> On Jan 26, 2016, at 8:08 PM, Vitaly Davidovich wrote:
>>
>> You would but subsequent volatile load could move before the pause. If you unroll the loop, you could (theoretically) end up with all loads moved before the pause but all appearing ordered with respect to each other, eg:
>>
>> cmp addr, 0 // from iteration 1
>> je label
>> cmp addr, 0 // from iteration 2
>> je label
>> ...
>> pause
>>
>> What prevents that if pause is not a compiler member?
>>
>> I think volatile loads explicitly depend on control. If the pause node consumes and produces control it all should be in a rigid control chain.
>> Other regular loads (that don't have control dependencies) would still be free to move around.
>
> Is this to avoid out of thin air values? That is, suppose you have:
>
> if (some condition)
>     read volatile (or regular)
>
> Regular load can be scheduled before the if and result used if control reaches there. For volatile, load cannot be scheduled above the if since value can be bogus at that point?
>
> Right. Regular reads can move up anywhere to the preceding memory effect, that modified that alias index.

I wonder if that's required by JMM though. In my example above, if the condition being read doesn't have volatile load semantics then it seems there's no happens-before between the condition and the volatile load.
Your sentence above regarding modifying the alias index sort of makes it sound like store-load forwarding by the compiler, allowing the read to be skipped entirely (for regular loads), is that right or did I read too much into it? If that's right, volatile loads cannot be eliminated so not quite sure where that nets out. I can see how volatile loads having control is a safe/conservative implementation approach but I can also see how scheduling them aggressively, when not prevented by other memory ordering, could be beneficial. > > > Is it safe for compiler to assume that something else anchors loads around > the pause? > > That aside, given the intended usage, I'm not sure what other regular > loads would be there. The usage is a tight spin loop waiting for exit > condition to be met. Although I suppose if compiler sees regular loads > after the loop exits successfully, perhaps scheduling them before the loop > can be beneficial. Is that what you have in mind? > > > > No just simple stuff like: > > while(?) { > a = x.f; > pause(); > b = x.f; > } > > If pause() is a wide memory kill, regular field loads around it obviously > won?t fold. So in the example above those field loads are both going to be > there. I realize it?s probably not a big deal in reality for the wait > loops, but I was just wondering why make it a wide mem kill if membar nodes > for volatiles (that will have to be in the loop) already have wide kill > semantics. > > igor > > > > >> igor >> >> On Tuesday, January 26, 2016, Igor Veresov >> wrote: >> >>> Wouldn?t you use a volatile load for the memory location you?re polling? >>> >>> igor >>> >>> On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich >>> wrote: >>> >>> Subsequent loads at this point will likely be polls of same memory >>> location that just failed a test, and the author inserted a pause. 
It's >>> unlikely that the memory changed that quickly and scheduling the next load >>> before the pause is equivalent to two loads back to back essentially, which >>> wouldn't make sense given the intended usage. There's also the risk that >>> the compiler would move enough of those load+test pairs before the pause >>> and fill up the speculative pipeline with them; that pipeline will need to >>> be flushed once the spin exits since those load instructions likely >>> speculated incorrectly. And here we're basically describing the reason for >>> putting pause there in the first place :). >>> >>> On Tuesday, January 26, 2016, Igor Veresov >>> wrote: >>> >>>> So, why does the new node have a memory effect? That would seem to >>>> prevent any movement of the subsequent loads in your loop, right? If that?s >>>> intentional I wonder why is that? >>>> >>>> igor >>>> >>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>>> >>>> Hello, >>>> >>>> Some of you may have a seen a few e-mails on the core-libs alias about >>>> a proposed ?spin wait hint?. The JEP is forming up nicely at >>>> https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a >>>> consensus on the API side. It is now in a draft state and I hope this JEP >>>> will get targeted for java 9 shortly. The upcoming API changes can be seen >>>> at the webrev: >>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>> >>>> At this time I would like to ask for a review of the hs-comp changes. >>>> The plan is push changes into class libraries and hotspot synchronously but >>>> that may happen after the JEP gets targeted. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>> >>>> The idea of the fix is pretty simple: hotspot replaces a call to >>>> java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a >>>> 'pause' instruction on x86. This intrinsic is guarded by the >>>> -XX:?UseOnSpinWaitIntrinsic flag. 
For non-x86 platforms there is
>>>> verification code that makes sure the flag is off; the VM will just execute the
>>>> empty method java.lang.Runtime.onSpinWait(), effectively a no-op.
>>>> According to [1] the 'pause' instruction is functional since SSE2, but
>>>> even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence
>>>> harmless, so there seems to be no need to add guarding code for older
>>>> generations of Intel CPUs.
>>>>
>>>> The proposed patch includes a simple regression test that makes
>>>> sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There
>>>> are several other producer-consumer-like performance tests ready that the
>>>> authors of this JEP would be happy to make available under JEP-230 but I am
>>>> uncertain about the process.
>>>>
>>>> Thanks,
>>>>
>>>> Ivan
>>>>
>>>> [1] -
>>>> https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops

--
Sent from my phone

From ivan at azulsystems.com Wed Jan 27 12:48:29 2016
From: ivan at azulsystems.com (Ivan Krylov)
Date: Wed, 27 Jan 2016 15:48:29 +0300
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To: <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com>
References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com>
Message-ID: <56A8BC9D.8060004@azulsystems.com>

Looks like there was some good discussion while I was peacefully sleeping.
I don't have much to add. This patch was somewhat inspired by the JEP-171 changes.
Perhaps there are other ways to achieve the same semantics.

So, if we can consider this reviewed, I will wait for the actual JEP to become targeted to 9 and then seek a sponsor to do the push.
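
[Illustration only, not part of the webrev: a minimal sketch of the kind of loop this thread is about, where a volatile flag is polled and a spin-wait hint is issued after each failed test. The thread proposes java.lang.Runtime.onSpinWait(); the sketch assumes Thread.onSpinWait(), the form in which the hint shipped in JDK 9, so that it compiles on a current JDK. The class, field, and method names are hypothetical.]

```java
public class SpinWaitSketch {
    // Volatile, so each loop iteration performs a fresh load of the flag.
    private static volatile boolean ready;
    // Plain field; it is published by the volatile write to 'ready'.
    private static int payload;

    /** Spins until the producer sets the flag, then returns the payload. */
    static int awaitPayload() {
        while (!ready) {
            // Hint to the runtime that this thread is busy-waiting.
            // On x86 the intrinsic discussed here would emit 'pause'.
            Thread.onSpinWait();
        }
        // Safe to read: the write to 'payload' happens-before the
        // volatile write to 'ready' that ended the loop.
        return payload;
    }

    /** Starts one producer thread and spins until it publishes a value. */
    static int runOnce() {
        ready = false;
        payload = 0;
        Thread producer = new Thread(() -> {
            payload = 42;
            ready = true;
        });
        producer.start();
        int seen = awaitPayload();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

[The only memory access in the loop body is the volatile load of the flag, which is the case Gil and Igor discuss: the volatile access already restricts reordering, so the hint adds little extra constraint.]
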
Thanks, Ivan On 27/01/2016 09:12, Igor Veresov wrote: > I realize it?s not a big deal. I was just wondering if there was any > specific reason control alone is not enough. > Anyways, looks ok for the first cut. > > igor > >> On Jan 26, 2016, at 9:24 PM, Gil Tene > > wrote: >> >> Since a sensical loop that calls onSpinWait() would include at least >> a volatile load on every iteration (and possibly a volatile store), >> the new node does not create significant extra move restrictions that >> are not already there. Modeling this with a memory effect is one >> simple way to prevent it from being re-ordered out of the loop. There >> are probably other ways to achieve this, but this one doesn't really >> have a performance downside? >> >> ? Gil. >> >>> On Jan 26, 2016, at 4:44 PM, Igor Veresov >> > wrote: >>> >>> So, why does the new node have a memory effect? That would seem to >>> prevent any movement of the subsequent loads in your loop, right? If >>> that?s intentional I wonder why is that? >>> >>> igor >>> >>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov >>> > wrote: >>>> >>>> Hello, >>>> >>>> Some of you may have a seen a few e-mails on the core-libs alias >>>> about a proposed ?spin wait hint?. The JEP is forming up nicely at >>>> https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be >>>> a consensus on the API side. It is now in a draft state and I hope >>>> this JEP will get targeted for java 9 shortly. The upcoming API >>>> changes can be seen at the webrev: >>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>> >>>> At this time I would like to ask for a review of the hs-comp >>>> changes. The plan is push changes into class libraries and hotspot >>>> synchronously but that may happen after the JEP gets targeted. 
>>>> >>>> Bug:https://bugs.openjdk.java.net/browse/JDK-8147844 >>>> Webrev:http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>> >>>> The idea of the fix is pretty simple: hotspot replaces a call to >>>> java.lang.Runtime.onSpinWait() with an intrinsic that is >>>> effectively a 'pause' instruction on x86. This intrinsic is >>>> guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 >>>> platforms there is a verification code that makes sure the flag is >>>> off, VM will just execute at empty method >>>> java.lang.Runtime.onSpinWait() ? effectively a no-op. According the >>>> [1] the 'pause' instruction is functional since SSE2, but even on >>>> CPUs prior to SSE2 the 'pause' instruction is a no-op and hence >>>> harmless, there seems to be no need to add guarding code for older >>>> generations of Intel CPUs. >>>> >>>> The proposed patch includes a simple regression test that simply >>>> makes sure that method java.lang.Runtime.onSpinWait() gets >>>> intrinsified. There are several other producer-consumer-like >>>> performance tests ready that the authors of this JEP would be happy >>>> to make available under JEP-230 but I am uncertain about the process. >>>> >>>> Thanks, >>>> >>>> Ivan >>>> >>>> [1] >>>> -https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.punegov at oracle.com Wed Jan 27 14:51:19 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Wed, 27 Jan 2016 17:51:19 +0300 Subject: RFR (XXS): [TESTBUG] InlineCommandTest.java: unknown compiler level 0 for commpile ID: 651 Message-ID: <1EC15E02-BB6F-480C-8FB4-40F8DB9A7C39@oracle.com> Please review the following small patch for inlining tests. Issue: tests are unable to find JFR compilation event for appropriate inline event. This happens because the recording stops before the compilation finished. 
Invocation of the test method is not synchronised with compilation.

Fix: add -Xbatch so that compilation blocks the test thread.

bug: https://bugs.openjdk.java.net/browse/JDK-8144239
webrev: http://cr.openjdk.java.net/~ppunegov/8144239/webrev.00/

Thanks,
Pavel Punegov

From roland.westrelin at oracle.com Wed Jan 27 15:39:11 2016
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 27 Jan 2016 16:39:11 +0100
Subject: RFR(S): 8147645: get_ctrl_no_update() code is wrong
Message-ID: <77A6696F-8F4B-4023-AE58-61E2D01A8035@oracle.com>

The intrinsify_fill() code doesn't mark a replaced control as dead. As suggested in the bug, I added an assert to get_ctrl_no_update() so we don't use a loop as a control by accident. I also dropped lazy_replace_proj(), which is obsolete AFAICT.

http://cr.openjdk.java.net/~roland/8147645/webrev.00/

Roland.

From zoltan.majo at oracle.com Wed Jan 27 15:56:28 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 27 Jan 2016 16:56:28 +0100
Subject: [9] RFR (S): 8146478: Node limit exceeded with -XX:AllocateInstancePrefetchLines=1073741823
In-Reply-To: <56A7C27B.8050004@oracle.com>
References: <56A7A223.9050403@oracle.com> <56A7C27B.8050004@oracle.com>
Message-ID: <56A8E8AC.7010007@oracle.com>

Hi Vladimir,

thank you for the feedback!

On 01/26/2016 08:01 PM, Vladimir Kozlov wrote:
> Where does the 4/2 number come from? Some SPEC runs used a higher number:

Those are the highest values set by the VM. I was not aware of SPEC runs using values higher than those.

>
> -XX:AllocatePrefetchLines=16
>
> http://spec.org/jbb2005/results/res2009q1/jbb2005-20081203-00563.html
>
> I would suggest something like 64 - I have never seen such a number used.

OK, I set the maximum value for both AllocatePrefetchLines and AllocateInstancePrefetchLines to 64.

>
> Also, please, limit AllocatePrefetchStepSize range. It corresponds to
> cache line size.
512 I would say for future proof - OK, done. > with assert that check that its setting in vm_Version_.cpp is in > these OK, I modified the range check in the AllocatePrefetchStepSizeConstraintFunc() constraint function accordingly. I hope that is fine. > > For the case AllocatePrefetchStyle == 2 number of lines is calculated as: > > uint lines = AllocatePrefetchDistance / AllocatePrefetchStepSize; > > Since AllocatePrefetchDistance limit is big you can get a lot of nodes > again. May be also set the limit - > AllocatePrefetchLines*AllocatePrefetchStepSize 64*32 = 2048. Thank you for catching that. I extended the constraint function AllocatePrefetchStepSizeConstraintFunc() to check that AllocatePrefetchDistance / AllocatePrefetchStepSize <= 64 (64 is the maximum value that we expect for 'lines' in PhaseMacroExpand::prefetch_allocation() AllocatePrefetchStyle == 2.) I hope this is fine. I also modified the expected node count increase after expansion in PhaseMacroExpand::expand_macro_nodes() to account for the increased thresholds. Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8146478/webrev.01/ I re-tested with JPRT (incl. TestOptionsWithRanges.java), all tests pass. Thank you and best regards, Zoltan > > Thanks, > Vladimir > > On 1/26/16 8:43 AM, Zolt?n Maj? wrote: >> Hi, >> >> >> please review the patch for 8146478. >> >> https://bugs.openjdk.java.net/browse/JDK-8146478 >> >> Problem: Setting a high value for AllocateInstancePrefetchLines can >> trigger an assert in the C2 compiler The reasons is that the number of >> live nodes exceeds the maximum node limit. The same problem can happen >> if AllocateInstanceLines is given a high value. >> >> Solution: >> Limit the range for AllocateInstancePrefetchLines/AllocateInstanceLines >> to 8. 
I picked the value 8 because
>> - (1) the maximum possible value for these flags is 4/2, so having a
>> slightly higher value than 4/2 still allows for some experiments;
>> - (2) the node_check() in PhaseMacroExpand::expand_macro_nodes() assumes
>> that each macro node expansion will generate <75 new nodes. The number
>> of nodes generated by expand_allocate_array()/expand_allocate() for 8
>> prefetched lines closely fits into that margin (experimentally
>> verified).
>>
>> In addition, I removed some code that is now unnecessary because
>> of the range checks we have in place.
>>
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zmajo/8146478/webrev.00/
>>
>> Testing:
>> - JPRT: All JTREG hotspot tests, incl. TestOptionsWithRanges.java
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>

From tatiana.pivovarova at oracle.com Wed Jan 27 17:15:12 2016
From: tatiana.pivovarova at oracle.com (Tatiana Pivovarova)
Date: Wed, 27 Jan 2016 20:15:12 +0300
Subject: RFR(M): 8148375: [jittester] Bug with generation function with void parameter and non empty arguments
Message-ID: <56A8FB20.4000709@oracle.com>

Hello!

Please review the following patch for jit-tester.

When jit-tester generates a Function node with a void return type and some arguments, the generated function in the .java file returns arg_0.

While fixing this bug I accumulated some small fixes:
- small performance improvement in the SymbolTable::merge function
- renamed 'klass' to 'owner' where it makes sense
- added more stream-style code
- moved TypeUtil to the utils package
- changed the Makefile so that a different 'seed' and 'number-of-tests' can be used (not only from the property file)

bug-id: https://bugs.openjdk.java.net/browse/JDK-8148375
webrev: http://cr.openjdk.java.net/~tpivovarova/8148375/webrev.00/

Thanks,
Tatiana
From igor.veresov at oracle.com Wed Jan 27 19:03:35 2016
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 27 Jan 2016 11:03:35 -0800
Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic
In-Reply-To: <56A8BC9D.8060004@azulsystems.com>
References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com>
Message-ID: <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com>

Actually, I'd rather use Matcher::match_rule_supported() to test whether it's supported on the platform, rather than fixing all vm_version_*.* to check for the flag validity; that's tedious (you forgot x86-32, and there are going to be more platforms to fix for your sponsor). Something like UseOnSpinWaitIntrinsic && Matcher::match_rule_supported(Op_OnSpinWait) to decide whether or not to inline the intrinsic.

Also, why are you not turning it on by default?

igor

> On Jan 27, 2016, at 4:48 AM, Ivan Krylov wrote:
>
> Looks like there was some good discussion while I was peacefully sleeping.
> I don't have much to add. This patch was somewhat inspired by JEP-171 changes.
> Perhaps there are other ways to achieve the same semantics.
>
> So, if we can consider this reviewed - I will wait for the actual JEP to become targeted to 9 and then seek a sponsor to do the push.
>
> Thanks,
>
> Ivan
>
> On 27/01/2016 09:12, Igor Veresov wrote:
>> I realize it's not a big deal. I was just wondering if there was any specific reason control alone is not enough.
>> Anyways, looks ok for the first cut.
>>
>> igor
>>
>>> On Jan 26, 2016, at 9:24 PM, Gil Tene wrote:
>>>
>>> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop.
There are probably other ways to achieve this, but this one doesn't really have a performance downside? >>> >>> ? Gil. >>> >>>> On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote: >>>> >>>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that?s intentional I wonder why is that? >>>> >>>> igor >>>> >>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>>>> >>>>> Hello, >>>>> >>>>> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>>> >>>>> At this time I would like to ask for a review of the hs-comp changes. The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>>> >>>>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs. 
>>>>> >>>>> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >>>>> >>>>> Thanks, >>>>> >>>>> Ivan >>>>> >>>>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops >>>> >>> >> > From vladimir.kozlov at oracle.com Wed Jan 27 19:10:31 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Jan 2016 11:10:31 -0800 Subject: RFR (XXS): [TESTBUG] InlineCommandTest.java: unknown compiler level 0 for commpile ID: 651 In-Reply-To: <1EC15E02-BB6F-480C-8FB4-40F8DB9A7C39@oracle.com> References: <1EC15E02-BB6F-480C-8FB4-40F8DB9A7C39@oracle.com> Message-ID: <56A91627.4040500@oracle.com> Looks fine. Vlaidmir On 1/27/16 6:51 AM, Pavel Punegov wrote: > Please review the following small patch for inlining tests. > > Issue: tests are unable to find JFR compilation event for appropriate > inline event. This happens because the recording stops before the > compilation finished. Invocation of the test method is not synchronised > with compilation. > > Fix: add Xbatch to make compilation block test thread. > > bug: https://bugs.openjdk.java.net/browse/JDK-8144239 > webrev: http://cr.openjdk.java.net/~ppunegov/8144239/webrev.00/ > > ? Thanks, > Pavel Punegov > From vladimir.kozlov at oracle.com Wed Jan 27 19:13:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Jan 2016 11:13:16 -0800 Subject: [9] RFR (S): 8146478: Node limit exceeded with -XX:AllocateInstancePrefetchLines=1073741823 In-Reply-To: <56A8E8AC.7010007@oracle.com> References: <56A7A223.9050403@oracle.com> <56A7C27B.8050004@oracle.com> <56A8E8AC.7010007@oracle.com> Message-ID: <56A916CC.6070103@oracle.com> Looks good. Thanks, Vladimir On 1/27/16 7:56 AM, Zolt?n Maj? 
wrote: > Hi Vladimir, > > > thank you for the feedback! > > On 01/26/2016 08:01 PM, Vladimir Kozlov wrote: >> Where 4/2 number comes from? Some spec runs used higher number: > > Those are the highest values set by the VM. I was not aware that SPEC > runs using values higher than those. > >> >> -XX:AllocatePrefetchLines=16 >> >> http://spec.org/jbb2005/results/res2009q1/jbb2005-20081203-00563.html >> >> I would suggest something like 64 - I never see such number is used. > > OK, I set the maximum value for both AllocatePrefetchLines and > AllocateInstancePrefetchLines to 64. > >> >> Also, please, limit AllocatePrefetchStepSize range. It corresponds to >> cache line size. 512 I would say for future proof - > > OK, done. > >> with assert that check that its setting in vm_Version_.cpp is in >> these > > OK, I modified the range check in the > AllocatePrefetchStepSizeConstraintFunc() constraint function > accordingly. I hope that is fine. > >> >> For the case AllocatePrefetchStyle == 2 number of lines is calculated as: >> >> uint lines = AllocatePrefetchDistance / AllocatePrefetchStepSize; >> >> Since AllocatePrefetchDistance limit is big you can get a lot of nodes >> again. May be also set the limit - >> AllocatePrefetchLines*AllocatePrefetchStepSize 64*32 = 2048. > > Thank you for catching that. I extended the constraint function > AllocatePrefetchStepSizeConstraintFunc() to check that > > AllocatePrefetchDistance / AllocatePrefetchStepSize <= 64 > > (64 is the maximum value that we expect for 'lines' in > PhaseMacroExpand::prefetch_allocation() AllocatePrefetchStyle == 2.) I > hope this is fine. > > I also modified the expected node count increase after expansion in > PhaseMacroExpand::expand_macro_nodes() to account for the increased > thresholds. > > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8146478/webrev.01/ > > I re-tested with JPRT (incl. TestOptionsWithRanges.java), all tests pass. 
> > Thank you and best regards, > > > Zoltan > > >> >> Thanks, >> Vladimir >> >> On 1/26/16 8:43 AM, Zolt?n Maj? wrote: >>> Hi, >>> >>> >>> please review the patch for 8146478. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8146478 >>> >>> Problem: Setting a high value for AllocateInstancePrefetchLines can >>> trigger an assert in the C2 compiler The reasons is that the number of >>> live nodes exceeds the maximum node limit. The same problem can happen >>> if AllocateInstanceLines is given a high value. >>> >>> Solution: >>> Limit the range for AllocateInstancePrefetchLines/AllocateInstanceLines >>> to 8. I picked the value 8 because >>> - (1) the maximum possible value for theses flags is 4/2, so having a >>> slightly higher value than 4/2 still allows for some experiments; >>> - (2) the node_check() in PhaseMacroExpand::expand_macro_nodes() assumes >>> that each macro node expansion will generate <75 new nodes. The number >>> of nodes generated by expand_allocate_array()/expand_allocate() for 8 >>> prefetched lines closely fits into that margin (experimentally >>> verified). >>> >>> In addition, I removed some code that is that is now unnecessary because >>> of the range checks we have in place. >>> >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zmajo/8146478/webrev.00/ >>> >>> Testing: >>> - JPRT: All JTREG hotspot tests, incl. TestOptionsWithRanges.java >>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From vladimir.kozlov at oracle.com Wed Jan 27 19:53:11 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 27 Jan 2016 11:53:11 -0800 Subject: RFR(S): 8147645: get_ctrl_no_update() code is wrong In-Reply-To: <77A6696F-8F4B-4023-AE58-61E2D01A8035@oracle.com> References: <77A6696F-8F4B-4023-AE58-61E2D01A8035@oracle.com> Message-ID: <56A92027.2070602@oracle.com> Yes, old_node->add_req(NULL) was very odd. It was from day one and I don't get why it was needed. 
I would understand if it was set_req(0, NULL) but that is done by remove_globally_dead_node() later.

Your changes are good. I agree.

Thanks,
Vladimir

On 1/27/16 7:39 AM, Roland Westrelin wrote:
> The intrinsify_fill() code doesn't mark a replaced control as dead. As suggested in the bug, I added an assert to
> get_ctrl_no_update() so we don't use a loop as a control by accident. I also dropped
> lazy_replace_proj() which is obsolete AFAICT.
>
> http://cr.openjdk.java.net/~roland/8147645/webrev.00/
>
> Roland.
>

From vladimir.kozlov at oracle.com Wed Jan 27 20:57:19 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 27 Jan 2016 12:57:19 -0800
Subject: RFR(S): 8063112: Compiler diagnostic commands should have locking instead of safepoint
In-Reply-To: <56A74D06.7030408@oracle.com>
References: <56A23F61.9000201@oracle.com> <56A281C3.6010408@oracle.com> <56A74D06.7030408@oracle.com>
Message-ID: <56A92F2F.9070507@oracle.com>

Yes, this looks much better. Reviewed.

thanks,
Vladimir

On 1/26/16 2:40 AM, Nils Eliasson wrote:
> Hi Vladimir,
>
> On 2016-01-22 20:23, Vladimir Kozlov wrote:
>> Why do you need a new print method? Why can't you use the existing print()?
>> Also I prefer to get current compilation tasks printed in separate lines
>> - not in the list of threads. Then you don't need to use new print?
>
> Works for me. I moved it directly after the existing thread printing:
>
> --------------- P R O C E S S ---------------
>
> Java Threads: ( => current thread )
>   0x00007f4cfc485000 JavaThread "Service Thread" daemon [_thread_blocked, id=22409, stack(0x00007f4bf1c5e000,0x00007f4bf1d5f000)]
>   0x00007f4cfc476000 JavaThread "Sweeper thread" daemon [_thread_blocked, id=22408, stack(0x00007f4bf1d5f000,0x00007f4bf1e60000)]
>   ...
>   stack(0x00007f4bf35db000,0x00007f4bf36dc000)]
>   0x00007f4cfc018800 JavaThread "main" [_thread_in_vm, id=22332, stack(0x00007f4d05c78000,0x00007f4d05d79000)]
>
> Other Threads:
>   0x00007f4cfc3ea000 VMThread [stack: 0x00007f4bf36dc000,0x00007f4bf37dd000] [id=22388]
>   0x00007f4cfc486800 WatcherThread [stack: 0x00007f4bf1b5d000,0x00007f4bf1c5e000] [id=22410]
>
> Threads with active compile tasks:
>   0x00007f4cfc46a800 id=22403 Compiling: 244 1 3 java.lang.String::isLatin1 (19 bytes)
>
>
>>
>> I am worried about using locks for printing because print code also has
>> locks. Do we really have to have locks here? The output for these
>> directives is local bufferedStream. As I understand it is separate for
>> each directive. So why do you need a lock? Or VM operation as before?
>
> I think you are mixing my two RFRs together - this change doesn't print
> directives.
>
> I am removing vm_ops from three diagnostic commands that use code that
> expects a safepoint or lock. Some of the commands are really quick, and
> requesting a safepoint is overkill when it can be done concurrently.
> The only new lock taken is the thread lock when iterating the compiler
> threads from the Compiler.queue jcmd. The thread lock is ranked so it
> cannot be reordered with the compile.queue lock.
>
> I cleaned it up a bit further and removed the unused
> print_compiler_threads_on(...) from compileBroker. It is printed in
> JavaThread::print_on(..) where all the other thread info is located.
>
> Hs_err-file looks like the example above.
>
> jcmd Thread.print looks like this for compiling threads:
>
> "C1 CompilerThread13" #19 daemon prio=9 os_prio=0 tid=0x00007f8748471800 nid=0x7732 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    JavaThread state: _thread_in_native
>    Thread: 0x00007f8748471800 [0x7732] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0
>    JavaThread state: _thread_in_native
>    Compiling: 716 b 2 java.util.regex.Pattern::compile (406 bytes)
>
> And Compiler.queue looks like this:
>
> "Current compiles:
> C1 CompilerThread14 435 b 2 java.net.URLStreamHandler::parseURL (1166 bytes)
>
> C1 compile queue:
> Empty
>
> C2 compile queue:
> Empty"
>
>
> New webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.04/
>
> Regards,
> Nils
>
>>
>> Thanks,
>> Vladimir
>>
>> On 1/22/16 6:40 AM, Nils Eliasson wrote:
>>> Hi,
>>>
>>> Please review.
>>>
>>> Summary:
>>> Firstly this change removes the unnecessary vm-ops from three compiler
>>> diagnostic commands and adds locking instead.
>>> Secondly the Compiler.queue diagnostic command is improved with printing
>>> of any active compilations. I found this useful when diagnosing a
>>> rogue VM.
>>> Thirdly, as a bonus, I also add printing of active compilations in the
>>> thread section of the hs_err file. Very useful when investigating VMs
>>> terminated by a timeout.
>>>
>>> Testing:
>>> This does not pass all tests yet. A few tests are dependent on the output
>>> from the diagnostic command, and I want to be sure the reviewers are
>>> happy with the output format first.
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8063112 >>> Webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.02/ >>> >>> Regards, >>> Nils >>> > From vladimir.x.ivanov at oracle.com Wed Jan 27 22:05:47 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 28 Jan 2016 01:05:47 +0300 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <56A751AE.9090203@azulsystems.com> References: <56A751AE.9090203@azulsystems.com> Message-ID: <56A93F3B.7070301@oracle.com> Ivan, There's no need in yet another flag (-XX:?UseOnSpinWaitIntrinsic). -XX:DisableIntrinsic=_onSpinWait should do the same. Best regards, Vladimir Ivanov On 1/26/16 1:59 PM, Ivan Krylov wrote: > Hello, > > Some of you may have a seen a few e-mails on the core-libs alias about a > proposed ?spin wait hint?. The JEP is forming up nicely at > https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a > consensus on the API side. It is now in a draft state and I hope this > JEP will get targeted for java 9 shortly. The upcoming API changes can > be seen at the webrev: > http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ > > At this time I would like to ask for a review of the hs-comp changes. > The plan is push changes into class libraries and hotspot synchronously > but that may happen after the JEP gets targeted. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 > Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ > > The idea of the fix is pretty simple: hotspot replaces a call to > java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a > 'pause' instruction on x86. This intrinsic is guarded by the > -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a > verification code that makes sure the flag is off, VM will just execute > at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. 
> According the [1] the 'pause' instruction is functional since SSE2, but > even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence > harmless, there seems to be no need to add guarding code for older > generations of Intel CPUs. > > The proposed patch includes a simple regression test that simply makes > sure that method java.lang.Runtime.onSpinWait() gets intrinsified. > There are several other producer-consumer-like performance tests ready > that the authors of this JEP would be happy to make available under > JEP-230 but I am uncertain about the process. > > Thanks, > > Ivan > > [1] - > https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops > From ivan at azulsystems.com Wed Jan 27 22:37:55 2016 From: ivan at azulsystems.com (Ivan Krylov) Date: Thu, 28 Jan 2016 01:37:55 +0300 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <56A93F3B.7070301@oracle.com> References: <56A751AE.9090203@azulsystems.com> <56A93F3B.7070301@oracle.com> Message-ID: <56A946C3.3060104@azulsystems.com> On 28/01/2016 01:05, Vladimir Ivanov wrote: > Ivan, > > There's no need in yet another flag (-XX:?UseOnSpinWaitIntrinsic). > -XX:DisableIntrinsic=_onSpinWait should do the same. Good suggestion. I will accommodate that. Thanks, Ivan > > Best regards, > Vladimir Ivanov > > On 1/26/16 1:59 PM, Ivan Krylov wrote: >> Hello, >> >> Some of you may have a seen a few e-mails on the core-libs alias about a >> proposed ?spin wait hint?. The JEP is forming up nicely at >> https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a >> consensus on the API side. It is now in a draft state and I hope this >> JEP will get targeted for java 9 shortly. The upcoming API changes can >> be seen at the webrev: >> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >> >> At this time I would like to ask for a review of the hs-comp changes. 
>> The plan is push changes into class libraries and hotspot synchronously >> but that may happen after the JEP gets targeted. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >> >> The idea of the fix is pretty simple: hotspot replaces a call to >> java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a >> 'pause' instruction on x86. This intrinsic is guarded by the >> -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a >> verification code that makes sure the flag is off, VM will just execute >> at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. >> According the [1] the 'pause' instruction is functional since SSE2, but >> even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence >> harmless, there seems to be no need to add guarding code for older >> generations of Intel CPUs. >> >> The proposed patch includes a simple regression test that simply makes >> sure that method java.lang.Runtime.onSpinWait() gets intrinsified. >> There are several other producer-consumer-like performance tests ready >> that the authors of this JEP would be happy to make available under >> JEP-230 but I am uncertain about the process. >> >> Thanks, >> >> Ivan >> >> [1] - >> https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops >> >> From zoltan.majo at oracle.com Thu Jan 28 07:22:36 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Thu, 28 Jan 2016 08:22:36 +0100 Subject: [9] RFR (S): 8146478: Node limit exceeded with -XX:AllocateInstancePrefetchLines=1073741823 In-Reply-To: <56A916CC.6070103@oracle.com> References: <56A7A223.9050403@oracle.com> <56A7C27B.8050004@oracle.com> <56A8E8AC.7010007@oracle.com> <56A916CC.6070103@oracle.com> Message-ID: <56A9C1BC.7000202@oracle.com> Hi Vladimir, thank you for the review! 
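
[Illustration only, not the HotSpot code: a small Java sketch of the arithmetic behind the range check described in the quoted message below, where for AllocatePrefetchStyle == 2 the number of prefetched lines is AllocatePrefetchDistance / AllocatePrefetchStepSize and must not exceed 64. The class, method, and constant names are hypothetical; the real check is in the C++ constraint function named in the thread.]

```java
public class PrefetchRangeSketch {
    // Upper bound on prefetched lines agreed in the review (assumed constant name).
    static final int MAX_PREFETCH_LINES = 64;

    /** Number of lines prefetched for a given distance and step size, both in bytes. */
    static int prefetchLines(int distance, int stepSize) {
        if (stepSize <= 0) {
            throw new IllegalArgumentException("stepSize must be positive");
        }
        return distance / stepSize;
    }

    /** True if the flag combination stays within the line limit. */
    static boolean withinLimit(int distance, int stepSize) {
        return prefetchLines(distance, stepSize) <= MAX_PREFETCH_LINES;
    }

    public static void main(String[] args) {
        // 64 lines of 32 bytes each: exactly at the limit.
        System.out.println(withinLimit(2048, 32));
        // Twice the distance with the same step size: over the limit.
        System.out.println(withinLimit(4096, 32));
    }
}
```

[With a step size of 32, the limit corresponds to a distance of 64*32 = 2048, the figure Vladimir suggested.]
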
Best regards, Zoltan On 01/27/2016 08:13 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 1/27/16 7:56 AM, Zoltán Majó wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 01/26/2016 08:01 PM, Vladimir Kozlov wrote: >>> Where does the 4/2 number come from? Some SPEC runs used a higher number: >> >> Those are the highest values set by the VM. I was not aware of SPEC >> runs using values higher than those. >> >>> >>> -XX:AllocatePrefetchLines=16 >>> >>> http://spec.org/jbb2005/results/res2009q1/jbb2005-20081203-00563.html >>> >>> I would suggest something like 64 - I have never seen such a number used. >> >> OK, I set the maximum value for both AllocatePrefetchLines and >> AllocateInstancePrefetchLines to 64. >> >>> >>> Also, please, limit AllocatePrefetchStepSize range. It corresponds to >>> cache line size. 512 I would say for future proof - >> >> OK, done. >> >>> with assert that check that its setting in vm_Version_.cpp is in >>> these >> >> OK, I modified the range check in the >> AllocatePrefetchStepSizeConstraintFunc() constraint function >> accordingly. I hope that is fine. >> >>> >>> For the case AllocatePrefetchStyle == 2 number of lines is >>> calculated as: >>> >>> uint lines = AllocatePrefetchDistance / AllocatePrefetchStepSize; >>> >>> Since AllocatePrefetchDistance limit is big you can get a lot of nodes >>> again. Maybe also set the limit - >>> AllocatePrefetchLines*AllocatePrefetchStepSize 64*32 = 2048. >> >> Thank you for catching that. I extended the constraint function >> AllocatePrefetchStepSizeConstraintFunc() to check that >> >> AllocatePrefetchDistance / AllocatePrefetchStepSize <= 64 >> >> (64 is the maximum value that we expect for 'lines' in >> PhaseMacroExpand::prefetch_allocation() when AllocatePrefetchStyle == 2.) I >> hope this is fine. >> >> I also modified the expected node count increase after expansion in >> PhaseMacroExpand::expand_macro_nodes() to account for the increased >> thresholds.
>> >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8146478/webrev.01/ >> >> I re-tested with JPRT (incl. TestOptionsWithRanges.java), all tests >> pass. >> >> Thank you and best regards, >> >> >> Zoltan >> >> >>> >>> Thanks, >>> Vladimir >>> >>> On 1/26/16 8:43 AM, Zoltán Majó wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for 8146478. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8146478 >>>> >>>> Problem: Setting a high value for AllocateInstancePrefetchLines can >>>> trigger an assert in the C2 compiler. The reason is that the number of >>>> live nodes exceeds the maximum node limit. The same problem can happen >>>> if AllocatePrefetchLines is given a high value. >>>> >>>> Solution: >>>> Limit the range for >>>> AllocateInstancePrefetchLines/AllocatePrefetchLines >>>> to 8. I picked the value 8 because >>>> - (1) the maximum possible value for these flags is 4/2, so having a >>>> slightly higher value than 4/2 still allows for some experiments; >>>> - (2) the node_check() in PhaseMacroExpand::expand_macro_nodes() >>>> assumes >>>> that each macro node expansion will generate <75 new nodes. The number >>>> of nodes generated by expand_allocate_array()/expand_allocate() for 8 >>>> prefetched lines closely fits into that margin (experimentally >>>> verified). >>>> >>>> In addition, I removed some code that is now unnecessary >>>> because >>>> of the range checks we have in place. >>>> >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~zmajo/8146478/webrev.00/ >>>> >>>> Testing: >>>> - JPRT: All JTREG hotspot tests, incl.
TestOptionsWithRanges.java >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >> From nils.eliasson at oracle.com Thu Jan 28 08:21:12 2016 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 28 Jan 2016 09:21:12 +0100 Subject: RFR(S): 8063112: Compiler diagnostic commands should have locking instead of safepoint In-Reply-To: <56A92F2F.9070507@oracle.com> References: <56A23F61.9000201@oracle.com> <56A281C3.6010408@oracle.com> <56A74D06.7030408@oracle.com> <56A92F2F.9070507@oracle.com> Message-ID: <56A9CF78.8000106@oracle.com> Thanks Vladimir! Regards, Nils On 2016-01-27 21:57, Vladimir Kozlov wrote: > Yes, this looks much better. Reviewed. > > thanks, > Vladimir > > On 1/26/16 2:40 AM, Nils Eliasson wrote: >> Hi Vladimir, >> >> On 2016-01-22 20:23, Vladimir Kozlov wrote: >>> Why you need new print method? Why you can't use existing print()? >>> Also I prefer to get current compilation tasks print in separate lines >>> - not in the list of threads. Then you don't need to use new print? >> >> Works for me. I moved it directly after the existing thread printing: >> >> --------------- P R O C E S S --------------- >> >> Java Threads: ( => current thread ) >> 0x00007f4cfc485000 JavaThread "Service Thread" daemon >> [_thread_blocked, id=22409, >> stack(0x00007f4bf1c5e000,0x00007f4bf1d5f000)] >> 0x00007f4cfc476000 JavaThread "Sweeper thread" daemon >> [_thread_blocked, id=22408, >> stack(0x00007f4bf1d5f000,0x00007f4bf1e60000)] >> ... 
>> stack(0x00007f4bf35db000,0x00007f4bf36dc000)] >> 0x00007f4cfc018800 JavaThread "main" [_thread_in_vm, id=22332, >> stack(0x00007f4d05c78000,0x00007f4d05d79000)] >> >> Other Threads: >> 0x00007f4cfc3ea000 VMThread [stack: >> 0x00007f4bf36dc000,0x00007f4bf37dd000] [id=22388] >> 0x00007f4cfc486800 WatcherThread [stack: >> 0x00007f4bf1b5d000,0x00007f4bf1c5e000] [id=22410] >> >> Threads with active compile tasks: >> 0x00007f4cfc46a800 id=22403 Compiling: 244 1 3 >> java.lang.String::isLatin1 (19 bytes) >> >> >>> >>> I am worry about using locks for printing because print code also has >>> locks. Do we really have to have locks here? The output for these >>> directives is local bufferedStream. As I understand it is separate for >>> each directive. So why you need lock? Or VM operation as before? >> >> I think you are mixing my two RFRs together - this change doesn't print >> directives. >> >> I am removing vm_ops from three diagnostic commands that uses code that >> expects safepoint or lock. Some of the commands are really quick, and >> requesting a safepoint is overkill when it can be done concurrently. >> Only new lock taken is the thread lock when iterating the compiler >> threads from the Compiler.queue jcmd. The thread lock is ranked so it >> can not be reordered with the compile.queue lock. >> >> I cleaned it up a bit further and removed the unused >> print_compiler_threads_on(...) from compileBroker. It is printed in >> JavaThread::print_on(..) where all the other thread info is located. >> >> Hs_err-file looks like the example above. 
>> >> jcmd Thread.print looks like this for compiling threads: >> >> "C1 CompilerThread13" #19 daemon prio=9 os_prio=0 tid=0x00007f8748471800 >> nid=0x7732 runnable [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE >> JavaThread state: _thread_in_native >> Thread: 0x00007f8748471800 [0x7732] State: _at_safepoint >> _has_called_back 0 _at_poll_safepoint 0 >> JavaThread state: _thread_in_native >> Compiling: 716 b 2 java.util.regex.Pattern::compile (406 >> bytes) >> >> And Compiler.queue looks like this: >> >> "Current compiles: >> C1 CompilerThread14 435 b 2 java.net.URLStreamHandler::parseURL >> (1166 bytes) >> >> C1 compile queue: >> Empty >> >> C2 compile queue: >> Empty" >> >> >> New webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.04/ >> >> Regards, >> Nils >> >>> >>> Thanks, >>> Vladimir >>> >>> On 1/22/16 6:40 AM, Nils Eliasson wrote: >>>> Hi, >>>> >>>> Please review. >>>> >>>> Summary: >>>> Firstly this change removes the unnecessary vm-ops from three compiler >>>> diagnostic commands and adds locking instead. >>>> Secondly the Compiler.queue diagnostic command is improved with >>>> printing >>>> of any active compilations. I found this useful when diagnosing a >>>> rogue VM. >>>> Thirdly, as a bonus, I also add printing of active compilations in the >>>> thread section of the hs_err file. Very useful when investigating VMs >>>> terminated by a timeout. >>>> >>>> Testing: >>>> This does not pass all tests yet. A few tests are dependent on the >>>> output >>>> from the diagnostic command, and I want to be sure the reviewers are >>>> happy with the output format first.
>>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8063112 >>>> Webrev: http://cr.openjdk.java.net/~neliasso/8063112/webrev.02/ >>>> >>>> Regards, >>>> Nils >>>> >> From christian.thalinger at oracle.com Thu Jan 28 09:41:58 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 28 Jan 2016 10:41:58 +0100 Subject: RFR(M) 8147461: Use byte offsets for vtable start and vtable length offsets In-Reply-To: <56A1FA78.3090608@oracle.com> References: <569926B9.4070806@oracle.com> <569F7E22.3090905@oracle.com> <56A04DCF.9090204@oracle.com> <56A1FA78.3090608@oracle.com> Message-ID: > On Jan 22, 2016, at 10:46 AM, Mikael Gerdin wrote: > > Hi Chris, > > On 2016-01-21 04:17, Chris Plummer wrote: >> Hi Mikael, >> >> The changes look good except I think you should get someone from the >> compiler team to make sure the change in >> HotSpotResolvedJavaMethodImpl.java and HotSpotVMConfig.java are ok. I'm >> not sure why you chose to remove instanceKlassVtableStartOffset() rather >> than just fix it. > > I'm cc:ing hotspot-compiler-dev and graal-dev to see if I can get someone to ok the JVMCI parts. > > The reason for removing the method is that the only reason for it being a method was to apply the wordSize scaling on the value and since I changed the offset to be a byte offset it does not need scaling and can be treated similar to the other constants in HotSpotVMConfig which are accessed without any accessor method. For the record, the JVMCI changes look good. > >> >> I think some of your changes may conflict with my changes for >> JDK-8143608. Coleen is pushing JDK-8143608 for me once hs-rt opens up. >> I'd appreciate it if you could wait until after then before doing your >> push. > > Will do, would you mind pinging me when you've integrated 8143608? > > /Mikael > >> >> thanks, >> >> Chris >> >> On 1/20/16 4:31 AM, Mikael Gerdin wrote: >>> Hi again, >>> >>> I've rebased the on hs-rt and had to include some additional changes >>> for JVMCI. 
>>> I've also updated the copyright years. >>> Unfortunately I can't generate an incremental webrev since i rebased >>> the patch and there's no good way that I know of to make that work >>> with webrev. >>> >>> New webrev at: http://cr.openjdk.java.net/~mgerdin/8147461/webrev.1/ >>> >>> Testing: JPRT again (which includes the JVMCI jtreg tests) >>> >>> /Mikael >>> >>> On 2016-01-15 18:04, Mikael Gerdin wrote: >>>> Hi all, >>>> >>>> As per the previous discussion in mid-December[0] about moving the >>>> _vtable_length field to class Klass, here's the first RFR and webrev, >>>> according to my suggested plan[1]: >>>> >>>>> My current plan is to first modify the vtable_length_offset accessor to >>>>> return a byte offset (which is what it's translated to by all callers). >>>>> >>>>> Then I'll tackle moving the _vtable_len field to Klass. >>>>> >>>>> Finally I'll try to consolidate the vtable related methods to Klass, >>>>> where they belong. >>>> >>>> This change actually consists of three changes: >>>> * modifying InstanceKlass::vtable_length_offset to become a byte offset >>>> and use the ByteSize type to communicate the scaling. >>>> * modifying InstanceKlass::vtable_start_offset to become a byte offset >>>> and use the ByteSize type, for symmetry reasons mainly. >>>> * adding a vtableEntry::size_in_bytes() since in many places the vtable >>>> entry size is used in combination with the vtable start to compute a >>>> byte offset for vtable lookups. >>>> >>>> I don't foresee any issues with the fact that the byte offset is >>>> represented as an int, for two reasons: >>>> 1) If the offset of any of these grows to over 2 gigabytes then we have >>>> a huge footprint problem with InstanceKlass >>>> 2) The offsets are converted to byte offsets and stored in ints already >>>> in the cpu specific code I've modified. 
>>>> >>>> Bug link: https://bugs.openjdk.java.net/browse/JDK-8147461 >>>> Webrev: http://cr.openjdk.java.net/~mgerdin/8147461/webrev.0/ >>>> >>>> Testing: JPRT on Oracle supported platforms, testing on AARCH64 and >>>> PPC64 would be much appreciated, appropriate mailing lists have been >>>> CC:ed to notify them of the request. >>>> >>>> >>>> [0] >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-December/021152.html >>>> >>>> >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-December/021224.html >>>> >>>> >>>> >>>> Thanks! >>>> /Mikael >>> >> > From tobias.hartmann at oracle.com Thu Jan 28 11:16:24 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Jan 2016 12:16:24 +0100 Subject: [9] RFR(XS): 8148460: TestUnsafeUnalignedMismatchedAccesses.java fails: error: package jdk.internal.misc does not exist Message-ID: <56A9F888.4050609@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8148460 http://cr.openjdk.java.net/~thartmann/8148460/webrev.00/ The test fails because it's missing the jtreg tag to load the 'jdk.internal.misc' module. Thanks, Tobias From roland.westrelin at oracle.com Thu Jan 28 12:49:38 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 28 Jan 2016 13:49:38 +0100 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 Message-ID: http://cr.openjdk.java.net/~roland/8087341/webrev.00/ C2 currently doesn't optimize the field load in the following code: static Object field; static Object m(Object o) { field = o; return field; } It should return o but instead loads the value back from memory. The reason it misses such simple optimization is that the G1 post barrier has a memory barrier with a wide effect on the memory state.
C2 doesn't optimize this either: object.field = other_object; object.field = other_object; The same applies to -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark. That memory barrier was added to have a memory barrier instruction and doesn't have to have a wide memory effect. Roland. From pavel.punegov at oracle.com Thu Jan 28 14:07:56 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Thu, 28 Jan 2016 17:07:56 +0300 Subject: RFR (XXS): [TESTBUG] InlineCommandTest.java: unknown compiler level 0 for commpile ID: 651 In-Reply-To: <56A91627.4040500@oracle.com> References: <1EC15E02-BB6F-480C-8FB4-40F8DB9A7C39@oracle.com> <56A91627.4040500@oracle.com> Message-ID: <7B1F2311-AFFD-4C4A-BEDF-418662A16564@oracle.com> Thanks Vladimir - Pavel. > On 27 Jan 2016, at 22:10, Vladimir Kozlov wrote: > > Looks fine. > > Vladimir > > On 1/27/16 6:51 AM, Pavel Punegov wrote: >> Please review the following small patch for inlining tests. >> >> Issue: tests are unable to find the JFR compilation event for the appropriate >> inline event. This happens because the recording stops before the >> compilation has finished. Invocation of the test method is not synchronised >> with compilation. >> >> Fix: add -Xbatch to make compilation block the test thread. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8144239 >> webrev: http://cr.openjdk.java.net/~ppunegov/8144239/webrev.00/ >> >> - Thanks, >> Pavel Punegov >> From roland.westrelin at oracle.com Thu Jan 28 14:11:37 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 28 Jan 2016 15:11:37 +0100 Subject: [9] RFR(XS): 8148460: TestUnsafeUnalignedMismatchedAccesses.java fails: error: package jdk.internal.misc does not exist In-Reply-To: <56A9F888.4050609@oracle.com> References: <56A9F888.4050609@oracle.com> Message-ID: <148AD43A-07E3-4E55-B03A-A6915306DF57@oracle.com> > http://cr.openjdk.java.net/~thartmann/8148460/webrev.00/ That looks good to me. Roland.
From roland.westrelin at oracle.com Thu Jan 28 14:22:19 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 28 Jan 2016 15:22:19 +0100 Subject: RFR(S): 6378256: Performance problem with System.identityHashCode in client compiler In-Reply-To: References: Message-ID: > With reference to below email thread, please send review comments for the revised patch for JDK-6378256. > http://cr.openjdk.java.net/~thartmann/6378256/webrev.02/ That looks good. Can you justify the comments again? Also the x86_64 and x86_32 are (mostly?) identical. Do we want to create a sharedRuntime_x86.cpp, move the InlineObjectHash code in its own function there to avoid duplication? Roland. From tobias.hartmann at oracle.com Thu Jan 28 14:25:52 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Jan 2016 15:25:52 +0100 Subject: [9] RFR(XS): 8148460: TestUnsafeUnalignedMismatchedAccesses.java fails: error: package jdk.internal.misc does not exist In-Reply-To: <148AD43A-07E3-4E55-B03A-A6915306DF57@oracle.com> References: <56A9F888.4050609@oracle.com> <148AD43A-07E3-4E55-B03A-A6915306DF57@oracle.com> Message-ID: <56AA24F0.5010709@oracle.com> Thanks, Roland. Best, Tobias On 28.01.2016 15:11, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~thartmann/8148460/webrev.00/ > > That looks good to me. > > Roland. > From tobias.hartmann at oracle.com Thu Jan 28 14:27:25 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 28 Jan 2016 15:27:25 +0100 Subject: [9] RFR(XS): 8148460: TestUnsafeUnalignedMismatchedAccesses.java fails: error: package jdk.internal.misc does not exist In-Reply-To: <56A9F888.4050609@oracle.com> References: <56A9F888.4050609@oracle.com> Message-ID: <56AA254D.7020303@oracle.com> Hi, I was asked to push this directly to the Jake repo. Christian T. (CC'ed) will sponsor the change. 
Thanks, Tobias On 28.01.2016 12:16, Tobias Hartmann wrote: > Hi, > > please review the following patch: > > https://bugs.openjdk.java.net/browse/JDK-8148460 > http://cr.openjdk.java.net/~thartmann/8148460/webrev.00/ > > The test fails because it's missing the jtreg tag to load the 'jdk.internal.misc' module. > > Thanks, > Tobias > From adinn at redhat.com Thu Jan 28 14:30:35 2016 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 28 Jan 2016 14:30:35 +0000 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 In-Reply-To: References: Message-ID: <56AA260B.8080101@redhat.com> Hi Roland, On 28/01/16 12:49, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8087341/webrev.00/ > > C2 currently doesn't optimize the field load in the following code: > > static Object field; > > static Object m(Object o) { field = o; return field; } > > It should return o but instead loads the value back from memory. The > reason it misses such simple optimization is that the G1 post barrier > has a memory barrier with a wide effect on the memory state. C2 > doesn't optimize this either: > > object.field = other_object; > object.field = other_object; > > Same applies to -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark > > That memory barrier was added to have a memory barrier instruction > and doesn't have to have a wide memory effect. I think this looks ok -- not sure until I try it out. However, I /am/ fairly sure it is going to cause a problem for the AArch64 code which optimizes volatile loads and stores. That's because it changes the characteristic shape of the subgraph searched for by the predicates which decide whether to i) generate loads + membars or ii) plant stlr or ldar instructions. I'll look into this asap. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Paul Argiry (US) From ivan at azulsystems.com Thu Jan 28 14:51:16 2016 From: ivan at azulsystems.com (Ivan Krylov) Date: Thu, 28 Jan 2016 17:51:16 +0300 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com> <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> Message-ID: <56AA2AE4.2090803@azulsystems.com> Hi Igor, Following Vladimir's suggestion I eliminated the UseOnSpinWaitIntrinsic flag altogether. I have adopted the Matcher::match_rule_supported() logic - seems to work on intel, but I don't have any non-intel box to test. Anyway, the new webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.01/ Igor, Vladimir, thanks, Ivan On 27/01/2016 22:03, Igor Veresov wrote: > Actually, I'd rather use Matcher::match_rule_supported() to test if it's supported on the platform, rather than fixing all vm_version_*.* to check for the flag validity; that's tedious (you forgot x86-32 and there's going to be more platforms to fix for your sponsor). Something like UseOnSpinWaitIntrinsic && Matcher::match_rule_supported(Op_OnSpinWait) to decide whether or not to inline the intrinsic. Also, why are you not turning it on by default? > > igor > >> On Jan 27, 2016, at 4:48 AM, Ivan Krylov wrote: >> >> Looks like there was some good discussion while I was peacefully sleeping. >> I don't have much to add. This patch was somewhat inspired by JEP-171 changes. >> Perhaps, there are other ways to achieve the same semantics. >> >> So, if we can consider this reviewed - I will wait for the actual JEP to become targeted to 9 and then seek a sponsor to do the push. >> >> Thanks, >> >> Ivan >> >> On 27/01/2016 09:12, Igor Veresov wrote: >>> I realize it's not a big deal.
I was just wondering if there was any specific reason control alone is not enough. >>> Anyways, looks ok for the first cut. >>> >>> igor >>> >>>> On Jan 26, 2016, at 9:24 PM, Gil Tene wrote: >>>> >>>> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop. There are probably other ways to achieve this, but this one doesn't really have a performance downside? >>>> >>>> - Gil. >>>> >>>>> On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote: >>>>> >>>>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that's intentional, I wonder why that is. >>>>> >>>>> igor >>>>> >>>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>> Some of you may have seen a few e-mails on the core-libs alias about a proposed "spin wait hint". The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>>>> >>>>>> At this time I would like to ask for a review of the hs-comp changes. The plan is to push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>>>> >>>>>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86.
This intrinsic is guarded by the -XX:±UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is verification code that makes sure the flag is off; the VM will just execute the empty method java.lang.Runtime.onSpinWait(), effectively a no-op. According to [1], the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, so there seems to be no need to add guarding code for older generations of Intel CPUs. >>>>>> >>>>>> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Ivan >>>>>> >>>>>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops From aleksey.shipilev at oracle.com Thu Jan 28 15:04:05 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Thu, 28 Jan 2016 18:04:05 +0300 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 In-Reply-To: References: Message-ID: <56AA2DE5.5050008@oracle.com> On 01/28/2016 03:49 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8087341/webrev.00/ This looks good, for most tests here: http://cr.openjdk.java.net/~shade/8087341/G1BackToBackStores.java The generated code indeed shows commoned loads/stores with this patch, and some other things that can be improved in the codegen -- I'll file the separate issue(s) for that. I think this one is better to be renamed to something more specific, e.g. "Overly wide StoreLoad barrier in G1 breaks load/store coalescing"? On i7-4790K @ 4.0 GHz, Linux x86_64:

== Baseline:
Benchmark                    Mode  Cnt  Score   Error  Units
G1BackToBackStores.test_1    avgt   15  2.193 ± 0.037  ns/op
G1BackToBackStores.test_11   avgt   15  2.984 ± 0.076  ns/op
G1BackToBackStores.test_111  avgt   15  3.706 ± 0.017  ns/op
G1BackToBackStores.test_112  avgt   15  3.978 ± 0.078  ns/op
G1BackToBackStores.test_121  avgt   15  4.107 ± 0.028  ns/op
G1BackToBackStores.test_211  avgt   15  3.824 ± 0.186  ns/op

== Patched:
Benchmark                    Mode  Cnt  Score   Error  Units
G1BackToBackStores.test_1    avgt   15  2.184 ± 0.020  ns/op
G1BackToBackStores.test_11   avgt   15  2.790 ± 0.065  ns/op // !
G1BackToBackStores.test_111  avgt   15  3.264 ± 0.008  ns/op // !!!
G1BackToBackStores.test_112  avgt   15  3.640 ± 0.011  ns/op // !
G1BackToBackStores.test_121  avgt   15  4.194 ± 0.033  ns/op
G1BackToBackStores.test_211  avgt   15  3.665 ± 0.415  ns/op // !

Cheers, -Aleksey From jamsheed.c.m at oracle.com Thu Jan 28 16:16:23 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Thu, 28 Jan 2016 21:46:23 +0530 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same Message-ID: <56AA3ED7.4030407@oracle.com> Hi, Please review the fix made for this issue. bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ Unit tests: As it's hard, none Other tests: jprt. Description of the issue: A valid pc match in the exception cache returning an invalid handler makes the assert fail. This happens because ExceptionCache reads are lock-free. As a fix for this I have put a storestore memory barrier before the count is updated.
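The ordering problem described above can be sketched in portable C++. This is only an illustration under stated assumptions, not the HotSpot code: HandlerCache, add() and find() are hypothetical names, writers are assumed to be serialized by an external lock, and a C++11 release store / acquire load pair stands in for the storestore barrier mentioned in the description. The point is that the entry's fields are written first and the count is published last, so a lock-free reader that observes the new count also observes a fully written entry.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a small pc -> handler cache; it does not
// mirror HotSpot's ExceptionCache layout or API.
class HandlerCache {
  static constexpr std::size_t kCacheSize = 4;
  std::uintptr_t pcs_[kCacheSize] = {};
  std::uintptr_t handlers_[kCacheSize] = {};
  std::atomic<std::size_t> count_{0};

 public:
  // Assumption: callers hold an external lock, so there is one writer at a time.
  bool add(std::uintptr_t pc, std::uintptr_t handler) {
    // Read the count once into a local and use that index throughout.
    std::size_t n = count_.load(std::memory_order_relaxed);
    if (n >= kCacheSize) return false;
    pcs_[n] = pc;
    handlers_[n] = handler;
    // Publish last: the release store keeps the two writes above from
    // being reordered after the count update.
    count_.store(n + 1, std::memory_order_release);
    return true;
  }

  // Lock-free reader: returns 0 when no entry matches.
  std::uintptr_t find(std::uintptr_t pc) const {
    // The acquire load pairs with the release store in add().
    std::size_t n = count_.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < n; ++i) {
      if (pcs_[i] == pc) return handlers_[i];
    }
    return 0;
  }
};
```

Without the release/acquire pair, a reader on a weakly ordered machine could see the incremented count before the handler slot is written, which matches the symptom of a valid pc match returning an invalid handler.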
Best Regards, Jamsheed From christian.thalinger at oracle.com Thu Jan 28 18:45:37 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 28 Jan 2016 19:45:37 +0100 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <56AA3ED7.4030407@oracle.com> References: <56AA3ED7.4030407@oracle.com> Message-ID: <844981E3-4C91-4D52-BBDB-450D055C2599@oracle.com> if (count() < cache_size) { set_pc_at(count(),addr); set_handler_at(count(), handler); Shouldn't we read count() only once into a local variable to rule out any odd race bugs down the road? > On Jan 28, 2016, at 5:16 PM, Jamsheed C m wrote: > > Hi, > > Please review the fix made for this issue. > > bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 > web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ > > Unit tests: As it's hard, none > > Other tests: jprt. > > Description of the issue: > A valid pc match in the exception cache returning an invalid handler makes the assert fail. > This happens because ExceptionCache reads are lock-free. > > As a fix for this I have put a storestore memory barrier before the count is updated.
> > Best Regards, > Jamsheed From jamsheed.c.m at oracle.com Thu Jan 28 20:29:51 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 29 Jan 2016 01:59:51 +0530 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <844981E3-4C91-4D52-BBDB-450D055C2599@oracle.com> References: <56AA3ED7.4030407@oracle.com> <844981E3-4C91-4D52-BBDB-450D055C2599@oracle.com> Message-ID: <56AA7A3F.6040800@oracle.com> On 1/29/2016 12:15 AM, Christian Thalinger wrote: > if (count() < cache_size) { > set_pc_at(count(),addr); > set_handler_at(count(), handler); > > Shouldn't we read count() only once into a local variable to rule out any odd race bugs down the road? Writes to the cache are protected by a mutex lock, so this code is safe. The issue is seen on machines with weak memory ordering: the lockless read of exception cache values fails because the writes to the cache get reordered. Best Regards, Jamsheed > >> On Jan 28, 2016, at 5:16 PM, Jamsheed C m wrote: >> >> Hi, >> >> Please review the fix made for this issue. >> >> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 >> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ >> >> Unit tests: As it's hard, none >> >> Other tests: jprt. >> >> Description of the issue: >> A valid pc match in the exception cache returning an invalid handler makes the assert fail. >> This happens because ExceptionCache reads are lock-free. >> >> As a fix for this I have put a storestore memory barrier before the count is updated.
>> >> Best Regards, >> Jamsheed From igor.veresov at oracle.com Thu Jan 28 20:41:42 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 28 Jan 2016 12:41:42 -0800 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <56AA2AE4.2090803@azulsystems.com> References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com> <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> <56AA2AE4.2090803@azulsystems.com> Message-ID: <2538083C-7906-44AA-A074-7DBF5F2D8654@oracle.com>

x86.ad: It seems that the comment here is off:

1714     case Op_OnSpinWait:
1715       if (UseSSE < 2) // requires at least SSE4
1716         ret_value = false;
1717       break;

Also we don't support CPUs with SSE < 2, so you don't have to make these changes to x86.ad. It's enough that has_match_rule(), which is called by Matcher::match_rule_supported(), will return true for Op_OnSpinWait.

x86_64.ad:

+instruct onspinwait()
+%{
+  match(OnSpinWait);
+  ins_cost(200);
...

Is there any reason this can't be moved to generic x86.ad? It can be easily supported on 32bit as well, right (we do still support 32bit mode on linux)? The encoding is the same for both 32 and 64 bit modes, so that should be trivial.

library_call.cpp: I think you forgot to actually call Matcher::match_rule_supported(). I think it should be something like:

bool LibraryCallKit::inline_onspinwait() {
  if (Matcher::match_rule_supported(Op_OnSpinWait)) {
    insert_mem_bar(Op_OnSpinWait);
    return true;
  }
  return false;
}

igor > On Jan 28, 2016, at 6:51 AM, Ivan Krylov wrote: > > Hi Igor, > > Following Vladimir's suggestion I eliminated the UseOnSpinWaitIntrinsic flag altogether. I have adopted the Matcher::match_rule_supported() logic - seems to work on intel, but I don't have any non-intel box to test.
> > Anyway, the new webrev: > http://cr.openjdk.java.net/~ikrylov/8147844.hs.01/ > > Igor, Vladimir, thanks, > > Ivan > > On 27/01/2016 22:03, Igor Veresov wrote: >> Actually, I'd rather use Matcher::match_rule_supported() to test if it's supported on the platform, rather than fixing all vm_version_*.* to check for the flag validity; that's tedious (you forgot x86-32 and there's going to be more platforms to fix for your sponsor). Something like UseOnSpinWaitIntrinsic && Matcher::match_rule_supported(Op_OnSpinWait) to decide whether or not to inline the intrinsic. Also, why are you not turning it on by default? >> >> igor >> >>> On Jan 27, 2016, at 4:48 AM, Ivan Krylov wrote: >>> >>> Looks like there was some good discussion while I was peacefully sleeping. >>> I don't have much to add. This patch was somewhat inspired by JEP-171 changes. >>> Perhaps, there are other ways to achieve the same semantics. >>> >>> So, if we can consider this reviewed - I will wait for the actual JEP to become targeted to 9 and then seek a sponsor to do the push. >>> >>> Thanks, >>> >>> Ivan >>> >>> On 27/01/2016 09:12, Igor Veresov wrote: >>>> I realize it's not a big deal. I was just wondering if there was any specific reason control alone is not enough. >>>> Anyways, looks ok for the first cut. >>>> >>>> igor >>>> >>>>> On Jan 26, 2016, at 9:24 PM, Gil Tene wrote: >>>>> >>>>> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop. There are probably other ways to achieve this, but this one doesn't really have a performance downside? >>>>> >>>>> - Gil. >>>>> >>>>>> On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote: >>>>>> >>>>>> So, why does the new node have a memory effect?
That would seem to prevent any movement of the subsequent loads in your loop, right? If that?s intentional I wonder why is that? >>>>>> >>>>>> igor >>>>>> >>>>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >>>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>>>>> >>>>>>> At this time I would like to ask for a review of the hs-comp changes. The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>>>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>>>>> >>>>>>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs. >>>>>>> >>>>>>> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. 
There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Ivan >>>>>>> >>>>>>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops > From gil at azul.com Wed Jan 27 05:24:42 2016 From: gil at azul.com (Gil Tene) Date: Wed, 27 Jan 2016 05:24:42 +0000 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: References: <56A751AE.9090203@azulsystems.com> Message-ID: Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop. There are probably other ways to achieve this, but this one doesn't really have a performance downside? ? Gil. > On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote: > > So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that?s intentional I wonder why is that? > > igor > >> On Jan 26, 2016, at 2:59 AM, Ivan Krylov > wrote: >> >> Hello, >> >> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832 . There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >> >> At this time I would like to ask for a review of the hs-comp changes. 
The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >> >> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs. >> >> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >> >> Thanks, >> >> Ivan >> >> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From vladimir.kozlov at oracle.com Thu Jan 28 23:45:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 28 Jan 2016 15:45:12 -0800 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 In-Reply-To: References: Message-ID: <56AAA808.6090604@oracle.com> G1 barrier was added by Mikael Gerdin from GC. 
He should also look on this change. https://bugs.openjdk.java.net/browse/JDK-8014555 Also we have specialized insert_mem_bar_volatile() if we don't want wide memory affect. Why not use it? And we need to keep precedent edge link to oop store in case EA eliminates related allocation. Thanks, Vladimir On 1/28/16 4:49 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8087341/webrev.00/ > > C2 currently doesn?t optimize the field load in the following code: > > static Object field; > > static Object m(Object o) { > field = o; > return field; > } > > It should return o but instead loads the value back from memory. The reason it misses such simple optimization is that the G1 post barrier has a memory barrier with a wide effect on the memory state. C2 doesn?t optimize this either: > > object.field = other_object; > object.field = other_object; > > Same applies to -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark > > That memory barrier was added to have a memory barrier instruction and doesn?t have to have a wide memory effect. > > Roland. > From vladimir.kozlov at oracle.com Fri Jan 29 00:34:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 28 Jan 2016 16:34:28 -0800 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <56AA2AE4.2090803@azulsystems.com> References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com> <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> <56AA2AE4.2090803@azulsystems.com> Message-ID: <56AAB394.2080404@oracle.com> First, it was Igor's suggestion about match_rule_supported(). Why you check intrinsic in inline_native_Class_query() ? I don't see the match_rule_supported() check for this intrinsic (c2compiler.cpp). I think you should consider to implement this for C1 and Interpreter since Tiered Compilation is on by default. So that Client VM can benefit too. Change test: 1. 
Don't use /othervm since you fork separate process and don't use flags. 2. Don't use "-server" flag - Client VM could be tested which does not have server. 3. Don't use -Xcomp - it will timeout on slow machines. Create separate test() method to be compiled: public static void main(final String[] args) throws Exception { int end = 20_000; for (int i=0; i < end; i++) { test(); } } static void test() { java.lang.Runtime.onSpinWait(); } Thanks, Vladimir On 1/28/16 6:51 AM, Ivan Krylov wrote: > Hi Igor, > > Following Vladimir's suggestion I eliminated the UseOnSpinWaitIntrinsic flag altogether. I have adopted the > Matcher::match_rule_supported() logic - seems to work on intel, but I don't have any non-intel box to test. > > Anyway, the new webrev: > http://cr.openjdk.java.net/~ikrylov/8147844.hs.01/ > > Igor, Vladimir, thanks, > > Ivan > > On 27/01/2016 22:03, Igor Veresov wrote: >> Actually, I?d rather use Matcher::match_rule_supported() to test if it?s supported on the platform, rather than fixing >> all vm_version_*.* to check for the flag validity, that?s tedious (you forgot x86-32 and there?s going to be more >> platforms to fix for you sponsor). Something like UseOnSpinWaitIntrinsic && >> Matcher::match_rule_supported(Op_OnSpinWait) to decide whether or not to inline the intrinsic. Also, why are you not >> turning it on by default? >> >> igor >> >>> On Jan 27, 2016, at 4:48 AM, Ivan Krylov wrote: >>> >>> Looks like there was some good discussion while I was peacefully sleeping. >>> I don't have much to add. This patch was somewhat inspired by JEP-171 changes. >>> Perhaps,there are other ways to achieve the same semantics. >>> >>> So, if we can consider this reviewed - I will wait for the actual JEP to become targeted to 9 and then seek a sponsor >>> to do the push. >>> >>> Thanks, >>> >>> Ivan >>> >>> On 27/01/2016 09:12, Igor Veresov wrote: >>>> I realize it?s not a big deal. I was just wondering if there was any specific reason control alone is not enough. 
>>>> Anyways, looks ok for the first cut. >>>> >>>> igor >>>> >>>>> On Jan 26, 2016, at 9:24 PM, Gil Tene wrote: >>>>> >>>>> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and >>>>> possibly a volatile store), the new node does not create significant extra move restrictions that are not already >>>>> there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop. >>>>> There are probably other ways to achieve this, but this one doesn't really have a performance downside? >>>>> >>>>> ? Gil. >>>>> >>>>>> On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote: >>>>>> >>>>>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in >>>>>> your loop, right? If that?s intentional I wonder why is that? >>>>>> >>>>>> igor >>>>>> >>>>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is >>>>>>> forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API >>>>>>> side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API >>>>>>> changes can be seen at the webrev: >>>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>>>>> >>>>>>> At this time I would like to ask for a review of the hs-comp changes. The plan is push changes into class >>>>>>> libraries and hotspot synchronously but that may happen after the JEP gets targeted. >>>>>>> >>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>>>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>>>>> >>>>>>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic >>>>>>> that is effectively a 'pause' instruction on x86. 
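For readers following the thread, here is a minimal sketch of the kind of spin loop the hint is meant for: a volatile load on every iteration, with the hint issued between polls. It assumes JDK 9 or later, where the method discussed here as j.l.Runtime.onSpinWait() was released as Thread.onSpinWait(). The class and method names (SpinWaitSketch, awaitPayload, demo) are made up for illustration and do not come from the webrev.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinWaitSketch {
    // Flag the consumer spins on; AtomicBoolean.get() has volatile-load semantics.
    private static final AtomicBoolean ready = new AtomicBoolean(false);
    // Plain field, published by the volatile write to 'ready'.
    private static int payload;

    /** Busy-waits until the producer publishes, issuing the spin hint on each iteration. */
    static int awaitPayload() {
        while (!ready.get()) {
            // Released API name (JDK 9+); the thread above still calls it Runtime.onSpinWait().
            Thread.onSpinWait();
        }
        return payload;
    }

    /** Starts a producer thread, spins until it has published, and returns the value seen. */
    static int demo() {
        ready.set(false);
        Thread producer = new Thread(() -> {
            payload = 42;        // ordinary store
            ready.set(true);     // volatile store, makes the payload visible to the spinner
        });
        producer.start();
        int seen = awaitPayload();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On a platform without a matching intrinsic the call is simply an empty method, so the loop stays correct either way; only the power and latency behaviour of the spin changes.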
This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic >>>>>>> flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at >>>>>>> empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is >>>>>>> functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, >>>>>>> there seems to be no need to add guarding code for older generations of Intel CPUs. >>>>>>> >>>>>>> The proposed patch includes a simple regression test that simply makes sure that method >>>>>>> java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance >>>>>>> tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about >>>>>>> the process. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Ivan >>>>>>> >>>>>>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops > From vladimir.kozlov at oracle.com Fri Jan 29 00:52:26 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 28 Jan 2016 16:52:26 -0800 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <56AA7A3F.6040800@oracle.com> References: <56AA3ED7.4030407@oracle.com> <844981E3-4C91-4D52-BBDB-450D055C2599@oracle.com> <56AA7A3F.6040800@oracle.com> Message-ID: <56AAB7CA.4000604@oracle.com> On 1/28/16 12:29 PM, Jamsheed C m wrote: > > > On 1/29/2016 12:15 AM, Christian Thalinger wrote: >> if (count() < cache_size) { >> set_pc_at(count(),addr); >> set_handler_at(count(), handler); >> >> Shouldn?t we read count() only once into a local variable to rule any odd race bugs down the road? +1. As I understand, Chris is suggesting to do it in addition to storestore barrier. Do we have other similar code? 
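The two suggestions for 8143897 (read count() once, and order the entry stores before the count update) can be illustrated with a small Java analogue. This is only a sketch under stated assumptions: it is not the HotSpot ExceptionCache code, the names ExceptionCacheSketch, add and lookup are hypothetical, and a Java volatile store is used in place of the storestore barrier (it is stronger than what the fix needs).

```java
final class ExceptionCacheSketch {
    private static final int CACHE_SIZE = 4;   // hypothetical capacity

    private final long[] pcs = new long[CACHE_SIZE];
    private final long[] handlers = new long[CACHE_SIZE];
    // Readers are lock-free, so the count is the publication point for new entries.
    private volatile int count;

    /** Writers are serialized by the monitor, mirroring the mutex mentioned in the thread. */
    synchronized boolean add(long pc, long handler) {
        int n = count;              // read count once into a local
        if (n >= CACHE_SIZE) {
            return false;           // cache full
        }
        pcs[n] = pc;                // fill the slot first ...
        handlers[n] = handler;
        count = n + 1;              // ... then publish; the volatile store orders the writes above before it
        return true;
    }

    /** Lock-free lookup: only slots below the count that was read are examined. */
    long lookup(long pc) {
        int n = count;              // volatile load pairs with the store in add()
        for (int i = 0; i < n; i++) {
            if (pcs[i] == pc) {
                return handlers[i];
            }
        }
        return 0L;                  // 0 stands for "no handler cached" in this sketch
    }

    public static void main(String[] args) {
        ExceptionCacheSketch cache = new ExceptionCacheSketch();
        cache.add(0x10L, 0xA0L);
        System.out.println(Long.toHexString(cache.lookup(0x10L)));
    }
}
```

Without the ordering on the count update, a reader on a weakly ordered machine could see the incremented count before the handler store, which matches the symptom described in the bug: a valid pc match that returns a stale handler.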
Thanks, Vladimir > > write to cache is mutex lock protected. so this code is safe. > > Issue is seen in weak memory order machines. lockless read of exception cache values fails as writes in cache get > reordered. > > Best Regards, > Jamsheed >> >>> On Jan 28, 2016, at 5:16 PM, Jamsheed C m wrote: >>> >>> Hi, >>> >>> Please review the fix made for issue >>> >>> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 >>> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ >>> >>> Unit tests: As its hard, none >>> >>> Other tests: jprt. >>> >>> Description of the issue: >>> A valid pc match in exception cache returning an invalid handler makes assert to fail. >>> This happens as ExceptionCache reads are lock free access. >>> >>> As a fix for this i have put a storestore mem barrier before the count is updated. >>> >>> Best Regards, >>> Jamsheed > From igor.veresov at oracle.com Fri Jan 29 01:48:48 2016 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 28 Jan 2016 17:48:48 -0800 Subject: RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic In-Reply-To: <2538083C-7906-44AA-A074-7DBF5F2D8654@oracle.com> References: <56A751AE.9090203@azulsystems.com> <45B4730C-CCC2-4523-ACD1-D18B20E5EC5F@oracle.com> <56A8BC9D.8060004@azulsystems.com> <6148E4D7-AF5E-4094-B363-52E0D83452E9@oracle.com> <56AA2AE4.2090803@azulsystems.com> <2538083C-7906-44AA-A074-7DBF5F2D8654@oracle.com> Message-ID: <50C14C66-4068-4DD7-BD94-96E37F7C9B0A@oracle.com> > On Jan 28, 2016, at 12:41 PM, Igor Veresov wrote: > > x86.ad: > > It seems that the comment here is off: > 1714 case Op_OnSpinWait: > 1715 if (UseSSE < 2) // requires at least SSE4 > 1716 ret_value = false; > 1717 break; > > Also we don?t support CPUs with SSE < 2, so you don?t have to make these changes to x86.ad. It?s enough that has_match_rule(), that is called by Matcher::match_rule_supported(), will return true for Op_OnSpinWait. 
> > > x86_64.ad: > +instruct onspinwait() > +%{ > + match(OnSpinWait); > + ins_cost(200); > ... > > Is there any reason this can?t be moved to generic x86.ad ? It can be easily supported on 32bit as well, right (we do still support 32bit mode on linux)? The encoding is the same for both 32 and 64 bit modes, so that should be trivial. > > library_call.cpp: > > I think you forgot to actually call Matcher::match_rule_supported(). I think it should be something like: > > bool LibraryCallKit::inline_onspinwait() { > if (Matcher::match_rule_supported(Op_OnSpinWait) { > insert_mem_bar(Op_OnSpinWait); > return true; > } > return false; > } > As Vladimir suggested, it?s better to check Matcher::match_rule_supported() in c2compiler.cpp in is_intrinsic_supported(). Sorry about the confusion. I stand by the other comments though. igor > > igor > >> On Jan 28, 2016, at 6:51 AM, Ivan Krylov wrote: >> >> Hi Igor, >> >> Following Vladimir's suggestion I eliminated the UseOnSpinWaitIntrinsic flag altogether. I have adopted the Matcher::match_rule_supported() logic - seems to work on intel, but I don't have any non-intel box to test. >> >> Anyway, the new webrev: >> http://cr.openjdk.java.net/~ikrylov/8147844.hs.01/ >> >> Igor, Vladimir, thanks, >> >> Ivan >> >> On 27/01/2016 22:03, Igor Veresov wrote: >>> Actually, I?d rather use Matcher::match_rule_supported() to test if it?s supported on the platform, rather than fixing all vm_version_*.* to check for the flag validity, that?s tedious (you forgot x86-32 and there?s going to be more platforms to fix for you sponsor). Something like UseOnSpinWaitIntrinsic && Matcher::match_rule_supported(Op_OnSpinWait) to decide whether or not to inline the intrinsic. Also, why are you not turning it on by default? >>> >>> igor >>> >>>> On Jan 27, 2016, at 4:48 AM, Ivan Krylov wrote: >>>> >>>> Looks like there was some good discussion while I was peacefully sleeping. >>>> I don't have much to add. 
This patch was somewhat inspired by JEP-171 changes. >>>> Perhaps,there are other ways to achieve the same semantics. >>>> >>>> So, if we can consider this reviewed - I will wait for the actual JEP to become targeted to 9 and then seek a sponsor to do the push. >>>> >>>> Thanks, >>>> >>>> Ivan >>>> >>>> On 27/01/2016 09:12, Igor Veresov wrote: >>>>> I realize it?s not a big deal. I was just wondering if there was any specific reason control alone is not enough. >>>>> Anyways, looks ok for the first cut. >>>>> >>>>> igor >>>>> >>>>>> On Jan 26, 2016, at 9:24 PM, Gil Tene wrote: >>>>>> >>>>>> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop. There are probably other ways to achieve this, but this one doesn't really have a performance downside? >>>>>> >>>>>> ? Gil. >>>>>> >>>>>>> On Jan 26, 2016, at 4:44 PM, Igor Veresov wrote: >>>>>>> >>>>>>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that?s intentional I wonder why is that? >>>>>>> >>>>>>> igor >>>>>>> >>>>>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov wrote: >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> Some of you may have a seen a few e-mails on the core-libs alias about a proposed ?spin wait hint?. The JEP is forming up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly. The upcoming API changes can be seen at the webrev: >>>>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/ >>>>>>>> >>>>>>>> At this time I would like to ask for a review of the hs-comp changes. 
The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted. >>>>>>>> >>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844 >>>>>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/ >>>>>>>> >>>>>>>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:?UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() ? effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs. >>>>>>>> >>>>>>>> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Ivan >>>>>>>> >>>>>>>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops >> > From igor.ignatyev at oracle.com Fri Jan 29 01:49:28 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 29 Jan 2016 04:49:28 +0300 Subject: RFR(M) : 8134102 : [TESTBUG] compiler/unsafe/UnsafeGetConstantField.java test fails in Jake Message-ID: http://cr.openjdk.java.net/~iignatyev/8134102/webrev.00/ > 134 lines changed: 84 ins; 15 del; 35 mod; Hi all, could you please review the patch for compiler/unsafe/UnsafeGetConstantField.java test? 
the test fails in jake, because the test class is in java.lang.invoke package, which is already defined in java.base module. Instead of using the "patch" mechanism, which allows adding classes into existing modules, I decided to remove direct usage of package-private members from j.l.i:
 - @DontInline was replaced by a corresponding -XX:CompileCommand
 - direct usage of Stable.class replaced w/ Class.forName
 - UnsafeGetConstantField is moved from java.lang.invoke package to compile.unsafe, thus all the nested classes used from generated tests are made public

Besides changes for jake, I also slightly modified the test (originally to be sure that the test still checks what it is supposed to):
 - for getObject* tests, String constant is used as the field value instead of 'new Object()'
 - add checks that Test::testDirect/testUnsafe return prev. value even after field's value was changed. this check fails for Unsafe::getCharUnaligned if JVM is started w/ -XX:-UseUnalignedAccesses. I've filed a bug for that (JDK-8148518) and temporarily disabled the check which fails
 - in case of failure, the generated class is dumped into workdir

testing: run the test against 2016-01-26 jake nightly build

JDK-8134102 : http://bugs.openjdk.java.net/browse/JDK-8134102
JDK-8148518 : http://bugs.openjdk.java.net/browse/JDK-8148518

PS the patch will be integrated thru jigsaw/jake repo

Thanks,
-- Igor

From dean.long at oracle.com Fri Jan 29 04:10:34 2016
From: dean.long at oracle.com (Dean Long)
Date: Thu, 28 Jan 2016 20:10:34 -0800
Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same
In-Reply-To: <56AA3ED7.4030407@oracle.com>
References: <56AA3ED7.4030407@oracle.com>
Message-ID: <56AAE63A.4060905@oracle.com>

As you noticed, for this kind of bug the memory is going to be consistent by the time the core file is written.
So to help debug this assert if it happens again, could you change it to something like:

#ifdef ASSERT
  address computed_address = SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true);
  vmassert(handler_address == computed_address, PTR_FORMAT " != " PTR_FORMAT, p2i(handler_address), p2i(computed_address));
#endif

dl

On 1/28/2016 8:16 AM, Jamsheed C m wrote:
> Hi,
>
> Please review the fix made for issue
>
> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897
> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/
>
> Unit tests: As its hard, none
>
> Other tests: jprt.
>
> Description of the issue:
> A valid pc match in exception cache returning an invalid handler makes
> assert to fail.
> This happens as ExceptionCache reads are lock free access.
>
> As a fix for this i have put a storestore mem barrier before the count
> is updated.
>
> Best Regards,
> Jamsheed

From jamsheed.c.m at oracle.com Fri Jan 29 06:36:24 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Fri, 29 Jan 2016 12:06:24 +0530
Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same
In-Reply-To: <56AAE63A.4060905@oracle.com>
References: <56AA3ED7.4030407@oracle.com> <56AAE63A.4060905@oracle.com>
Message-ID: <56AB0868.2080307@oracle.com>

Hi Dean,

On 1/29/2016 9:40 AM, Dean Long wrote:
> As you noticed, for this kind of bug the memory is going to be consistent
> by the time the core file is written.
> So to help debug this assert if it happens again, could you change it
> to something like:
>
> #ifdef ASSERT
>   address computed_address =
>       SharedRuntime::compute_compiled_exc_handler(nm, pc, exception,
>                                                   force_unwind, true);
>   vmassert(handler_address == computed_address, PTR_FORMAT " != "
>            PTR_FORMAT, p2i(handler_address), p2i(computed_address));
> #endif

I got handler_address value in this case.
This value was inconsistent with value in ExceptionCache. It was having initial value and that was helpful in figuring out what would have went wrong. I will make this change. Best Regards, Jamsheed > > dl > > On 1/28/2016 8:16 AM, Jamsheed C m wrote: >> Hi, >> >> Please review the fix made for issue >> >> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 >> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ >> >> Unit tests: As its hard, none >> >> Other tests: jprt. >> >> Description of the issue: >> A valid pc match in exception cache returning an invalid handler >> makes assert to fail. >> This happens as ExceptionCache reads are lock free access. >> >> As a fix for this i have put a storestore mem barrier before the >> count is updated. >> >> Best Regards, >> Jamsheed > From jamsheed.c.m at oracle.com Fri Jan 29 07:09:27 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Fri, 29 Jan 2016 12:39:27 +0530 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <56AAB7CA.4000604@oracle.com> References: <56AA3ED7.4030407@oracle.com> <844981E3-4C91-4D52-BBDB-450D055C2599@oracle.com> <56AA7A3F.6040800@oracle.com> <56AAB7CA.4000604@oracle.com> Message-ID: <56AB1027.4050402@oracle.com> Hi Vladimir, On 1/29/2016 6:22 AM, Vladimir Kozlov wrote: > On 1/28/16 12:29 PM, Jamsheed C m wrote: >> >> >> On 1/29/2016 12:15 AM, Christian Thalinger wrote: >>> if (count() < cache_size) { >>> set_pc_at(count(),addr); >>> set_handler_at(count(), handler); >>> >>> Shouldn?t we read count() only once into a local variable to rule >>> any odd race bugs down the road? > > +1. As I understand, Chris is suggesting to do it in addition to > storestore barrier. Ok. > > > Do we have other similar code? I am not sure, let me have a check. Best Regards, Jamsheed > > Thanks, > Vladimir > >> >> write to cache is mutex lock protected. 
so this code is safe.
>>
>> Issue is seen in weak memory order machines. lockless read of
>> exception cache values fails as writes in cache get
>> reordered.
>>
>> Best Regards,
>> Jamsheed
>>>
>>>> On Jan 28, 2016, at 5:16 PM, Jamsheed C m
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Please review the fix made for issue
>>>>
>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897
>>>> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/
>>>>
>>>> Unit tests: As its hard, none
>>>>
>>>> Other tests: jprt.
>>>>
>>>> Description of the issue:
>>>> A valid pc match in exception cache returning an invalid handler
>>>> makes assert to fail.
>>>> This happens as ExceptionCache reads are lock free access.
>>>>
>>>> As a fix for this i have put a storestore mem barrier before the
>>>> count is updated.
>>>>
>>>> Best Regards,
>>>> Jamsheed
>>

From mikael.gerdin at oracle.com Fri Jan 29 09:17:49 2016
From: mikael.gerdin at oracle.com (Mikael Gerdin)
Date: Fri, 29 Jan 2016 10:17:49 +0100
Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1
In-Reply-To: <56AAA808.6090604@oracle.com>
References: <56AAA808.6090604@oracle.com>
Message-ID: <56AB2E3D.9030501@oracle.com>

Hi,

On 2016-01-29 00:45, Vladimir Kozlov wrote:
> G1 barrier was added by Mikael Gerdin from GC. He should also look on
> this change.

I don't have enough C2 knowledge to decode exactly what Roland's changes achieve, but I can attempt to describe what I needed to achieve with the Op_MemBarVolatile:

In the assignment
o.f = a;
G1 needs a post-barrier of the form:

o.f = a;
if (card_for(&o.f) != 32) {
  #StoreLoad
  if (card_for(&o.f) != 0) {
    card_for(&o.f) = 0
  }
}

The #StoreLoad is needed to force the second card table load to not get reordered with the store of the field.
The first load from the card table and the check for 32 is an optimization, where we know that the value 32 is idempotent, it will not change outside of safepoints.
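As a rough illustration only, the barrier shape described so far can be modelled in plain Java. This sketch assumes JDK 9+ for VarHandle.fullFence(), which is used here to stand in for the #StoreLoad; the byte array stands in for the card table, and the names CardMarkSketch, FILTERED_CARD, DIRTY_CARD and CLEAN_CARD are invented for the example (the mail only gives the values 32 and 0). It is not the code C2 emits.

```java
import java.lang.invoke.VarHandle;

final class CardMarkSketch {
    static final byte FILTERED_CARD = 32;  // the value the first check filters on (32 in the pseudocode)
    static final byte DIRTY_CARD = 0;      // the value the barrier stores (0 in the pseudocode)
    static final byte CLEAN_CARD = -1;     // hypothetical "clean" value, used only by the demo

    /**
     * Models the post-barrier that follows the field store "o.f = a".
     * Returns true if this call dirtied the card.
     */
    static boolean postBarrier(byte[] cards, int cardIndex) {
        if (cards[cardIndex] != FILTERED_CARD) {    // first load: cheap filter
            VarHandle.fullFence();                  // full fence, includes the StoreLoad ordering
            if (cards[cardIndex] != DIRTY_CARD) {   // second load, after the fence
                cards[cardIndex] = DIRTY_CARD;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] cards = {CLEAN_CARD, FILTERED_CARD, DIRTY_CARD};
        for (int i = 0; i < cards.length; i++) {
            System.out.println(i + " dirtied: " + postBarrier(cards, i));
        }
    }
}
```

The point of the model is the position of the fence: it sits between the field store (which happens before postBarrier is called) and the second card load, and nothing in the sketch requires the fence to act on any memory other than the card.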
The second load from the card table must not be allowed to occur after we know that other threads see the value "a" in o.f, otherwise a concurrent refinement thread can see the old value of o.f and we will crash in interesting ways later on... /Mikael > > https://bugs.openjdk.java.net/browse/JDK-8014555 > > Also we have specialized insert_mem_bar_volatile() if we don't want wide > memory affect. Why not use it? > And we need to keep precedent edge link to oop store in case EA > eliminates related allocation. > > Thanks, > Vladimir > > On 1/28/16 4:49 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8087341/webrev.00/ >> >> C2 currently doesn?t optimize the field load in the following code: >> >> static Object field; >> >> static Object m(Object o) { >> field = o; >> return field; >> } >> >> It should return o but instead loads the value back from memory. The >> reason it misses such simple optimization is that the G1 post barrier >> has a memory barrier with a wide effect on the memory state. C2 >> doesn?t optimize this either: >> >> object.field = other_object; >> object.field = other_object; >> >> Same applies to -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark >> >> That memory barrier was added to have a memory barrier instruction and >> doesn?t have to have a wide memory effect. >> >> Roland. >> From vladimir.x.ivanov at oracle.com Fri Jan 29 12:24:09 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 29 Jan 2016 15:24:09 +0300 Subject: RFR(M) : 8134102 : [TESTBUG] compiler/unsafe/UnsafeGetConstantField.java test fails in Jake In-Reply-To: References: Message-ID: <56AB59E9.9000606@oracle.com> Overall, looks good. One request: JDK-8148518 is caused by field and getter type mismatch [1] (char vs short). Such behavior is expected, since char & short loads aren't interchangeable [2]. 
There are different ways to fix that particular case (add new intrinsic or enhance constant folding logic to take the cast into account), but for now, please, change the filter to ignore CharUnaligned and add a comment with bug id (JDK-8148518): + if (!hasDefaultValue && (stable || g.isFinal())) { Best regards, Vladimir Ivanov [1] jdk/src/java.base/share/classes/jdk/internal/misc/Unsafe.java: @HotSpotIntrinsicCandidate public final char getCharUnaligned(Object o, long offset) { return (char)getShortUnaligned(o, offset); } [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018322.html ?I spotted a bug when field and accessor types mismatch, but the JIT still constant-folds the load. The fix made expected result detection even more complex, so I decided to get rid of it & WhiteBox hooks altogether. The test exercises different code paths and compares returned values now.? On 1/29/16 4:49 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8134102/webrev.00/ >> 134 lines changed: 84 ins; 15 del; 35 mod; > Hi all, > > could you please review the patch for compiler/unsafe/UnsafeGetConstantField.java test? > > the test fails in jake, because the test class is in java.lang.invoke package, which is already defined in java.base module. Instead of using ?patch? 
mechanism, which allows adding classes into existing modules, I decided to remove direct usage of package-private members from j.l.i: > - @DontInline was replaced by a corresponding -XX:CompileCommand > - direct usage of Stable.class replaced w/ Class.forName > - UnsafeGetConstantField is moved from java.lang.invoke package to compile.unsafe, thus all the nested classes used from generated tests are made public > > Besides changes for jake, I also slightly modified the test (originally to be sure that the test still checks what it is supposed to): > - for getObject* tests, String constant is used as the field value instead of 'new Object()' > - add checks that Test::testDirect/testUnsafe return prev. value even after field's value was changed. this check fails for Unsafe::getCharUnaligned if JVM is started w/ -XX:-UseUnalignedAccesses. I've filed a bug for that (JDK-8148518) and temporarily disabled the check which fails > - in case of failure, the generated class is dumped into workdir > > testing: run the test against 2016-01-26 jake nightly build > > JDK-8134102 : http://bugs.openjdk.java.net/browse/JDK-8134102 > JDK-8148518 : http://bugs.openjdk.java.net/browse/JDK-8148518 > > PS the patch will be integrated thru jigsaw/jake repo > > Thanks, > -- Igor > From roland.westrelin at oracle.com Fri Jan 29 13:27:23 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 29 Jan 2016 14:27:23 +0100 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 In-Reply-To: <56AA260B.8080101@redhat.com> References: <56AA260B.8080101@redhat.com> Message-ID: <434839E5-8AB1-4FEC-BDD7-AD30ABBD6C76@oracle.com> Hi Andrew, > I think this looks ok -- not sure until I try it out. However, I /am/ > fairly sure it is going to cause a problem for the AArch64 code which > optimizes volatile loads and stores.
That's because it changes the > characteristic shape of the subgraph searched for by the predicates > which decide whether to i) generate loads + membars or ii) plant stlr or > ldar instructions. > > I'll look into this asap. Thanks for looking at this. I'll wait to hear back from you before I move forward with this change. Roland. From aleksey.shipilev at oracle.com Fri Jan 29 13:28:12 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 29 Jan 2016 16:28:12 +0300 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 In-Reply-To: <56AA2DE5.5050008@oracle.com> References: <56AA2DE5.5050008@oracle.com> Message-ID: <56AB68EC.6030801@oracle.com> On 01/28/2016 06:04 PM, Aleksey Shipilev wrote: > On 01/28/2016 03:49 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8087341/webrev.00/ > > This looks good, for most tests here: > http://cr.openjdk.java.net/~shade/8087341/G1BackToBackStores.java > > The generated code indeed shows commoned loads/stores with this patch, > and some other things that can be improved in the codegen -- I'll file > separate issue(s) for that. I think this one is better to be renamed > to something more specific, e.g. "Overly wide StoreLoad barrier in G1 > breaks load/store coalescing"? Found an even more convincing example: http://cr.openjdk.java.net/~shade/8087341/G1LoopStores.java Happy to re-run once Roland has the patch with Vladimir's comments. Cheers, -Aleksey
From roland.westrelin at oracle.com Fri Jan 29 14:04:39 2016 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Fri, 29 Jan 2016 15:04:39 +0100 Subject: RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1 In-Reply-To: <56AAA808.6090604@oracle.com> References: <56AAA808.6090604@oracle.com> Message-ID: <550334AB-0A58-41A7-B00A-974EBA3F8B0B@oracle.com> Hi Vladimir, Thanks for looking at this. > G1 barrier was added by Mikael Gerdin from GC. He should also look at this change. > > https://bugs.openjdk.java.net/browse/JDK-8014555 > > Also we have specialized insert_mem_bar_volatile() if we don't want wide memory effect. Why not use it? The membar in the change takes the entire memory state as input but only changes raw memory. I don't think that can be achieved with insert_mem_bar_volatile(). As explained by Mikael, the membar is here to force ordering between the oop store and the card table load. That's why I think the membar's inputs and outputs should be set up that way. > And we need to keep precedent edge link to oop store in case EA eliminates related allocation. I missed that, indeed. Mikael, can you confirm if this is ok (eliminating the barrier if the object being stored to doesn't escape)? Roland. > > Thanks, > Vladimir > > On 1/28/16 4:49 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8087341/webrev.00/ >> >> C2 currently doesn't optimize the field load in the following code: >> >> static Object field; >> >> static Object m(Object o) { >> field = o; >> return field; >> } >> >> It should return o but instead loads the value back from memory. The reason it misses such a simple optimization is that the G1 post barrier has a memory barrier with a wide effect on the memory state.
C2 doesn't optimize this either: >> >> object.field = other_object; >> object.field = other_object; >> >> Same applies to -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark >> >> That memory barrier was added to have a memory barrier instruction and doesn't have to have a wide memory effect. >> >> Roland. >> From tobias.hartmann at oracle.com Fri Jan 29 14:16:22 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 29 Jan 2016 15:16:22 +0100 Subject: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit Message-ID: <56AB7436.7020302@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8148490 http://cr.openjdk.java.net/~thartmann/8148490/webrev.00/ RegisterSaver::save_live_registers() and RegisterSaver::restore_live_registers() are used by the safepoint handling code to save and restore registers. The following code is emitted to save and restore XMM/YMM registers on 32 bit: Save: ... 0xf34ca12e: vmovdqu %xmm0,0xb0(%esp) 0xf34ca137: vmovdqu %xmm1,0xc0(%esp) ... 0xf34ca16d: vmovdqu %xmm7,0x120(%esp) 0xf34ca176: sub $0x80,%esp 0xf34ca17c: vextractf128 $0x1,%ymm0,(%esp) 0xf34ca183: vextractf128 $0x1,%ymm1,0x10(%esp) ... 0xf34ca1b3: vextractf128 $0x1,%ymm7,0x70(%esp) ... Restore: ... 0xf34ca202: vinsertf128 $0x1,(%esp),%ymm0,%ymm0 0xf34ca209: vinsertf128 $0x1,0x10(%esp),%ymm1,%ymm1 ... 0xf34ca239: vinsertf128 $0x1,0x70(%esp),%ymm7,%ymm7 0xf34ca241: add $0x80,%esp 0xf34ca247: vmovdqu 0x130(%esp),%xmm0 0xf34ca250: vmovdqu 0x140(%esp),%xmm1 ... 0xf34ca286: vmovdqu 0x1a0(%esp),%xmm7 ... The stack offsets for the vmovdqu instructions are wrong, causing the XMM registers to contain random values after a safepoint.
The problem is that "additional_frame_bytes" is added to the stack offset although the stack pointer is incremented just before: 283 __ addptr(rsp, additional_frame_bytes); // Save upper half of YMM registers The regression test fails with "Test failed: array[0] = 1973.0 but should be 10.000" because the vectorized loop returns a wrong result. I spotted and fixed the following other problems: - the vmovdqu instructions should be emitted before restoring YMM and ZMM because they zero the upper part of the XMM registers (i.e. YMM/ZMM) - if 'UseAVX > 2' is set/available, we save the ZMM registers as well but we do not increment 'additional_frame_words' accordingly (we need another 8*32 bytes of stack space) Unfortunately, I don't have access to a CPU with the AVX-512 instruction set to test the "UseAVX > 2" related changes. Michael, could you verify the changes? The problems were introduced by the fix for JDK-8142980. Thanks, Tobias From doug.simon at oracle.com Fri Jan 29 15:34:41 2016 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 29 Jan 2016 16:34:41 +0100 Subject: RFR: 8148507: [JVMCI] mitigate deadlocks related to JVMCI compiler under -Xbatch Message-ID: <845F1D56-3194-49AE-95C1-79545F8C50AC@oracle.com> Please review this small change to further mitigate deadlocks that can be caused by JVMCI when BackgroundCompilation is disabled. https://bugs.openjdk.java.net/browse/JDK-8148507 http://cr.openjdk.java.net/~dnsimon/8148507 -Doug From michael.c.berg at intel.com Fri Jan 29 18:36:47 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 29 Jan 2016 18:36:47 +0000 Subject: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit In-Reply-To: <56AB7436.7020302@oracle.com> References: <56AB7436.7020302@oracle.com> Message-ID: Ok, I will take a look. 
-Michael -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Friday, January 29, 2016 6:16 AM To: hotspot-compiler-dev at openjdk.java.net Cc: Berg, Michael C Subject: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8148490 http://cr.openjdk.java.net/~thartmann/8148490/webrev.00/ RegisterSaver::save_live_registers() and RegisterSaver::restore_live_registers() are used by the safepoint handling code to save and restore registers. The following code is emitted to save and restore XMM/YMM registers on 32 bit: Save: ... 0xf34ca12e: vmovdqu %xmm0,0xb0(%esp) 0xf34ca137: vmovdqu %xmm1,0xc0(%esp) ... 0xf34ca16d: vmovdqu %xmm7,0x120(%esp) 0xf34ca176: sub $0x80,%esp 0xf34ca17c: vextractf128 $0x1,%ymm0,(%esp) 0xf34ca183: vextractf128 $0x1,%ymm1,0x10(%esp) ... 0xf34ca1b3: vextractf128 $0x1,%ymm7,0x70(%esp) ... Restore: ... 0xf34ca202: vinsertf128 $0x1,(%esp),%ymm0,%ymm0 0xf34ca209: vinsertf128 $0x1,0x10(%esp),%ymm1,%ymm1 ... 0xf34ca239: vinsertf128 $0x1,0x70(%esp),%ymm7,%ymm7 0xf34ca241: add $0x80,%esp 0xf34ca247: vmovdqu 0x130(%esp),%xmm0 0xf34ca250: vmovdqu 0x140(%esp),%xmm1 ... 0xf34ca286: vmovdqu 0x1a0(%esp),%xmm7 ... The stack offsets for the vmovdqu instructions are wrong, causing the XMM registers to contain random values after a safepoint. The problem is that "additional_frame_bytes" is added to the stack offset although the stack pointer is incremented just before: 283 __ addptr(rsp, additional_frame_bytes); // Save upper half of YMM registers The regression test fails with "Test failed: array[0] = 1973.0 but should be 10.000" because the vectorized loop returns a wrong result. I spotted and fixed the following other problems: - the vmovdqu instructions should be emitted before restoring YMM and ZMM because they zero the upper part of the XMM registers (i.e. 
YMM/ZMM) - if 'UseAVX > 2' is set/available, we save the ZMM registers as well but we do not increment 'additional_frame_words' accordingly (we need another 8*32 bytes of stack space) Unfortunately, I don't have access to a CPU with the AVX-512 instruction set to test the "UseAVX > 2" related changes. Michael, could you verify the changes? The problems were introduced by the fix for JDK-8142980. Thanks, Tobias From dean.long at oracle.com Fri Jan 29 19:19:16 2016 From: dean.long at oracle.com (Dean Long) Date: Fri, 29 Jan 2016 11:19:16 -0800 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <56AB0868.2080307@oracle.com> References: <56AA3ED7.4030407@oracle.com> <56AAE63A.4060905@oracle.com> <56AB0868.2080307@oracle.com> Message-ID: <56ABBB34.80002@oracle.com> On 1/28/2016 10:36 PM, Jamsheed C m wrote: > Hi Dean, > > On 1/29/2016 9:40 AM, Dean Long wrote: >> As you noticed, for this kind of bug the memory is going to >> consistent by the time the core file is written. >> So to help debug this assert it if happens again, could you change it >> to something like: >> >> #ifdef ASSERT >> address computed_address = >> SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, >> force_unwind, true); >> vmassert(handler_address == computed_address, PTR_FORMAT " != " >> PTR_FORMAT, p2i(handler_address), p2i(computed_address)); >> #endif > I got handler_address value in this case. This value was inconsistent > with value in ExceptionCache. > It was having initial value and that was helpful in figuring out what > would have went wrong. > In the bug report, you said all data in the core file was consistent, so I'm just wondering where you saw it inconsistent. Just to confirm what was going wrong, you suspect that _count was being updated before the handler? dl > I will make this change. 
> > Best Regards, > Jamsheed >> >> dl >> >> On 1/28/2016 8:16 AM, Jamsheed C m wrote: >>> Hi, >>> >>> Please review the fix made for issue >>> >>> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 >>> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ >>> >>> Unit tests: As its hard, none >>> >>> Other tests: jprt. >>> >>> Description of the issue: >>> A valid pc match in exception cache returning an invalid handler >>> makes assert to fail. >>> This happens as ExceptionCache reads are lock free access. >>> >>> As a fix for this i have put a storestore mem barrier before the >>> count is updated. >>> >>> Best Regards, >>> Jamsheed >> > From vladimir.kozlov at oracle.com Fri Jan 29 19:39:45 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Jan 2016 11:39:45 -0800 Subject: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit In-Reply-To: <56AB7436.7020302@oracle.com> References: <56AB7436.7020302@oracle.com> Message-ID: <56ABC001.6080302@oracle.com> Tobias, please verify that 64-bit code works correctly. About 32-bit code. Please verify correctness of next asserts: assert(UseAVX > 0, "512bit vectors are supported only with EVEX"); assert(MaxVectorSize == 64, "only 512bit vectors are supported now"); Originally we could have vectors even with only 64bit XMM registers. MaxVectorSize and UseAVX can be set on command line - what happens in such case? No vectorization? May be it is done because we save whole 128bit XMM always. Still MaxVectorSize == 64 condition is strange. Thanks, Vladimir On 1/29/16 6:16 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8148490 > http://cr.openjdk.java.net/~thartmann/8148490/webrev.00/ > > RegisterSaver::save_live_registers() and RegisterSaver::restore_live_registers() are used by the safepoint handling code to save and restore registers. 
The following code is emitted to save and restore XMM/YMM registers on 32 bit: > > Save: > ... > 0xf34ca12e: vmovdqu %xmm0,0xb0(%esp) > 0xf34ca137: vmovdqu %xmm1,0xc0(%esp) > ... > 0xf34ca16d: vmovdqu %xmm7,0x120(%esp) > 0xf34ca176: sub $0x80,%esp > 0xf34ca17c: vextractf128 $0x1,%ymm0,(%esp) > 0xf34ca183: vextractf128 $0x1,%ymm1,0x10(%esp) > ... > 0xf34ca1b3: vextractf128 $0x1,%ymm7,0x70(%esp) > ... > > Restore: > ... > 0xf34ca202: vinsertf128 $0x1,(%esp),%ymm0,%ymm0 > 0xf34ca209: vinsertf128 $0x1,0x10(%esp),%ymm1,%ymm1 > ... > 0xf34ca239: vinsertf128 $0x1,0x70(%esp),%ymm7,%ymm7 > 0xf34ca241: add $0x80,%esp > 0xf34ca247: vmovdqu 0x130(%esp),%xmm0 > 0xf34ca250: vmovdqu 0x140(%esp),%xmm1 > ... > 0xf34ca286: vmovdqu 0x1a0(%esp),%xmm7 > ... > > The stack offsets for the vmovdqu instructions are wrong, causing the XMM registers to contain random values after a safepoint. The problem is that "additional_frame_bytes" is added to the stack offset although the stack pointer is incremented just before: > > 283 __ addptr(rsp, additional_frame_bytes); // Save upper half of YMM registers > > The regression test fails with "Test failed: array[0] = 1973.0 but should be 10.000" because the vectorized loop returns a wrong result. > > I spotted and fixed the following other problems: > - the vmovdqu instructions should be emitted before restoring YMM and ZMM because they zero the upper part of the XMM registers (i.e. YMM/ZMM) > - if 'UseAVX > 2' is set/available, we save the ZMM registers as well but we do not increment 'additional_frame_words' accordingly (we need another 8*32 bytes of stack space) > > Unfortunately, I don't have access to a CPU with the AVX-512 instruction set to test the "UseAVX > 2" related changes. Michael, could you verify the changes? > > The problems were introduced by the fix for JDK-8142980. 
> > Thanks, > Tobias > From michael.c.berg at intel.com Fri Jan 29 22:28:19 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 29 Jan 2016 22:28:19 +0000 Subject: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit In-Reply-To: <56ABC001.6080302@oracle.com> References: <56AB7436.7020302@oracle.com> <56ABC001.6080302@oracle.com> Message-ID: Tobias/Vladimir: I would change the two asserts in the 64-bit code to make the check clear: assert(UseAVX > 0, "up to 512bit vectors are supported with EVEX"); assert(MaxVectorSize <= 64, "up to 512bit vectors are supported now"); As for testing with the patch applied to hotspot on a current jdk (01-29-16): Windows sde 32-bit: skx - pass, also ran and passed part of specjvm2008 Windows 32-bit: hsw - pass, also ran and passed all of specjvm2008 Windows sde 64-bit: skx - pass, also ran and passed part of specjvm2008 Windows 64-bit: hsw - pass, also ran and passed all of specjvm2008 : caveat Linux on skx: 32-bit - pass, also ran and passed all of specjvm2008 Linux on skx: 64-bit - pass, also ran and passed all of specjvm2008 We should proceed with checking in the changelist after the usual testing. Note: The above tests were done with the asserts changed on windows only. The 64-bit changes are mostly cosmetic. It's the change to the additional_frame_bytes that makes it correct; we used equivalent constants in the stack adjustment beforehand, but they had not been mapped to the movdqu for the non-vector case for a few iterations on the file. Early on I did have that code though. Caveat: xml.transform fails with and without the changelist; I checked this against a 12-21-15 built jdk which is 1 month old, so we have a new bug that is causing this app to fail as well (on windows for 64bit) on hsw. I checked recent jbs traffic, the occurrence does not appear to be tracked at this time.
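For readers following the offset discussion, here is a small Java sketch of the arithmetic only. It is not the patch and not HotSpot code: the constants are read off the 32-bit disassembly in the original review request (xmm0 saved at 0xb0, 0x80 bytes for the upper YMM halves), and all names are hypothetical. It models why adding additional_frame_bytes to the offset after the stack pointer has already been moved back reads from the wrong slot:

```java
// Hypothetical model of the stack-offset arithmetic; constants come from the listing in the RFR.
final class XmmRestoreOffsets {
    static final int XMM_BASE = 0xb0;               // offset of the xmm0 save slot
    static final int XMM_SLOT = 16;                 // bytes per 128-bit XMM slot
    static final int ADDITIONAL_FRAME_BYTES = 0x80; // 8 upper YMM halves * 16 bytes

    // Address written by the save code; esp is the value before "sub $0x80,%esp".
    static int saveAddress(int esp, int reg) {
        return esp + XMM_BASE + reg * XMM_SLOT;
    }

    // Restore as shown in the listing: esp is already back at its old value,
    // yet the extra frame bytes are still added to the offset.
    static int listedRestoreAddress(int esp, int reg) {
        return esp + XMM_BASE + ADDITIONAL_FRAME_BYTES + reg * XMM_SLOT;
    }

    // Restore that is consistent with the save once esp has been moved back.
    static int consistentRestoreAddress(int esp, int reg) {
        return esp + XMM_BASE + reg * XMM_SLOT;
    }

    public static void main(String[] args) {
        int esp = 0; // use 0 so the printed values are plain offsets
        for (int reg = 0; reg < 8; reg++) {
            System.out.printf("xmm%d: saved at 0x%x, listed restore 0x%x, consistent restore 0x%x%n",
                    reg, saveAddress(esp, reg), listedRestoreAddress(esp, reg),
                    consistentRestoreAddress(esp, reg));
        }
    }
}
```

With these assumptions the listed restore offsets for xmm0 and xmm7 come out as 0x130 and 0x1a0, matching the disassembly, while the save used 0xb0 and 0x120. Note that if the vmovdqu restores are instead emitted before the stack pointer is adjusted, as the review also proposes, the larger offsets would be the matching ones; the sketch only covers the ordering shown in the listing.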
-Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Friday, January 29, 2016 11:40 AM To: hotspot-compiler-dev at openjdk.java.net Cc: Berg, Michael C Subject: Re: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit Tobias, please verify that 64-bit code works correctly. About 32-bit code. Please verify correctness of next asserts: assert(UseAVX > 0, "512bit vectors are supported only with EVEX"); assert(MaxVectorSize == 64, "only 512bit vectors are supported now"); Originally we could have vectors even with only 64bit XMM registers. MaxVectorSize and UseAVX can be set on command line - what happens in such case? No vectorization? May be it is done because we save whole 128bit XMM always. Still MaxVectorSize == 64 condition is strange. Thanks, Vladimir On 1/29/16 6:16 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8148490 > http://cr.openjdk.java.net/~thartmann/8148490/webrev.00/ > > RegisterSaver::save_live_registers() and RegisterSaver::restore_live_registers() are used by the safepoint handling code to save and restore registers. The following code is emitted to save and restore XMM/YMM registers on 32 bit: > > Save: > ... > 0xf34ca12e: vmovdqu %xmm0,0xb0(%esp) > 0xf34ca137: vmovdqu %xmm1,0xc0(%esp) > ... > 0xf34ca16d: vmovdqu %xmm7,0x120(%esp) > 0xf34ca176: sub $0x80,%esp > 0xf34ca17c: vextractf128 $0x1,%ymm0,(%esp) > 0xf34ca183: vextractf128 $0x1,%ymm1,0x10(%esp) > ... > 0xf34ca1b3: vextractf128 $0x1,%ymm7,0x70(%esp) > ... > > Restore: > ... > 0xf34ca202: vinsertf128 $0x1,(%esp),%ymm0,%ymm0 > 0xf34ca209: vinsertf128 $0x1,0x10(%esp),%ymm1,%ymm1 > ... > 0xf34ca239: vinsertf128 $0x1,0x70(%esp),%ymm7,%ymm7 > 0xf34ca241: add $0x80,%esp > 0xf34ca247: vmovdqu 0x130(%esp),%xmm0 > 0xf34ca250: vmovdqu 0x140(%esp),%xmm1 > ... > 0xf34ca286: vmovdqu 0x1a0(%esp),%xmm7 > ... 
> > The stack offsets for the vmovdqu instructions are wrong, causing the XMM registers to contain random values after a safepoint. The problem is that "additional_frame_bytes" is added to the stack offset although the stack pointer is incremented just before: > > 283 __ addptr(rsp, additional_frame_bytes); // Save upper half of YMM registers > > The regression test fails with "Test failed: array[0] = 1973.0 but should be 10.000" because the vectorized loop returns a wrong result. > > I spotted and fixed the following other problems: > - the vmovdqu instructions should be emitted before restoring YMM and ZMM because they zero the upper part of the XMM registers (i.e. YMM/ZMM) > - if 'UseAVX > 2' is set/available, we save the ZMM registers as well but we do not increment 'additional_frame_words' accordingly (we need another 8*32 bytes of stack space) > > Unfortunately, I don't have access to a CPU with the AVX-512 instruction set to test the "UseAVX > 2" related changes. Michael, could you verify the changes? > > The problems were introduced by the fix for JDK-8142980. > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Sat Jan 30 01:38:54 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Jan 2016 17:38:54 -0800 Subject: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit In-Reply-To: References: <56AB7436.7020302@oracle.com> <56ABC001.6080302@oracle.com> Message-ID: <56AC142E.6010309@oracle.com> Michael, Thank you for testing changes. Please, file JBS bug for xml.transform problem. 
Thanks, Vladimir On 1/29/16 2:28 PM, Berg, Michael C wrote: > Tobias/Vladimir: > > I would change the two asserts to in the 64bit code to make the check clear: > > assert(UseAVX > 0, "up to 512bit vectors are supported with EVEX"); > assert(MaxVectorSize <= 64, "up to 512bit vectors are supported now"); > > As for testing with the patch applied to hotspot on a current jdk(01-29-16): > > Windows sde 32-bit: skx - pass, also ran and passed part of specjvm2008 > Windows 32-bit: hsw - pass, also ran and passed all of specjvm2008 > Windows sde 64-bit: skx - pass, also ran and passed part of specjvm2008 > Windows 64-bit: hsw -pass, also ran and passed all of specjvm2008 : caveat > Linux on skx: 32-bit - pass, also ran and passed all of specjvm2008 > Linux on skx:64-bit - pass, also ran and passed all of specjvm2008 > > We should proceed with checkin in the changelist after the usual testing. > > Note: The above tests were done with the asserts changed on windows only. The 64bit changes are mostly cosmetic. It's the change to the additional_frame_bytes that makes it correct, we used > equivalent constants in the stack adjustment beforehand, they had not been mapped to the movdqu for the non-vector case for a few iterations on the file. Early on I did have that code though. > > Caveat: xml.transform fails with the changelist and without, I checked this against a 12-21-15 built jdk which is 1 month old, so we have a new bug that is causing this app to fail as well (on windows for 64bit) on hsw. > I checked recent jbs traffic, the occurrence does not appear to be tracked at this time. > > -Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Friday, January 29, 2016 11:40 AM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Berg, Michael C > Subject: Re: [9] RFR(S): 8148490: RegisterSaver::restore_live_registers() fails to restore xmm registers on 32 bit > > Tobias, please verify that 64-bit code works correctly. 
> About 32-bit code. > > Please verify correctness of next asserts: > > assert(UseAVX > 0, "512bit vectors are supported only with EVEX"); > assert(MaxVectorSize == 64, "only 512bit vectors are supported now"); > > Originally we could have vectors even with only 64bit XMM registers. MaxVectorSize and UseAVX can be set on command line > - what happens in such case? No vectorization? > > May be it is done because we save whole 128bit XMM always. Still MaxVectorSize == 64 condition is strange. > > Thanks, > Vladimir > > On 1/29/16 6:16 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8148490 >> http://cr.openjdk.java.net/~thartmann/8148490/webrev.00/ >> >> RegisterSaver::save_live_registers() and RegisterSaver::restore_live_registers() are used by the safepoint handling code to save and restore registers. The following code is emitted to save and restore XMM/YMM registers on 32 bit: >> >> Save: >> ... >> 0xf34ca12e: vmovdqu %xmm0,0xb0(%esp) >> 0xf34ca137: vmovdqu %xmm1,0xc0(%esp) >> ... >> 0xf34ca16d: vmovdqu %xmm7,0x120(%esp) >> 0xf34ca176: sub $0x80,%esp >> 0xf34ca17c: vextractf128 $0x1,%ymm0,(%esp) >> 0xf34ca183: vextractf128 $0x1,%ymm1,0x10(%esp) >> ... >> 0xf34ca1b3: vextractf128 $0x1,%ymm7,0x70(%esp) >> ... >> >> Restore: >> ... >> 0xf34ca202: vinsertf128 $0x1,(%esp),%ymm0,%ymm0 >> 0xf34ca209: vinsertf128 $0x1,0x10(%esp),%ymm1,%ymm1 >> ... >> 0xf34ca239: vinsertf128 $0x1,0x70(%esp),%ymm7,%ymm7 >> 0xf34ca241: add $0x80,%esp >> 0xf34ca247: vmovdqu 0x130(%esp),%xmm0 >> 0xf34ca250: vmovdqu 0x140(%esp),%xmm1 >> ... >> 0xf34ca286: vmovdqu 0x1a0(%esp),%xmm7 >> ... >> >> The stack offsets for the vmovdqu instructions are wrong, causing the XMM registers to contain random values after a safepoint. 
The problem is that "additional_frame_bytes" is added to the stack offset although the stack pointer is incremented just before: >> >> 283 __ addptr(rsp, additional_frame_bytes); // Save upper half of YMM registers >> >> The regression test fails with "Test failed: array[0] = 1973.0 but should be 10.000" because the vectorized loop returns a wrong result. >> >> I spotted and fixed the following other problems: >> - the vmovdqu instructions should be emitted before restoring YMM and ZMM because they zero the upper part of the XMM registers (i.e. YMM/ZMM) >> - if 'UseAVX > 2' is set/available, we save the ZMM registers as well but we do not increment 'additional_frame_words' accordingly (we need another 8*32 bytes of stack space) >> >> Unfortunately, I don't have access to a CPU with the AVX-512 instruction set to test the "UseAVX > 2" related changes. Michael, could you verify the changes? >> >> The problems were introduced by the fix for JDK-8142980. >> >> Thanks, >> Tobias >> From jamsheed.c.m at oracle.com Sat Jan 30 04:08:44 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Sat, 30 Jan 2016 09:38:44 +0530 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <56ABBB34.80002@oracle.com> References: <56AA3ED7.4030407@oracle.com> <56AAE63A.4060905@oracle.com> <56AB0868.2080307@oracle.com> <56ABBB34.80002@oracle.com> Message-ID: <56AC374C.9050800@oracle.com> Hi Dean, On 1/30/2016 12:49 AM, Dean Long wrote: > On 1/28/2016 10:36 PM, Jamsheed C m wrote: >> Hi Dean, >> >> On 1/29/2016 9:40 AM, Dean Long wrote: >>> As you noticed, for this kind of bug the memory is going to >>> consistent by the time the core file is written. 
>>> So to help debug this assert if it happens again, could you change >>> it to something like: >>> >>> #ifdef ASSERT >>> address computed_address = >>> SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, >>> force_unwind, true); >>> vmassert(handler_address == computed_address, PTR_FORMAT " != " >>> PTR_FORMAT, p2i(handler_address), p2i(computed_address)); >>> #endif >> I got handler_address value in this case. This value was inconsistent >> with value in ExceptionCache. >> It was having initial value and that was helpful in figuring out what >> could have gone wrong. >> > > In the bug report, you said all data in the core file was consistent, > so I'm just wondering where you saw > it inconsistent. Just to confirm what was going wrong, you suspect > that _count was being updated before the handler? i meant ExceptionCache(heap) and ExceptionHandlerTable(heap) contents were consistent at the time core file was written. handler_address(local variable) had already captured failing value. handler_address(local variable) was inconsistent with ExceptionCache(heap) handler_address in core file. there were two failing cases. 1) Only one entry in exception cache and failing - here i suspect handler_address in exception cache write code got reordered well below count and even ExceptionCache pointer update in nm. 2) Two entries in exception cache for an exception and second entry causing failure. - here i suspect handler_address in exception cache write code got reordered below count. These reorderings happen in a very small window, as this code is already lock protected (and has a mem barrier below). Best, Jamsheed > > dl > >> I will make this change.
>> >> Best Regards, >> Jamsheed >>> >>> dl >>> >>> On 1/28/2016 8:16 AM, Jamsheed C m wrote: >>>> Hi, >>>> >>>> Please review the fix made for issue >>>> >>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 >>>> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ >>>> >>>> Unit tests: As its hard, none >>>> >>>> Other tests: jprt. >>>> >>>> Description of the issue: >>>> A valid pc match in exception cache returning an invalid handler >>>> makes assert to fail. >>>> This happens as ExceptionCache reads are lock free access. >>>> >>>> As a fix for this i have put a storestore mem barrier before the >>>> count is updated. >>>> >>>> Best Regards, >>>> Jamsheed >>> >> > From jamsheed.c.m at oracle.com Sat Jan 30 05:19:09 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Sat, 30 Jan 2016 10:49:09 +0530 Subject: RFR(XS): 8143897 :Weblogic12medrec assert(handler_address == SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, force_unwind, true)) failed: Must be the same In-Reply-To: <56AC374C.9050800@oracle.com> References: <56AA3ED7.4030407@oracle.com> <56AAE63A.4060905@oracle.com> <56AB0868.2080307@oracle.com> <56ABBB34.80002@oracle.com> <56AC374C.9050800@oracle.com> Message-ID: <56AC47CD.4040700@oracle.com> On 1/30/2016 9:38 AM, Jamsheed C m wrote: > > Hi Dean, > > On 1/30/2016 12:49 AM, Dean Long wrote: >> On 1/28/2016 10:36 PM, Jamsheed C m wrote: >>> Hi Dean, >>> >>> On 1/29/2016 9:40 AM, Dean Long wrote: >>>> As you noticed, for this kind of bug the memory is going to >>>> consistent by the time the core file is written. 
>>>> So to help debug this assert if it happens again, could you change >>>> it to something like: >>>> >>>> #ifdef ASSERT >>>> address computed_address = >>>> SharedRuntime::compute_compiled_exc_handler(nm, pc, exception, >>>> force_unwind, true); >>>> vmassert(handler_address == computed_address, PTR_FORMAT " != " >>>> PTR_FORMAT, p2i(handler_address), p2i(computed_address)); >>>> #endif >>> I got handler_address value in this case. This value was >>> inconsistent with value in ExceptionCache. >>> It was having initial value and that was helpful in figuring out >>> what could have gone wrong. >>> >> >> In the bug report, you said all data in the core file was consistent, >> so I'm just wondering where you saw >> it inconsistent. Just to confirm what was going wrong, you suspect >> that _count was being updated before the handler? > i meant ExceptionCache(heap) and ExceptionHandlerTable(heap) contents > were consistent at the time core file was written. > handler_address(local variable) had already captured failing value. > handler_address(local variable) was inconsistent with > ExceptionCache(heap) handler_address in core file. > > there were two failing cases. > 1) Only one entry in exception cache and failing > > - here i suspect handler_address in exception cache write code > got reordered well below count and even ExceptionCache pointer > update in nm. > 2) Two entries in exception cache for an exception and second entry > causing failure. > > - here i suspect handler_address in exception cache write code > got reordered below count. > > These reorderings happen in a very small window, as this code is already > lock protected (and has a mem barrier below). i have removed the ambiguity in the bug report. Best Regards, Jamsheed > > Best, > Jamsheed > >> >> dl >> >>> I will make this change.
>>> >>> Best Regards, >>> Jamsheed >>>> >>>> dl >>>> >>>> On 1/28/2016 8:16 AM, Jamsheed C m wrote: >>>>> Hi, >>>>> >>>>> Please review the fix made for issue >>>>> >>>>> bug url: https://bugs.openjdk.java.net/browse/JDK-8143897 >>>>> web rev: http://cr.openjdk.java.net/~thartmann/8143897/webrev.00/ >>>>> >>>>> Unit tests: As its hard, none >>>>> >>>>> Other tests: jprt. >>>>> >>>>> Description of the issue: >>>>> A valid pc match in exception cache returning an invalid handler >>>>> makes assert to fail. >>>>> This happens as ExceptionCache reads are lock free access. >>>>> >>>>> As a fix for this i have put a storestore mem barrier before the >>>>> count is updated. >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>> >>> >> > From christian.thalinger at oracle.com Sun Jan 31 13:48:04 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Sun, 31 Jan 2016 14:48:04 +0100 Subject: RFR: 8148507: [JVMCI] mitigate deadlocks related to JVMCI compiler under -Xbatch In-Reply-To: <845F1D56-3194-49AE-95C1-79545F8C50AC@oracle.com> References: <845F1D56-3194-49AE-95C1-79545F8C50AC@oracle.com> Message-ID: Looks good. > On Jan 29, 2016, at 4:34 PM, Doug Simon wrote: > > Please review this small change to further mitigate deadlocks that can be caused by JVMCI when BackgroundCompilation is disabled. > > https://bugs.openjdk.java.net/browse/JDK-8148507 > http://cr.openjdk.java.net/~dnsimon/8148507 > > -Doug
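As a footnote to the 8143897 thread in this digest: the "storestore barrier before the count is updated" idea can be sketched in Java as an analogy. This is not the HotSpot ExceptionCache code; the class, fields and capacity below are hypothetical, and the reader-side fence is an assumption added so that the sketch is a complete pairing. The writer fills in an entry, issues a store-store fence, and only then bumps the count that lock-free readers use as their bound:

```java
import java.lang.invoke.VarHandle;

// Hypothetical Java analogy of "publish the entry, then the count"; not HotSpot code.
final class HandlerCacheSketch {
    private static final int CAPACITY = 4;
    private final long[] pcs = new long[CAPACITY];
    private final long[] handlers = new long[CAPACITY];
    private int count; // number of fully written entries

    // Writers are serialized by a lock, as in the thread's description.
    synchronized boolean add(long pc, long handler) {
        int n = count;
        if (n == CAPACITY) {
            return false;
        }
        pcs[n] = pc;
        handlers[n] = handler;
        // Keep the entry stores from being reordered after the count store.
        VarHandle.storeStoreFence();
        count = n + 1;
        return true;
    }

    // Lock-free reader: only looks at entries below the count it observed.
    long lookup(long pc) {
        int n = count;
        // Assumed reader-side pairing: entry loads must not float above the count load.
        VarHandle.loadLoadFence();
        for (int i = 0; i < n; i++) {
            if (pcs[i] == pc) {
                return handlers[i];
            }
        }
        return -1; // no handler cached for this pc
    }

    public static void main(String[] args) {
        HandlerCacheSketch cache = new HandlerCacheSketch();
        cache.add(0x10, 0x100);
        System.out.println(Long.toHexString(cache.lookup(0x10)));
        System.out.println(cache.lookup(0x20));
    }
}
```

Without the fence before the count store, a reader could in principle observe the new count while the handler slot still holds its initial value, which is the kind of mismatch the assert in the bug report appears to have caught.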