From aph at redhat.com Wed Jul 1 11:57:49 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 01 Jul 2015 12:57:49 +0100 Subject: RFR: 8130150: RSA Acceleration In-Reply-To: <5592D6AA.8020509@redhat.com> References: <557ABD2E.7050608@redhat.com> <557EFF94.5000006@oracle.com> <557F042D.4060707@redhat.com> <558033C4.8040104@redhat.com> <5582F936.5020008@oracle.com> <5582FACA.4060103@redhat.com> <5582FDCA.8010507@oracle.com> <55831BC8.9060001@oracle.com> <5583D414.5050502@redhat.com> <558D7D02.6070303@redhat.com> <559103D8.1010302@oracle.com> <559110BF.4090804@redhat.com> <5592D6AA.8020509@redhat.com> Message-ID: <5593D5BD.8000809@redhat.com> On 06/30/2015 06:49 PM, Andrew Haley wrote: > New webrevs: > > http://cr.openjdk.java.net/~aph/8130150-jdk > http://cr.openjdk.java.net/~aph/8130150-hs/ I made an error when preparing these webrevs. Please ignore. Andrew. From aph at redhat.com Wed Jul 1 14:56:40 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 01 Jul 2015 15:56:40 +0100 Subject: RFR: 8130150: RSA Acceleration In-Reply-To: <5593D5BD.8000809@redhat.com> References: <557ABD2E.7050608@redhat.com> <557EFF94.5000006@oracle.com> <557F042D.4060707@redhat.com> <558033C4.8040104@redhat.com> <5582F936.5020008@oracle.com> <5582FACA.4060103@redhat.com> <5582FDCA.8010507@oracle.com> <55831BC8.9060001@oracle.com> <5583D414.5050502@redhat.com> <558D7D02.6070303@redhat.com> <559103D8.1010302@oracle.com> <559110BF.4090804@redhat.com> <5592D6AA.8020509@redhat.com> <5593D5BD.8000809@redhat.com> Message-ID: <5593FFA8.8090506@redhat.com> Sorry for the mistake. New webrevs: http://cr.openjdk.java.net/~aph/8130150-hs-1/ http://cr.openjdk.java.net/~aph/8130150-jdk-1/ Andrew. From zoltan.majo at oracle.com Wed Jul 1 15:22:59 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 01 Jul 2015 17:22:59 +0200 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms Message-ID: <559405D3.9070900@oracle.com> Hi, please review the patch for JDK-8130120. Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 Problem: Currently, the JVM prints different warning messages when SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints "SHA intrinsics are not available on this CPU" and x86 prints "SHA instructions are not available on this CPU"). Also, there are flag combinations that result in a warning on some platforms but not on other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a warning on x86 but it does not on aarch64 and on sparc). Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same way on x86, aarch64, and sparc. Change warning messages to be consistent among the previously mentioned platforms and also to better match the flag's description. Update the tests in test/compiler/intrinsics/sha to match the new functionality. Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ Testing: - full JPRT run (includes the updated tests that were executed on x86 and sparc), all tests pass; - locally executed the test/compiler/intrinsics/sha tests on aarch64; all tests pass. 
Thank you and best regards, Zoltan From vladimir.kozlov at oracle.com Wed Jul 1 18:47:37 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 01 Jul 2015 11:47:37 -0700 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <559405D3.9070900@oracle.com> References: <559405D3.9070900@oracle.com> Message-ID: <559435C9.3040508@oracle.com> Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are not available. Thanks, Vladimir On 7/1/15 8:22 AM, Zolt?n Maj? wrote: > Hi, > > > please review the patch for JDK-8130120. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 > > Problem: Currently, the JVM prints different warning messages when > SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints > "SHA intrinsics are not available on this CPU" and x86 prints "SHA > instructions are not available on this CPU"). Also, there are flag > combinations that result in a warning on some platforms but not on other > platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a warning on > x86 but it does not on aarch64 and on sparc). > > Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, > UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same way > on x86, aarch64, and sparc. Change warning messages to be consistent > among the previously mentioned platforms and also to better match the > flag's description. Update the tests in test/compiler/intrinsics/sha to > match the new functionality. > > Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ > > Testing: > - full JPRT run (includes the updated tests that were executed on x86 > and sparc), all tests pass; > - locally executed the test/compiler/intrinsics/sha tests on aarch64; > all tests pass. > > Thank you and best regards, > > > Zoltan > From vladimir.kozlov at oracle.com Wed Jul 1 18:56:10 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 01 Jul 2015 11:56:10 -0700 Subject: RFR: 8130150: RSA Acceleration In-Reply-To: <5593FFA8.8090506@redhat.com> References: <557ABD2E.7050608@redhat.com> <557EFF94.5000006@oracle.com> <557F042D.4060707@redhat.com> <558033C4.8040104@redhat.com> <5582F936.5020008@oracle.com> <5582FACA.4060103@redhat.com> <5582FDCA.8010507@oracle.com> <55831BC8.9060001@oracle.com> <5583D414.5050502@redhat.com> <558D7D02.6070303@redhat.com> <559103D8.1010302@oracle.com> <559110BF.4090804@redhat.com> <5592D6AA.8020509@redhat.com> <5593D5BD.8000809@redhat.com> <5593FFA8.8090506@redhat.com> Message-ID: <559437CA.2070303@oracle.com> Looks good to me. Thanks, Vladimir On 7/1/15 7:56 AM, Andrew Haley wrote: > Sorry for the mistake. New webrevs: > > http://cr.openjdk.java.net/~aph/8130150-hs-1/ > http://cr.openjdk.java.net/~aph/8130150-jdk-1/ > > Andrew. > From michael.c.berg at intel.com Wed Jul 1 22:57:18 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 1 Jul 2015 22:57:18 +0000 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <559435C9.3040508@oracle.com> References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> Message-ID: Looks good, once Vladimir's note is added. 
Thanks, -Michael -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Wednesday, July 01, 2015 11:48 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are not available. Thanks, Vladimir On 7/1/15 8:22 AM, Zolt?n Maj? wrote: > Hi, > > > please review the patch for JDK-8130120. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 > > Problem: Currently, the JVM prints different warning messages when > SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints > "SHA intrinsics are not available on this CPU" and x86 prints "SHA > instructions are not available on this CPU"). Also, there are flag > combinations that result in a warning on some platforms but not on > other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a > warning on > x86 but it does not on aarch64 and on sparc). > > Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, > UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same > way on x86, aarch64, and sparc. Change warning messages to be > consistent among the previously mentioned platforms and also to better > match the flag's description. Update the tests in > test/compiler/intrinsics/sha to match the new functionality. > > Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ > > Testing: > - full JPRT run (includes the updated tests that were executed on x86 > and sparc), all tests pass; > - locally executed the test/compiler/intrinsics/sha tests on aarch64; > all tests pass. > > Thank you and best regards, > > > Zoltan > From zoltan.majo at oracle.com Thu Jul 2 12:17:00 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Thu, 02 Jul 2015 14:17:00 +0200 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> Message-ID: <55952BBC.4090605@oracle.com> Thank you, Vladimir and Michael, for the feedback! Here is the updated webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.01/ All JPRT tests pass. I plan to push the newest webrev (webrev.01) on Friday (July 3) if no other issues come up by then. Thank you and best regards, Zoltan On 07/02/2015 12:57 AM, Berg, Michael C wrote: > Looks good, once Vladimir's note is added. > > Thanks, > -Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Wednesday, July 01, 2015 11:48 AM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms > > Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are not available. > > Thanks, > Vladimir > > On 7/1/15 8:22 AM, Zolt?n Maj? wrote: >> Hi, >> >> >> please review the patch for JDK-8130120. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 >> >> Problem: Currently, the JVM prints different warning messages when >> SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints >> "SHA intrinsics are not available on this CPU" and x86 prints "SHA >> instructions are not available on this CPU"). 
Also, there are flag >> combinations that result in a warning on some platforms but not on >> other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a >> warning on >> x86 but it does not on aarch64 and on sparc). >> >> Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, >> UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same >> way on x86, aarch64, and sparc. Change warning messages to be >> consistent among the previously mentioned platforms and also to better >> match the flag's description. Update the tests in >> test/compiler/intrinsics/sha to match the new functionality. >> >> Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ >> >> Testing: >> - full JPRT run (includes the updated tests that were executed on x86 >> and sparc), all tests pass; >> - locally executed the test/compiler/intrinsics/sha tests on aarch64; >> all tests pass. >> >> Thank you and best regards, >> >> >> Zoltan >> From vladimir.kozlov at oracle.com Thu Jul 2 15:04:32 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 02 Jul 2015 08:04:32 -0700 Subject: hg: jdk9/hs-comp/hotspot: 2 new changesets In-Reply-To: <201507021037.t62AbE9c014701@aojmv0008.oracle.com> References: <201507021037.t62AbE9c014701@aojmv0008.oracle.com> Message-ID: <55955300.7000304@oracle.com> Andrew, Did someone sponsored this push for you? It has shared code changes - it should go through JPRT. Thanks, Vladimir On 7/2/15 3:37 AM, aph at redhat.com wrote: > Changeset: 9fcbb6768a78 > Author: aph > Date: 2015-06-16 17:31 +0100 > URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/9fcbb6768a78 > > 8130150: Implement BigInteger.montgomeryMultiply intrinsic > Summary: Add montgomeryMultiply intrinsics > Reviewed-by: kvn > > ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp > ! src/cpu/x86/vm/stubGenerator_x86_64.cpp > ! src/cpu/x86/vm/vm_version_x86.cpp > ! src/share/vm/classfile/vmSymbols.hpp > ! src/share/vm/opto/c2_globals.hpp > ! src/share/vm/opto/escape.cpp > ! src/share/vm/opto/library_call.cpp > ! src/share/vm/opto/runtime.cpp > ! src/share/vm/opto/runtime.hpp > ! src/share/vm/runtime/sharedRuntime.hpp > ! src/share/vm/runtime/stubRoutines.cpp > ! src/share/vm/runtime/stubRoutines.hpp > + test/compiler/intrinsics/montgomerymultiply/MontgomeryMultiplyTest.java > > Changeset: d30647171e49 > Author: aph > Date: 2015-07-02 11:12 +0100 > URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/d30647171e49 > > Merge > > ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp > ! src/cpu/x86/vm/stubGenerator_x86_64.cpp > ! src/cpu/x86/vm/vm_version_x86.cpp > ! src/share/vm/classfile/vmSymbols.hpp > ! src/share/vm/opto/c2_globals.hpp > ! src/share/vm/opto/escape.cpp > ! src/share/vm/opto/library_call.cpp > ! src/share/vm/opto/runtime.cpp > ! src/share/vm/opto/runtime.hpp > ! src/share/vm/runtime/stubRoutines.cpp > ! 
src/share/vm/runtime/stubRoutines.hpp > - test/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForSupportedSparcCPU.java > - test/compiler/intrinsics/sha/cli/testcases/UseSHAIntrinsicsSpecificTestCaseForUnsupportedSparcCPU.java > - test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForSupportedSparcCPU.java > - test/compiler/intrinsics/sha/cli/testcases/UseSHASpecificTestCaseForUnsupportedSparcCPU.java > From vladimir.kozlov at oracle.com Thu Jul 2 15:09:16 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 02 Jul 2015 08:09:16 -0700 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <55952BBC.4090605@oracle.com> References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> <55952BBC.4090605@oracle.com> Message-ID: <5595541C.1030402@oracle.com> Looks good. Thanks, Vladimir On 7/2/15 5:17 AM, Zolt?n Maj? wrote: > Thank you, Vladimir and Michael, for the feedback! > > Here is the updated webrev: > http://cr.openjdk.java.net/~zmajo/8130120/webrev.01/ > > All JPRT tests pass. > > I plan to push the newest webrev (webrev.01) on Friday (July 3) if no other issues come up by then. > > Thank you and best regards, > > > Zoltan > > > On 07/02/2015 12:57 AM, Berg, Michael C wrote: >> Looks good, once Vladimir's note is added. >> >> Thanks, >> -Michael >> >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov >> Sent: Wednesday, July 01, 2015 11:48 AM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms >> >> Looks good but I would keep "on this CPU" at the end of messages to clear indicate that it is due to instructions are >> not available. >> >> Thanks, >> Vladimir >> >> On 7/1/15 8:22 AM, Zolt?n Maj? wrote: >>> Hi, >>> >>> >>> please review the patch for JDK-8130120. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 >>> >>> Problem: Currently, the JVM prints different warning messages when >>> SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints >>> "SHA intrinsics are not available on this CPU" and x86 prints "SHA >>> instructions are not available on this CPU"). Also, there are flag >>> combinations that result in a warning on some platforms but not on >>> other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a >>> warning on >>> x86 but it does not on aarch64 and on sparc). >>> >>> Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, >>> UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same >>> way on x86, aarch64, and sparc. Change warning messages to be >>> consistent among the previously mentioned platforms and also to better >>> match the flag's description. Update the tests in >>> test/compiler/intrinsics/sha to match the new functionality. >>> >>> Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ >>> >>> Testing: >>> - full JPRT run (includes the updated tests that were executed on x86 >>> and sparc), all tests pass; >>> - locally executed the test/compiler/intrinsics/sha tests on aarch64; >>> all tests pass. 
>>> >>> Thank you and best regards, >>> >>> >>> Zoltan >>> > From zoltan.majo at oracle.com Thu Jul 2 15:24:22 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Thu, 02 Jul 2015 17:24:22 +0200 Subject: [9] RFR(S): 8130120: Handling of SHA intrinsics inconsistent across platforms In-Reply-To: <5595541C.1030402@oracle.com> References: <559405D3.9070900@oracle.com> <559435C9.3040508@oracle.com> <55952BBC.4090605@oracle.com> <5595541C.1030402@oracle.com> Message-ID: <559557A6.3010207@oracle.com> Thank you, Vladimir! Best regards, Zoltan On 07/02/2015 05:09 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/2/15 5:17 AM, Zolt?n Maj? wrote: >> Thank you, Vladimir and Michael, for the feedback! >> >> Here is the updated webrev: >> http://cr.openjdk.java.net/~zmajo/8130120/webrev.01/ >> >> All JPRT tests pass. >> >> I plan to push the newest webrev (webrev.01) on Friday (July 3) if no >> other issues come up by then. >> >> Thank you and best regards, >> >> >> Zoltan >> >> >> On 07/02/2015 12:57 AM, Berg, Michael C wrote: >>> Looks good, once Vladimir's note is added. >>> >>> Thanks, >>> -Michael >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >>> Vladimir Kozlov >>> Sent: Wednesday, July 01, 2015 11:48 AM >>> To: hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: [9] RFR(S): 8130120: Handling of SHA intrinsics >>> inconsistent across platforms >>> >>> Looks good but I would keep "on this CPU" at the end of messages to >>> clear indicate that it is due to instructions are >>> not available. >>> >>> Thanks, >>> Vladimir >>> >>> On 7/1/15 8:22 AM, Zolt?n Maj? wrote: >>>> Hi, >>>> >>>> >>>> please review the patch for JDK-8130120. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8130120 >>>> >>>> Problem: Currently, the JVM prints different warning messages when >>>> SHA-based intrinsics are attempted to be enabled (e.g., aarch64 prints >>>> "SHA intrinsics are not available on this CPU" and x86 prints "SHA >>>> instructions are not available on this CPU"). Also, there are flag >>>> combinations that result in a warning on some platforms but not on >>>> other platforms (e.g., -XX:-UseSHA -XX:+UseSHA1Intrinsics prints a >>>> warning on >>>> x86 but it does not on aarch64 and on sparc). >>>> >>>> Solution: Change the handling of the UseSHA, UseSHA1Intrinsics, >>>> UseSHA256Intrinsics, and UseSHA512Intrinsics flags to work the same >>>> way on x86, aarch64, and sparc. Change warning messages to be >>>> consistent among the previously mentioned platforms and also to better >>>> match the flag's description. Update the tests in >>>> test/compiler/intrinsics/sha to match the new functionality. >>>> >>>> Webrev: http://cr.openjdk.java.net/~zmajo/8130120/webrev.00/ >>>> >>>> Testing: >>>> - full JPRT run (includes the updated tests that were executed on x86 >>>> and sparc), all tests pass; >>>> - locally executed the test/compiler/intrinsics/sha tests on aarch64; >>>> all tests pass. >>>> >>>> Thank you and best regards, >>>> >>>> >>>> Zoltan >>>> >> From vladimir.kozlov at oracle.com Thu Jul 2 21:34:55 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 02 Jul 2015 14:34:55 -0700 Subject: [9] RFR (S) 8080012: JVM times out with vdbench on SPARC M7-16 Message-ID: <5595AE7F.5040903@oracle.com> The author is Igor Veresov. 
http://cr.openjdk.java.net/~kvn/8080012/webrev/ On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. I reviewed the fix. The fix was tested on machine which showed the problem. Thanks, Vladimir From serkan at hazelcast.com Sat Jul 4 18:06:41 2015 From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=) Date: Sat, 4 Jul 2015 21:06:41 +0300 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: Message-ID: Hi, I have added some logs to show that problem is caused by double scaling of offset (index) Here is my updated (log messages added) reproducer code: int count = 100000; long size = count * 8L; long baseAddress = unsafe.allocateMemory(size); System.out.println("Start address: " + Long.toHexString(baseAddress) + ", End address: " + Long.toHexString(baseAddress + size)); for (int i = 0; i < count; i++) { long address = baseAddress + (i * 8L); System.out.println( "Normal: " + Long.toHexString(address) + ", " + "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L))); long expected = i; unsafe.putLong(address, expected); unsafe.getLong(address); } After sometime it crashes as ... Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)] siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020 ... ... And here is output of the execution until crash: Start address: 58bbcfa0, End address: 58c804a0 Normal: 58bbcfa0, If double scaled: 58bbcfa0 Normal: 58bbcfa8, If double scaled: 58bbcfe0 Normal: 58bbcfb0, If double scaled: 58bbd020 ... ... Normal: 58c517b0, If double scaled: 59061020 As seen from the logs and crash dump, double scaled version of target address (*If double scaled: 59061020*) is the same with the problematic address (*siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020*) that causes to crash while accessing it. So I think, it is obvious that the crash is caused by wrong optimization of index value since index is scaled two times (for *Unsafe::put* and *Unsafe::get*) instead of only one time. Then double scaled index points to invalid memory address. Regards. 
On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal wrote: > Hi all, > > I had dived into the issue with JDK-HotSpot commits and > the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a > > Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > So I run the test by calculating address as > - *"int * long"* (int is index and long is 8l) > - *"long * long"* (the first long is index and the second long is 8l) > - *"int * int"* (the first int is index and the second int is 8) > > Here are the logs: > *int * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > *long * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > *int * int:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. > One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to > same *"base"* and *"index"* instructions. > This means that address is scaled one more time because there should be only one scale. > > > When I debugged the non-problematic run (*"int * int"*), > I saw that *"instr->as_ArithmeticOp();"* is always returns *"null" *then *"match_index_and_scale"* method returns* "false"* always. > So there is no scaling. > static bool match_index_and_scale(Instruction* instr, > Instruction** index, > int* log2_scale) { > ... 
> > ArithmeticOp* arith = instr->as_ArithmeticOp(); > if (arith != NULL) { > ... > } > > return false; > } > > > Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: > void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { > Instruction* base = NULL; > Instruction* index = NULL; > int log2_scale; > > if (match(x, &base, &index, &log2_scale)) { > x->set_base(base); > x->set_index(index); // The fix attempt here // ///////////////////////////// > if (index != NULL) { > if (index->is_pinned()) { > log2_scale = 0; > } else { > if (log2_scale != 0) { > index->pin(); > } > } > } // ///////////////////////////// > x->set_log2_scale(log2_scale); > if (PrintUnsafeOptimization) { > tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > } > } > In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction > and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. > > After this fix, I rerun the problematic test (*"int * long"*) and it works with these logs: > *int * long (after fix):*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 > > I am not sure my fix attempt is a really fix or maybe there are better fixes. > > Regards. > > -- > > Serkan ?ZAL > > >> Btw, (thanks to one my colleagues), when address calculation in the loop is >> converted to >> long address = baseAddress + (i * 8) >> test passes. Only difference is next long pointer is calculated using >> integer 8 instead of long 8. >> ``` >> for (int i = 0; i < count; i++) { >> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >> of long 8 >> long expected = i; >> unsafe.putLong(address, expected); >> long actual = unsafe.getLong(address); >> if (expected != actual) { >> throw new AssertionError("Expected: " + expected + ", Actual: " + >> actual); >> } >> } >> ``` >> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >> >* Hi all, >> *> >> >* While I was testing my app using java 8, I encountered the previously >> *>* reported sun.misc.Unsafe issue. >> *> >> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >> *> >> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >> *> >> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >> *>* "1.9.0-ea-b67". 
>> *> >> >* Test is very simple: >> *> >> >* ``` >> *>* public static void main(String[] args) throws Exception { >> *>* Unsafe unsafe = findUnsafe(); >> *>* // 10000 pass >> *>* // 100000 jvm crash >> *>* // 1000000 fail >> *>* int count = 100000; >> *>* long size = count * 8L; >> *>* long baseAddress = unsafe.allocateMemory(size); >> *> >> >* try { >> *>* for (int i = 0; i < count; i++) { >> *>* long address = baseAddress + (i * 8L); >> *> >> >* long expected = i; >> *>* unsafe.putLong(address, expected); >> *> >> >* long actual = unsafe.getLong(address); >> *> >> >* if (expected != actual) { >> *>* throw new AssertionError("Expected: " + expected + ", >> *>* Actual: " + actual); >> *>* } >> *>* } >> *>* } finally { >> *>* unsafe.freeMemory(baseAddress); >> *>* } >> *>* } >> *>* ``` >> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >> *>* failing constantly. >> *> >> >* - With iteration count 10000, test is passing. >> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >> *>* - With iteration count 1000000, test is failing with AssertionError. >> *> >> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >> *>* failing at all. >> *> >> >* I tested on platforms: >> *>* - Centos-7/openjdk-1.8.0.45 >> *>* - OSX/oraclejdk-1.8.0.40 >> *>* - OSX/oraclejdk-1.8.0.45 >> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >> *>* - OSX/oraclejdk-1.9.0-ea-b67 >> *> >> >* Previous issue comment ( >> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >> *>* says "Cannot reproduce based on the latest version". I hope that latest >> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >> *>* both are failing. >> *> >> >* I'm looking forward to hearing from you. >> *> >> >* Thanks, >> *>* -Mehmet Dogan- >> *>* -- >> *> >> >* @mmdogan >> *> > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > -- Serkan ?ZAL Remotest Software Engineer GSM: +90 542 680 39 18 Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Mon Jul 6 18:59:11 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 06 Jul 2015 11:59:11 -0700 Subject: [8u60] backport (S) 8080012: JVM times out with vdbench on SPARC M7-16 Message-ID: <559ACFFF.4000107@oracle.com> I would like to backport this to 8u60 (through 8u). Patch was applied cleanly. Backport was approved by release team. https://bugs.openjdk.java.net/browse/JDK-8080012 jdk9 webrev: http://cr.openjdk.java.net/~kvn/8080012/webrev/ On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. Thanks, Vladimir From igor.veresov at oracle.com Mon Jul 6 19:09:28 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 6 Jul 2015 12:09:28 -0700 Subject: [8u60] backport (S) 8080012: JVM times out with vdbench on SPARC M7-16 In-Reply-To: <559ACFFF.4000107@oracle.com> References: <559ACFFF.4000107@oracle.com> Message-ID: <55E2BF6D-2443-42F4-8A05-8E2BD529769F@oracle.com> Good. Thanks! 
igor > On Jul 6, 2015, at 11:59 AM, Vladimir Kozlov wrote: > > I would like to backport this to 8u60 (through 8u). Patch was applied cleanly. Backport was approved by release team. > > https://bugs.openjdk.java.net/browse/JDK-8080012 > jdk9 webrev: http://cr.openjdk.java.net/~kvn/8080012/webrev/ > > On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). > > We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Mon Jul 6 19:17:12 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 06 Jul 2015 12:17:12 -0700 Subject: [8u60] backport (S) 8080012: JVM times out with vdbench on SPARC M7-16 In-Reply-To: <55E2BF6D-2443-42F4-8A05-8E2BD529769F@oracle.com> References: <559ACFFF.4000107@oracle.com> <55E2BF6D-2443-42F4-8A05-8E2BD529769F@oracle.com> Message-ID: <559AD438.9090309@oracle.com> Thank you, Igor. Looks like Poonam is pushing it already. So everything is good. Thanks, Vladimir On 7/6/15 12:09 PM, Igor Veresov wrote: > Good. Thanks! > > igor > >> On Jul 6, 2015, at 11:59 AM, Vladimir Kozlov wrote: >> >> I would like to backport this to 8u60 (through 8u). Patch was applied cleanly. Backport was approved by release team. >> >> https://bugs.openjdk.java.net/browse/JDK-8080012 >> jdk9 webrev: http://cr.openjdk.java.net/~kvn/8080012/webrev/ >> >> On big SPARC machines (>1k cores) request for cacheline in PICL takes a lot of time since we asking it for each core. PICL daemon can't handle such stream of requests especially when several JVMs start at the same time (vdbench test). >> >> We were told that on sun4v machines (SPARC-T and -M series) you can't have different chips with different cache line size. So we can ask only one core. >> >> Thanks, >> Vladimir > From vladimir.x.ivanov at oracle.com Tue Jul 7 14:29:01 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 07 Jul 2015 17:29:01 +0300 Subject: [9] RFR (M): VM should constant fold Unsafe.get*() loads from final fields In-Reply-To: <558AA9A1.8070603@oracle.com> References: <5581A26C.6090303@oracle.com> <810DE23B-6616-4465-B91D-4CD9A8FB267D@oracle.com> <558AA9A1.8070603@oracle.com> Message-ID: <559BE22D.9030206@oracle.com> Any volunteers to review updated version? Thanks! Best regards, Vladimir Ivanov On 6/24/15 3:59 PM, Vladimir Ivanov wrote: > John, Paul, thanks for review! > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8078629/webrev.01/ > > I spotted a bug when field and accessor types mismatch, but the JIT > still constant-folds the load. The fix made expected result detection > even more complex, so I decided to get rid of it & WhiteBox hooks > altogether. The test exercises different code paths and compares > returned values now. > >> WB.isCompileConstant is a nice little thing. We should consider using >> it in java.lang.invoke >> to gate aggressive object-folding optimizations. That's one reason to >> consider putting it >> somewhere more central that WB. I can't propose a good place yet. >> (Unsafe is not quite right.) > Actually, there's already j.l.i.MethodHandleImpl.isCompileConstant. > Probably, compiler-specific interface is the right place for such > things. But, as I wrote before, I decided to avoid WB hooks. 
> >> The gating logic in library_call includes this extra term: && >> alias_type->field()->is_constant() >> Why not just drop it and let make_constant do the test (which it does)? > I wanted to stress that make_constant depends on whether the field is > constant or not. I failed to come up with a better method name > (try_make_constant? make_constant_attempt), so I decided to keep the > extra condition. > >> You have some lines with "/*require_const=*/" in two places; that >> can't be right. >> This is the result of functions with too many misc. arguments to keep >> track of. >> I don't have the code under my fingers, so I'm just guessing, but here >> are more suggestions: > Thanks! I tried to address all your suggestions in updated version. > > Best regards, > Vladimir Ivanov > >> >> I wish the is_autobox_cache condition could be more localized. Could >> we omit the boolean >> flag (almost always false), and where it is true, post-process the >> node? Would that make >> the code simpler? >> >> This leads me to notice that make_constant is not related strongly to >> GraphKit; it is really >> a call to the Type and CI modules to look for a singleton type, ending >> with either a NULL >> or a call to GraphKit::makecon. So you might consider changing Node* >> GK::make_constant >> to const Type* Type::make_constant. >> >> Now to pick at the argument salad we have in push_constant: The >> effect of is_autobox_cache >> could be transferred to a method Type[Ary]::cast_to_autobox_cache(true). >> And the effect of stable_type on make_constant(ciCon,bool,bool,Type*), >> could also be factored out, as post-processing step >> contype=contype->Type::join(stabletype). > > From edward.nevill at gmail.com Tue Jul 7 16:06:08 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 07 Jul 2015 17:06:08 +0100 Subject: RFR: 8130687: aarch64: add support for hardware crc32c Message-ID: <1436285168.1592.14.camel@mylittlepony.linaroharston> Hi, http://cr.openjdk.java.net/~enevill/8130687/webrev/hotspot.changeset adds support for crc32c on aarch64. This has previously been added for Sparc (see https://bugs.openjdk.java.net/browse/JDK-8073583) Performance measurements shows the throughput goes from ~620MB/s to 2938 MB/s == approx 4.7x performance improvement. Tested before and after with jtreg hotspot. In both cases Test results: passed: 867; failed: 3; error: 7 Please review. Thanks, Ed. From aph at redhat.com Tue Jul 7 16:08:25 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 07 Jul 2015 17:08:25 +0100 Subject: RFR: 8130687: aarch64: add support for hardware crc32c In-Reply-To: <1436285168.1592.14.camel@mylittlepony.linaroharston> References: <1436285168.1592.14.camel@mylittlepony.linaroharston> Message-ID: <559BF979.3010204@redhat.com> On 07/07/2015 05:06 PM, Edward Nevill wrote: > Please review. Looks good. Andrew. From vladimir.kozlov at oracle.com Tue Jul 7 17:33:16 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Jul 2015 10:33:16 -0700 Subject: RFR: 8130687: aarch64: add support for hardware crc32c In-Reply-To: <1436285168.1592.14.camel@mylittlepony.linaroharston> References: <1436285168.1592.14.camel@mylittlepony.linaroharston> Message-ID: <559C0D5C.70106@oracle.com> Looks good. Thanks, Vladimir On 7/7/15 9:06 AM, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8130687/webrev/hotspot.changeset > > adds support for crc32c on aarch64. 
This has previously been added for Sparc (see https://bugs.openjdk.java.net/browse/JDK-8073583) > > Performance measurements shows the throughput goes from ~620MB/s to 2938 MB/s == approx 4.7x performance improvement. > > Tested before and after with jtreg hotspot. In both cases > > Test results: passed: 867; failed: 3; error: 7 > > Please review. > > Thanks, > Ed. > > From vladimir.kozlov at oracle.com Tue Jul 7 17:48:48 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Jul 2015 10:48:48 -0700 Subject: [9] RFR (M): VM should constant fold Unsafe.get*() loads from final fields In-Reply-To: <559BE22D.9030206@oracle.com> References: <5581A26C.6090303@oracle.com> <810DE23B-6616-4465-B91D-4CD9A8FB267D@oracle.com> <558AA9A1.8070603@oracle.com> <559BE22D.9030206@oracle.com> Message-ID: <559C1100.7000503@oracle.com> Looks reasonable to me. graphKit.* files are listed without changes. Thanks, Vladimir K On 7/7/15 7:29 AM, Vladimir Ivanov wrote: > Any volunteers to review updated version? Thanks! > > Best regards, > Vladimir Ivanov > > On 6/24/15 3:59 PM, Vladimir Ivanov wrote: >> John, Paul, thanks for review! >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8078629/webrev.01/ >> >> I spotted a bug when field and accessor types mismatch, but the JIT >> still constant-folds the load. The fix made expected result detection >> even more complex, so I decided to get rid of it & WhiteBox hooks >> altogether. The test exercises different code paths and compares >> returned values now. >> >>> WB.isCompileConstant is a nice little thing. We should consider using >>> it in java.lang.invoke >>> to gate aggressive object-folding optimizations. That's one reason to >>> consider putting it >>> somewhere more central that WB. I can't propose a good place yet. >>> (Unsafe is not quite right.) >> Actually, there's already j.l.i.MethodHandleImpl.isCompileConstant. >> Probably, compiler-specific interface is the right place for such >> things. But, as I wrote before, I decided to avoid WB hooks. >> >>> The gating logic in library_call includes this extra term: && >>> alias_type->field()->is_constant() >>> Why not just drop it and let make_constant do the test (which it does)? >> I wanted to stress that make_constant depends on whether the field is >> constant or not. I failed to come up with a better method name >> (try_make_constant? make_constant_attempt), so I decided to keep the >> extra condition. >> >>> You have some lines with "/*require_const=*/" in two places; that >>> can't be right. >>> This is the result of functions with too many misc. arguments to keep >>> track of. >>> I don't have the code under my fingers, so I'm just guessing, but here >>> are more suggestions: >> Thanks! I tried to address all your suggestions in updated version. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> I wish the is_autobox_cache condition could be more localized. Could >>> we omit the boolean >>> flag (almost always false), and where it is true, post-process the >>> node? Would that make >>> the code simpler? >>> >>> This leads me to notice that make_constant is not related strongly to >>> GraphKit; it is really >>> a call to the Type and CI modules to look for a singleton type, ending >>> with either a NULL >>> or a call to GraphKit::makecon. So you might consider changing Node* >>> GK::make_constant >>> to const Type* Type::make_constant. 
>>> >>> Now to pick at the argument salad we have in push_constant: The >>> effect of is_autobox_cache >>> could be transferred to a method Type[Ary]::cast_to_autobox_cache(true). >>> And the effect of stable_type on make_constant(ciCon,bool,bool,Type*), >>> could also be factored out, as post-processing step >>> contype=contype->Type::join(stabletype). >> >> From vladimir.x.ivanov at oracle.com Tue Jul 7 19:43:12 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 07 Jul 2015 22:43:12 +0300 Subject: [9] RFR (M): VM should constant fold Unsafe.get*() loads from final fields In-Reply-To: <559C1100.7000503@oracle.com> References: <5581A26C.6090303@oracle.com> <810DE23B-6616-4465-B91D-4CD9A8FB267D@oracle.com> <558AA9A1.8070603@oracle.com> <559BE22D.9030206@oracle.com> <559C1100.7000503@oracle.com> Message-ID: <559C2BD0.4030704@oracle.com> Thanks, Vladimir! Empty files are a webrev artifact when it works with a set of patches containing negating changes. Best regards, Vladimir Ivanov On 7/7/15 8:48 PM, Vladimir Kozlov wrote: > Looks reasonable to me. > > graphKit.* files are listed without changes. > > Thanks, > Vladimir K > > On 7/7/15 7:29 AM, Vladimir Ivanov wrote: >> Any volunteers to review updated version? Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> On 6/24/15 3:59 PM, Vladimir Ivanov wrote: >>> John, Paul, thanks for review! >>> >>> Updated webrev: >>> http://cr.openjdk.java.net/~vlivanov/8078629/webrev.01/ >>> >>> I spotted a bug when field and accessor types mismatch, but the JIT >>> still constant-folds the load. The fix made expected result detection >>> even more complex, so I decided to get rid of it & WhiteBox hooks >>> altogether. The test exercises different code paths and compares >>> returned values now. >>> >>>> WB.isCompileConstant is a nice little thing. We should consider using >>>> it in java.lang.invoke >>>> to gate aggressive object-folding optimizations. That's one reason to >>>> consider putting it >>>> somewhere more central that WB. I can't propose a good place yet. >>>> (Unsafe is not quite right.) >>> Actually, there's already j.l.i.MethodHandleImpl.isCompileConstant. >>> Probably, compiler-specific interface is the right place for such >>> things. But, as I wrote before, I decided to avoid WB hooks. >>> >>>> The gating logic in library_call includes this extra term: && >>>> alias_type->field()->is_constant() >>>> Why not just drop it and let make_constant do the test (which it does)? >>> I wanted to stress that make_constant depends on whether the field is >>> constant or not. I failed to come up with a better method name >>> (try_make_constant? make_constant_attempt), so I decided to keep the >>> extra condition. >>> >>>> You have some lines with "/*require_const=*/" in two places; that >>>> can't be right. >>>> This is the result of functions with too many misc. arguments to keep >>>> track of. >>>> I don't have the code under my fingers, so I'm just guessing, but here >>>> are more suggestions: >>> Thanks! I tried to address all your suggestions in updated version. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> I wish the is_autobox_cache condition could be more localized. Could >>>> we omit the boolean >>>> flag (almost always false), and where it is true, post-process the >>>> node? Would that make >>>> the code simpler? 
>>>> >>>> This leads me to notice that make_constant is not related strongly to >>>> GraphKit; it is really >>>> a call to the Type and CI modules to look for a singleton type, ending >>>> with either a NULL >>>> or a call to GraphKit::makecon. So you might consider changing Node* >>>> GK::make_constant >>>> to const Type* Type::make_constant. >>>> >>>> Now to pick at the argument salad we have in push_constant: The >>>> effect of is_autobox_cache >>>> could be transferred to a method >>>> Type[Ary]::cast_to_autobox_cache(true). >>>> And the effect of stable_type on make_constant(ciCon,bool,bool,Type*), >>>> could also be factored out, as post-processing step >>>> contype=contype->Type::join(stabletype). >>> >>> From roland.westrelin at oracle.com Wed Jul 8 08:24:20 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 8 Jul 2015 10:24:20 +0200 Subject: RFR: 8129920 - Vectorized loop unrolling In-Reply-To: References: <5591B1C4.302@oracle.com> Message-ID: <54A75B9E-CD4F-4546-9235-07655FB9E821@oracle.com> > Vladimir, please have a look at http://cr.openjdk.java.net/~mcberg/8129920/webrev.02 That looks good to me. Roland. From michael.haupt at oracle.com Thu Jul 9 14:46:21 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Thu, 9 Jul 2015 16:46:21 +0200 Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool Message-ID: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> Dear all, please review and sponsor this change. RFE: https://bugs.openjdk.java.net/browse/JDK-6900757 Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00 This affects the LogCompilation tool sources *only*, with one exception in compileBroker.cpp, where an extension was necessary to properly attribute the compiler in the log message. Tested manually on various compilation logs. Thanks, Michael -- Dr. Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | LangTools Team | Nashorn Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.westrelin at oracle.com Thu Jul 9 15:34:08 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 9 Jul 2015 17:34:08 +0200 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more Message-ID: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> http://cr.openjdk.java.net/~roland/8130858/webrev.00/ It used to be possible to run with CICompilerCount=1 -XX:-TieredCompilation which I find useful during debugging so trace output of compiler threads are not intermixed. This was broken by 8122937. Roland. From vladimir.x.ivanov at oracle.com Thu Jul 9 15:37:27 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 09 Jul 2015 18:37:27 +0300 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more In-Reply-To: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> Message-ID: <559E9537.9010401@oracle.com> Looks good. Best regards, Vladimir Ivanov On 7/9/15 6:34 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8130858/webrev.00/ > > It used to be possible to run with CICompilerCount=1 -XX:-TieredCompilation which I find useful during debugging so trace output of compiler threads are not intermixed. 
This was broken by 8122937. > > Roland. > From vladimir.kozlov at oracle.com Thu Jul 9 16:09:05 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 09 Jul 2015 09:09:05 -0700 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more In-Reply-To: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> Message-ID: <559E9CA1.2050505@oracle.com> Nice! Thanks, Vladimir On 7/9/15 8:34 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8130858/webrev.00/ > > It used to be possible to run with CICompilerCount=1 -XX:-TieredCompilation which I find useful during debugging so trace output of compiler threads are not intermixed. This was broken by 8122937. > > Roland. > From vladimir.kozlov at oracle.com Thu Jul 9 16:23:58 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 09 Jul 2015 09:23:58 -0700 Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool In-Reply-To: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> Message-ID: <559EA01E.20608@oracle.com> Very nice work. Thank you for comments you added and new functionality. I think it is good for integration. Thanks, Vladimir On 7/9/15 7:46 AM, Michael Haupt wrote: > Dear all, > > please review and sponsor this change. > RFE: https://bugs.openjdk.java.net/browse/JDK-6900757 > Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00 > > This affects the LogCompilation tool sources *only*, with one exception in compileBroker.cpp, where an extension was > necessary to properly attribute the compiler in the log message. > > Tested manually on various compilation logs. > > Thanks, > > Michael > > -- > > Oracle > Dr. Michael Haupt | Principal Member of Technical Staff > Phone: +49 331 200 7277 | Fax: +49 331 200 7561 > OracleJava Platform Group | LangTools Team | Nashorn > Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany > Green Oracle Oracle is committed to developing practices and products that help > protect the environment > > From cnewland at chrisnewland.com Thu Jul 9 20:10:40 2015 From: cnewland at chrisnewland.com (Chris Newland) Date: Thu, 9 Jul 2015 21:10:40 +0100 Subject: Making PrintEscapeAnalysis a diagnostic option on product VM? In-Reply-To: <558430C6.1090102@oracle.com> References: <5583F6EE.7070901@oracle.com> <558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com> Message-ID: Hi, I've found a way to output EA information via LogCompilation from the product VM including BCIs so that I can annotate bytecode but it relies on the following block from opto/compile.cpp being entered: void Compile::Init(int aliaslevel) { ... if (debug_info()->recording_non_safepoints()) { set_node_note_array(new(comp_arena()) GrowableArray (comp_arena(), 8, 0, NULL)); set_default_node_notes(Node_Notes::make(this)); } Without this, the Note_Notes are not present in the ideal nodes and I can't fully identify what was eliminated. It looks like this will only execute when DebugNonSafepoints is true and this is a diagnostic VM option. Does anyone have an alternative method for getting BCIs for eliminated allocs without using Note_Notes? Thanks, Chris On Fri, June 19, 2015 16:09, Vladimir Kozlov wrote: > Agree. > > > Vladimir K > > > On 6/19/15 5:10 AM, Vladimir Ivanov wrote: > >>> What do you think the next step is? 
>>> >>> >>> I'm not a committer but I'd be happy to submit a patch/webrev that >>> outputs LogCompilation XML for the kind of EA info I think would be >>> useful. >> Go for it. If you are a Contributor (signed OCA), we'll review and >> accept your patch with gratitude. Keep in mind, that when you touch >> LogCompilation output format, you should update logc tool >> (src/share/tools/LogCompilation/ [1]) as well. >> >> >>> I've just seen Vitaly's post and I agree a tty 1-liner for each >>> elimination would also be nice. >> Feel free to enhance -XX:+PrintEscapeAnalysis output as well, if you >> find it useful. >> >> Best regards, >> Vladimir Ivanov >> >> >> [1] >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/tip/src/share/tools/L >> ogCompilation >>> >>> Thanks, >>> >>> >>> Chris >>> >>> >>> On Fri, June 19, 2015 12:03, Vladimir Ivanov wrote: >>> >>>> Chris, >>>> >>>> >>>> >>>> I'd suggest to look into enhancing LogCompilation output instead of >>>> parsing VM output. It doesn't require any flag changes and fits >>>> nicely into existing LogCompilation functionality, so we can >>>> integrate it into the product, relieving you and JITWatch users from >>>> building a companion VM. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> >>>> >>>> On 6/19/15 1:16 PM, Chris Newland wrote: >>>> >>>> >>>>> Hi, hope this is the correct list (perhaps serviceability?) >>>>> >>>>> >>>>> >>>>> I'm experimenting with some HotSpot changes that log escape >>>>> analysis decisions so that I can visualise eliminated allocations >>>>> at the source and bytecode levels in JITWatch[1]. >>>>> >>>>> My plan was to build a companion VM for JITWatch based on the >>>>> product VM >>>>> that would allow users to inspect some of the deeper workings such >>>>> as EA and DCE that are not present in the LogCompilation output. >>>>> >>>>> I mentioned this to some performance guys at Devoxx and they >>>>> didn't like the custom VM idea and suggested I put in a request to >>>>> consider making -XX:+PrintEscapeAnalysis available under >>>>> -XX:+UnlockDiagnosticVMOptions on >>>>> the product VM (it's currently a notproduct option). >>>>> >>>>> If this is something you would consider than could I also request >>>>> consideration of -XX:+PrintEliminateAllocations. >>>>> >>>>> All I would need is the class, method, and bci of each NoEscape >>>>> detected. >>>>> >>>>> Kind regards, >>>>> >>>>> >>>>> >>>>> Chris >>>>> >>>>> >>>>> >>>>> [1] https://github.com/AdoptOpenJDK/jitwatch >>>>> >>>>> >>>>> >>>> >>> >>> > From cnewland at chrisnewland.com Thu Jul 9 20:49:56 2015 From: cnewland at chrisnewland.com (Chris Newland) Date: Thu, 9 Jul 2015 21:49:56 +0100 Subject: Making PrintEscapeAnalysis a diagnostic option on product VM? In-Reply-To: References: <5583F6EE.7070901@oracle.com> <558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com> Message-ID: Please ignore previous message. All this time I've been digging around the connection graph logging in escape.cpp when I should have been looking in macro.cpp. I believe the BCI information I need is actually already output by PhaseMacroExpand::eliminate_allocate_node() so I'll attempt to visualise this in JITWatch and submit a patch if I need anything further. Thanks, Chris On Thu, July 9, 2015 21:10, Chris Newland wrote: > Hi, > > > I've found a way to output EA information via LogCompilation from the > product VM including BCIs so that I can annotate bytecode but it relies on > the following block from opto/compile.cpp being entered: > > void Compile::Init(int aliaslevel) { ... 
>
> if (debug_info()->recording_non_safepoints()) {
>   set_node_note_array(new(comp_arena()) GrowableArray<Node_Notes*>
>                       (comp_arena(), 8, 0, NULL));
>   set_default_node_notes(Node_Notes::make(this));
> }
>
> Without this, the Node_Notes are not present in the ideal nodes and I
> can't fully identify what was eliminated.
>
> It looks like this will only execute when DebugNonSafepoints is true,
> and that is a diagnostic VM option.
>
> Does anyone have an alternative method for getting BCIs for eliminated
> allocs without using Node_Notes?
>
> Thanks,
>
> Chris
>
> On Fri, June 19, 2015 16:09, Vladimir Kozlov wrote:
>
>> Agree.
>>
>> Vladimir K
>>
>> [snip]

From vladimir.kozlov at oracle.com  Thu Jul  9 21:00:45 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Jul 2015 14:00:45 -0700
Subject: Making PrintEscapeAnalysis a diagnostic option on product VM?
In-Reply-To: References: <5583F6EE.7070901@oracle.com>
	<558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com>
Message-ID: <559EE0FD.10502@oracle.com>

Thank you for doing this, Chris.

Each call node (and an allocation is a call node) has an associated JVM
state: the information needed to deoptimize compiled code on the return
from the call. Deoptimization info allows execution to restart in the
interpreter, so it carries all the bci and inlined-method information.

Regards,
Vladimir

On 7/9/15 1:49 PM, Chris Newland wrote:
> Please ignore previous message.
>
> All this time I've been digging around the connection graph logging in
> escape.cpp when I should have been looking in macro.cpp.
>
> I believe the BCI information I need is actually already output by
> PhaseMacroExpand::eliminate_allocate_node() so I'll attempt to visualise
> this in JITWatch and submit a patch if I need anything further.
>
> Thanks,
>
> Chris
>
> [snip]
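A first cut at consuming the output Vladimir describes could scan a
-XX:+LogCompilation file for the eliminate_allocation elements that
PhaseMacroExpand::eliminate_allocate_node() emits. The sketch below is
illustrative only: it assumes a <jvms bci='..' method='..'/> shape
inside <eliminate_allocation> blocks, and the exact tag and attribute
names should be verified against a real hotspot.log before relying on it.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list eliminated allocations from a LogCompilation file.
// Assumed shape (check against an actual log):
//   <eliminate_allocation type='...'>
//     <jvms bci='12' method='789'/>
//   </eliminate_allocation>
public class EliminatedAllocScanner {
    private static final Pattern JVMS =
        Pattern.compile("<jvms bci='(\\d+)' method='(\\d+)'");

    public static void main(String[] args) throws IOException {
        boolean inElim = false;
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (line.contains("<eliminate_allocation")) inElim = true;
            if (inElim) {
                Matcher m = JVMS.matcher(line);
                if (m.find()) {
                    // 'method' is an id into the log's <method> table,
                    // not a name; resolve it separately if needed.
                    System.out.println("eliminated alloc at bci " + m.group(1)
                        + " in method id " + m.group(2));
                }
            }
            if (line.contains("</eliminate_allocation>")) inElim = false;
        }
    }
}

The inElim gate matters because <jvms> elements also appear elsewhere in
the log (for example under uncommon traps), so only those nested inside
an eliminate_allocation block should be reported.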
From anthony.scarpino at oracle.com  Thu Jul  9 21:07:39 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Thu, 09 Jul 2015 14:07:39 -0700
Subject: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException
Message-ID: <559EE29B.6030307@oracle.com>

Hi all,

I need a review of my bug fix. Most of the lines are a test update, with
a small but important change to the 32-bit assembly to save & restore
registers around the GHASH op.

http://cr.openjdk.java.net/~ascarpino/8130341/webrev/

thanks

Tony

From vladimir.kozlov at oracle.com  Thu Jul  9 21:15:05 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Jul 2015 14:15:05 -0700
Subject: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException
In-Reply-To: <559EE29B.6030307@oracle.com>
References: <559EE29B.6030307@oracle.com>
Message-ID: <559EE459.4000202@oracle.com>

Looks good. Thank you for fixing this.

Thanks,
Vladimir

On 7/9/15 2:07 PM, Anthony Scarpino wrote:
> Hi all,
>
> I need a review of my bug fix. Most of the lines are a test update, with
> a small but important change to the 32-bit assembly to save & restore
> registers around the GHASH op.
>
> http://cr.openjdk.java.net/~ascarpino/8130341/webrev/
>
> thanks
>
> Tony

From michael.c.berg at intel.com  Thu Jul  9 21:19:06 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 9 Jul 2015 21:19:06 +0000
Subject: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException
In-Reply-To: <559EE459.4000202@oracle.com>
References: <559EE29B.6030307@oracle.com> <559EE459.4000202@oracle.com>
Message-ID:

Looks good Anthony.

Thanks,
-Michael

-----Original Message-----
From: hotspot-compiler-dev
[mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
Sent: Thursday, July 09, 2015 2:15 PM
To: Anthony Scarpino; hotspot-compiler-dev at openjdk.java.net compiler
Subject: Re: RFR: 8130341 GHASH 32bit intrinsics has AEADBadTagException

Looks good. Thank you for fixing this.

Thanks,
Vladimir

On 7/9/15 2:07 PM, Anthony Scarpino wrote:
> Hi all,
>
> I need a review of my bug fix. Most of the lines are a test update,
> with a small but important change to the 32-bit assembly to save & restore
> registers around the GHASH op.
>
> http://cr.openjdk.java.net/~ascarpino/8130341/webrev/
>
> thanks
>
> Tony

From anthony.scarpino at oracle.com  Thu Jul  9 21:30:23 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Thu, 09 Jul 2015 14:30:23 -0700
Subject: build error in jdk9/hs-comp/?
Message-ID: <559EE7EF.3070005@oracle.com>

Anyone else seeing the below error with CompileDemos.gmk in hs-comp? I
updated my repo and brought over a fresh one, but got the same failure.
"--enable-deploy=no" works around the problem.

$ make images
Compiling 5 files for BUILD_GENMODULESLIST
Building target 'images' in configuration 'linux-x86_64-normal-server-release'
Compiling 8 files for BUILD_TOOLS_LANGTOOLS
make[3]: CompileDemos.gmk: No such file or directory
make[3]: *** No rule to make target 'CompileDemos.gmk'.  Stop.
make[2]: *** [demos-deploy] Error 1

Tony

From john.r.rose at oracle.com  Thu Jul  9 21:34:58 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 9 Jul 2015 14:34:58 -0700
Subject: Making PrintEscapeAnalysis a diagnostic option on product VM?
In-Reply-To: References: <5583F6EE.7070901@oracle.com>
	<558406A0.6060301@oracle.com> <558430C6.1090102@oracle.com>
Message-ID: <1BE18DE5-716C-47B8-9B07-640C080758F6@oracle.com>

On Jul 9, 2015, at 1:10 PM, Chris Newland wrote:
>
> It looks like this will only execute when DebugNonSafepoints is true and
> this is a diagnostic VM option.
>
> Does anyone have an alternative method for getting BCIs for eliminated
> allocs without using Node_Notes?

The node notes are incomplete and approximate mappings from optimized IR
back to source (BCI) locations. They may be useful when nothing more
accurate is available, but you can't rely on them to always tell the
truth. They were introduced to give hints to instruction-level profile
tools about where the instructions come from.

-- John

From vladimir.kozlov at oracle.com  Thu Jul  9 21:45:10 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Jul 2015 14:45:10 -0700
Subject: build error in jdk9/hs-comp/?
In-Reply-To: <559EE7EF.3070005@oracle.com>
References: <559EE7EF.3070005@oracle.com>
Message-ID: <559EEB66.4010406@oracle.com>

No. We passed JPRT builds. And I saw your test JPRT job passed too today.
I would suggest creating a fresh clone of the hs-comp forest and building
it from scratch.
Vladimir

On 7/9/15 2:30 PM, Anthony Scarpino wrote:
> Anyone else seeing the below error with CompileDemos.gmk in hs-comp? I
> updated my repo and brought over a fresh one, but got the same failure.
> "--enable-deploy=no" works around the problem.
>
> $ make images
> Compiling 5 files for BUILD_GENMODULESLIST
> Building target 'images' in configuration 'linux-x86_64-normal-server-release'
> Compiling 8 files for BUILD_TOOLS_LANGTOOLS
> make[3]: CompileDemos.gmk: No such file or directory
> make[3]: *** No rule to make target 'CompileDemos.gmk'.  Stop.
> make[2]: *** [demos-deploy] Error 1
>
> Tony

From serkan at hazelcast.com  Sun Jul 12 10:29:34 2015
From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=)
Date: Sun, 12 Jul 2015 13:29:34 +0300
Subject: Array accesses using sun.misc.Unsafe cause data corruption or
	SIGSEGV
In-Reply-To: References: Message-ID:

Hi all,

I have created a webrev for review, including the patch, and shared it
for public access here:
https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html

Regards.

On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal wrote:

> Hi,
>
> I have added some logs to show that the problem is caused by double
> scaling of the offset (index).
>
> Here is my updated (log messages added) reproducer code:
>
> int count = 100000;
> long size = count * 8L;
> long baseAddress = unsafe.allocateMemory(size);
> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>                    ", End address: " + Long.toHexString(baseAddress + size));
>
> for (int i = 0; i < count; i++) {
>     long address = baseAddress + (i * 8L);
>     System.out.println(
>         "Normal: " + Long.toHexString(address) + ", " +
>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>     long expected = i;
>     unsafe.putLong(address, expected);
>     unsafe.getLong(address);
> }
>
> After some time it crashes with
>
> ...
> Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java,
> id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>
> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
> ...
>
> And here is the output of the execution until the crash:
>
> Start address: 58bbcfa0, End address: 58c804a0
> Normal: 58bbcfa0, If double scaled: 58bbcfa0
> Normal: 58bbcfa8, If double scaled: 58bbcfe0
> Normal: 58bbcfb0, If double scaled: 58bbd020
> ...
> Normal: 58c517b0, If double scaled: 59061020
>
> As seen from the logs and the crash dump, the double-scaled version of
> the target address (If double scaled: 59061020) is the same as the
> faulting address (siginfo: ExceptionCode=0xc0000005, reading address
> 0x0000000059061020), which crashes when accessed.
>
> So I think it is obvious that the crash is caused by wrong optimization
> of the index value: the index is scaled twice (once for Unsafe::put and
> once for Unsafe::get) instead of only once, and the doubly scaled index
> then points to an invalid memory address.
>
> Regards.
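To make the double scaling concrete, here is a small sanity check of the
arithmetic in plain Java. It is an illustrative sketch, not part of the
original reproducer: the loop index 76034 (0x12902) is inferred from the
two addresses in the log above, not logged directly.

// Verifies that the faulting address from the crash dump equals the
// base address plus a doubly scaled index (i * 8 * 8).
public class DoubleScaleCheck {
    public static void main(String[] args) {
        long base = 0x58bbcfa0L;                 // "Start address" in the log
        long i = 76034L;                         // inferred loop index (0x12902)
        long scaledOnce  = base + i * 8L;        // what the source code computes
        long scaledTwice = base + i * 8L * 8L;   // what the miscompiled code computes
        System.out.println(Long.toHexString(scaledOnce));   // 58c517b0 (last "Normal" value)
        System.out.println(Long.toHexString(scaledTwice));  // 59061020 (the SIGSEGV address)
    }
}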
> > On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal wrote: > >> Hi all, >> >> I had dived into the issue with JDK-HotSpot commits and >> the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a >> >> Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> } >> >> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> } >> >> >> So I run the test by calculating address as >> - *"int * long"* (int is index and long is 8l) >> - *"long * long"* (the first long is index and the second long is 8l) >> - *"int * int"* (the first int is index and the second int is 8) >> >> Here are the logs: >> *int * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 >> *long * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 >> *int * int:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 >> >> As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. >> One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to >> same *"base"* and *"index"* instructions. >> This means that address is scaled one more time because there should be only one scale. >> >> >> When I debugged the non-problematic run (*"int * int"*), >> I saw that *"instr->as_ArithmeticOp();"* is always returns *"null" *then *"match_index_and_scale"* method returns* "false"* always. >> So there is no scaling. >> static bool match_index_and_scale(Instruction* instr, >> Instruction** index, >> int* log2_scale) { >> ... 
>> >> ArithmeticOp* arith = instr->as_ArithmeticOp(); >> if (arith != NULL) { >> ... >> } >> >> return false; >> } >> >> >> Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: >> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { >> Instruction* base = NULL; >> Instruction* index = NULL; >> int log2_scale; >> >> if (match(x, &base, &index, &log2_scale)) { >> x->set_base(base); >> x->set_index(index); // The fix attempt here // ///////////////////////////// >> if (index != NULL) { >> if (index->is_pinned()) { >> log2_scale = 0; >> } else { >> if (log2_scale != 0) { >> index->pin(); >> } >> } >> } // ///////////////////////////// >> x->set_log2_scale(log2_scale); >> if (PrintUnsafeOptimization) { >> tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> } >> } >> } >> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction >> and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. >> >> After this fix, I rerun the problematic test (*"int * long"*) and it works with these logs: >> *int * long (after fix):*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 >> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 >> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 >> >> I am not sure my fix attempt is a really fix or maybe there are better fixes. >> >> Regards. >> >> -- >> >> Serkan ?ZAL >> >> >>> Btw, (thanks to one my colleagues), when address calculation in the loop is >>> converted to >>> long address = baseAddress + (i * 8) >>> test passes. Only difference is next long pointer is calculated using >>> integer 8 instead of long 8. >>> ``` >>> for (int i = 0; i < count; i++) { >>> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >>> of long 8 >>> long expected = i; >>> unsafe.putLong(address, expected); >>> long actual = unsafe.getLong(address); >>> if (expected != actual) { >>> throw new AssertionError("Expected: " + expected + ", Actual: " + >>> actual); >>> } >>> } >>> ``` >>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >>> >* Hi all, >>> *> >>> >* While I was testing my app using java 8, I encountered the previously >>> *>* reported sun.misc.Unsafe issue. >>> *> >>> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >>> *> >>> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>> *> >>> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >>> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >>> *>* "1.9.0-ea-b67". 
>>> *> >>> >* Test is very simple: >>> *> >>> >* ``` >>> *>* public static void main(String[] args) throws Exception { >>> *>* Unsafe unsafe = findUnsafe(); >>> *>* // 10000 pass >>> *>* // 100000 jvm crash >>> *>* // 1000000 fail >>> *>* int count = 100000; >>> *>* long size = count * 8L; >>> *>* long baseAddress = unsafe.allocateMemory(size); >>> *> >>> >* try { >>> *>* for (int i = 0; i < count; i++) { >>> *>* long address = baseAddress + (i * 8L); >>> *> >>> >* long expected = i; >>> *>* unsafe.putLong(address, expected); >>> *> >>> >* long actual = unsafe.getLong(address); >>> *> >>> >* if (expected != actual) { >>> *>* throw new AssertionError("Expected: " + expected + ", >>> *>* Actual: " + actual); >>> *>* } >>> *>* } >>> *>* } finally { >>> *>* unsafe.freeMemory(baseAddress); >>> *>* } >>> *>* } >>> *>* ``` >>> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >>> *>* failing constantly. >>> *> >>> >* - With iteration count 10000, test is passing. >>> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >>> *>* - With iteration count 1000000, test is failing with AssertionError. >>> *> >>> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >>> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >>> *>* failing at all. >>> *> >>> >* I tested on platforms: >>> *>* - Centos-7/openjdk-1.8.0.45 >>> *>* - OSX/oraclejdk-1.8.0.40 >>> *>* - OSX/oraclejdk-1.8.0.45 >>> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >>> *>* - OSX/oraclejdk-1.9.0-ea-b67 >>> *> >>> >* Previous issue comment ( >>> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >>> *>* says "Cannot reproduce based on the latest version". I hope that latest >>> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >>> *>* both are failing. >>> *> >>> >* I'm looking forward to hearing from you. >>> *> >>> >* Thanks, >>> *>* -Mehmet Dogan- >>> *>* -- >>> *> >>> >* @mmdogan >>> *> >> >> >> -- >> Serkan ?ZAL >> Remotest Software Engineer >> GSM: +90 542 680 39 18 >> Twitter: @serkan_ozal >> > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > -- Serkan ?ZAL Remotest Software Engineer GSM: +90 542 680 39 18 Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From martijnverburg at gmail.com Sun Jul 12 11:54:55 2015 From: martijnverburg at gmail.com (Martijn Verburg) Date: Sun, 12 Jul 2015 12:54:55 +0100 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: Message-ID: Non reviewer here, but I'd add to the comment *why* you don't want to scale again. Cheers, Martijn On 12 July 2015 at 11:29, Serkan ?zal wrote: > Hi all, > > I have created a webrev for review including the patch and shared for > public access from here: > https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html > > Regards. 
> [snip]
>
> --
> Serkan ÖZAL
> Remotest Software Engineer
> GSM: +90 542 680 39 18
> Twitter: @serkan_ozal

From serkan at hazelcast.com  Sun Jul 12 12:07:06 2015
From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=)
Date: Sun, 12 Jul 2015 15:07:06 +0300
Subject: Array accesses using sun.misc.Unsafe cause data corruption or
	SIGSEGV
In-Reply-To: References: Message-ID:

Hi Martijn,

Thanks for your interest and your comment.
>From my previous message (
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html
):

I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*:

void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
  if (OptimizeUnsafes) do_UnsafeRawOp(x);
  tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
                x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
}

void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
  if (OptimizeUnsafes) do_UnsafeRawOp(x);
  tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
                x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
}

So I ran the test, calculating the address as:
- *"int * long"* (int is the index and long is 8L)
- *"long * long"* (the first long is the index and the second long is 8L)
- *"int * int"* (the first int is the index and the second int is 8)

Here are the logs:

*int * long:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3

*long * long:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3

*int * int:*
Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0

As you can see, in the problematic runs (*"int * long"* and *"long * long"*)
the index is scaled twice: once for *"Unsafe.put"* and once for
*"Unsafe.get"*, and both instructions point to the same *"base"* and
*"index"* instructions. This means the address is scaled one extra time;
there should be only one scaling.

With this fix (or attempt, since I am not 100% sure it is the
perfect/optimal way), I prevent multiple scalings of the same index
instruction. Also, one of my previous messages
(http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html)
shows that when the index is scaled multiple times, the resulting address
can point anywhere in memory.
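For completeness, the three index expressions above can be exercised from
a single harness. This is a consolidated sketch, not the original test:
findUnsafe() is the usual reflective helper from the reproducer,
reproduced here so the snippet is self-contained.

import sun.misc.Unsafe;

// Exercises the three address computations in one loop. All three
// expressions are equal at the source level; the bug shows up in the
// put/get pair below when the JIT double-scales one of them.
public class ScalingVariants {
    public static void main(String[] args) throws Exception {
        Unsafe unsafe = findUnsafe();
        int count = 100000;
        long base = unsafe.allocateMemory(count * 8L);
        try {
            for (int i = 0; i < count; i++) {
                long a1 = base + (i * 8L);        // int * long  (was failing)
                long a2 = base + ((long) i * 8L); // long * long (was failing)
                long a3 = base + (i * 8);         // int * int   (was passing)
                if (a1 != a2 || a2 != a3) throw new AssertionError("index " + i);
                unsafe.putLong(a1, i);
                if (unsafe.getLong(a1) != i) throw new AssertionError("at " + i);
            }
        } finally {
            unsafe.freeMemory(base);
        }
    }

    private static Unsafe findUnsafe() throws Exception {
        java.lang.reflect.Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }
}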
On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg wrote:

> Non reviewer here, but I'd add to the comment *why* you don't want to
> scale again.
>
> Cheers,
> Martijn
>
> [snip]

--
Serkan ÖZAL
Remotest Software Engineer
GSM: +90 542 680 39 18
Twitter: @serkan_ozal

From michael.haupt at oracle.com  Mon Jul 13 07:04:08 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Mon, 13 Jul 2015 09:04:08 +0200
Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool
In-Reply-To: <559EA01E.20608@oracle.com>
References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com>
	<559EA01E.20608@oracle.com>
Message-ID:

Hi Vladimir,

thank you. Does this require another review? If not, could a sponsor
please step up? :-)

Best,

Michael

> On 09.07.2015 at 18:23, Vladimir Kozlov wrote:
>
> Very nice work. Thank you for the comments you added and the new
> functionality. I think it is good for integration.
>
> Thanks,
> Vladimir
>
> On 7/9/15 7:46 AM, Michael Haupt wrote:
>> Dear all,
>>
>> please review and sponsor this change.
>> RFE: https://bugs.openjdk.java.net/browse/JDK-6900757
>> Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00
>>
>> This affects the LogCompilation tool sources *only*, with one exception
>> in compileBroker.cpp, where an extension was necessary to properly
>> attribute the compiler in the log message.
>>
>> Tested manually on various compilation logs.
>>
>> Thanks,
>>
>> Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From gerard.ziemski at oracle.com  Mon Jul 13 14:17:01 2015
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Mon, 13 Jul 2015 09:17:01 -0500
Subject: RFR (XXS): 8079156: 32 bit Java 9-fastdebug hit assertion in client
	mode with StackShadowPages flag value from 32 to 50
In-Reply-To: <55A34DF9.5050905@oracle.com>
References: <55A03481.2040709@oracle.com> <55A34DF9.5050905@oracle.com>
Message-ID: <55A3C85D.1060109@oracle.com>

hi David,

On 07/13/2015 12:34 AM, David Holmes wrote:
> Hi Gerard,
>
> On 11/07/2015 7:09 AM, gerard ziemski wrote:
>> (resending - forgot to include the issue number in the title)
>>
>> Hi all,
>>
>> Please review this very small fix:
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8079156
>> webrev: http://cr.openjdk.java.net/~gziemski/8079156_rev0
>
> I'd like to hear from the compiler folk about this bug as I'm unclear
> on why StackBanging adds to the code buffer, and why this suddenly
> seems to be a new problem - did something change? Or have we not
> exercised the range of values before?

My best educated guess is that the issue was always there, but we never
exercised that path - Dmitry only added his testing framework for
exercising the ranges after we recently got the range/constraint check
feature in, and that's when we found it.

> I'd also like to understand whether the code buffer resizing should be
> rounded up (or whether it will be rounded up internally)? e.g. power
> of two, multiple of nK for some n etc.

I checked the instr sizes of existing CodeBuffers and some of the values
I saw are 416, 536, 544, 560, and 568, so I'm not sure what the
constraint here is, if there really is one (other than being divisible
by 4, which 112 is as well).

> The 112 value seems odd for 50 pages - is this 2 bytes (words?) per
> page plus some fixed overhead? Can it be expressed as a function of
> StackShadowPages rather than hardwiring to 112 which only works for
> values < 50?

Hardcoding the value for 50 pages should be OK here, since that's the
max value that StackShadowPages can take. Expressing it as a function
would not be all that simple - you would need to take into account that
the default size is enough for some StackShadowPages values (i.e. 32),
then find out the fixed size of the stack banging code. In the end you
would end up with some hardcoded values anyhow, so why not make it super
simple as we did here?

The other way is to calculate things dynamically, and I actually did
that: my first fix was based on creating a temp CodeBuffer and feeding
it only the shadow stack banging code to find out the exact size
requirement for that code, but I was told that this might confuse some
compiler code later that wouldn't expect it.
The other unknown was whether the temp code buffer code actually made it
into the cache (is it flushed by the destructor?). I tried to find a way
to wipe out the instr section before the destructor, but couldn't find
any APIs for doing so.

I don't know the answers to those issues, so even though I liked the
idea of using a temp buffer to find out precisely how much more memory
we used, in the end I settled on the simplest solution that works.

Would folks from the compiler team like to comment?

cheers

From zoltan.majo at oracle.com  Mon Jul 13 16:48:56 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Mon, 13 Jul 2015 18:48:56 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information
	about the availability of compiler intrinsics
Message-ID: <55A3EBF8.2020208@oracle.com>

Hi,

please review the following patch for JDK-8130832.

Bug: https://bugs.openjdk.java.net/browse/JDK-8130832

Problem: Currently, compiler-related tests frequently use the
architecture descriptor string (e.g., "aarch64", "x86") to determine if a
certain compiler feature is available on the platform where the JVM is
executing. If the tested feature is a compiler intrinsic, using
architecture descriptor strings is an inaccurate way of determining if
the intrinsic is available. The reason is that the availability of
compiler intrinsics is guided by many factors (e.g., the values of
command-line flags and the instructions available on the platform), and
as a result a test might expect an intrinsic to be available when it is
in fact not.

Solution: This enhancement proposes adding a new WhiteBox method,
is_compiled_intrinsic_available(Executable method, int compileLevel),
that returns true if an intrinsic for method 'method' is available at
compile level 'compileLevel' (the final API might differ from the
proposed API). To test the new API, a new test,
hotspot/test/compiler/intrinsics/IntrinsicAvailableTest.java, is added.
Moreover, existing tests in hotspot/test/compiler/intrinsics/mathexact/sanity
are updated to use the newly added WhiteBox method.

Webrev:
- top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.00/
- hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.00/

Testing:
- full JPRT run in the hotspot testset (incl. all tests in
hotspot/compiler/intrinsics/mathexact), all tests pass;
- all hotspot JTREG tests executed locally on Linux x86_64; all tests
pass that pass with an unmodified VM; all tests were also executed with
-Xcomp;
- hotspot/test/compiler/intrinsics/IntrinsicAvailableTest.java and all
tests in hotspot/compiler/intrinsics/mathexact executed on arm64, all
tests pass;
- manual testing on Linux x86_64 to verify that the functionality of the
DisableIntrinsic flag is preserved.

Thank you and best regards,


Zoltan

From anthony.scarpino at oracle.com  Mon Jul 13 18:28:27 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Jul 2015 11:28:27 -0700
Subject: RFR 8131078: typos in ghash cpu message
Message-ID: <55A4034B.2090005@oracle.com>

Hi,

I need a quick review of the typos Andreas Kohn saw.
From anthony.scarpino at oracle.com  Mon Jul 13 18:28:27 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Jul 2015 11:28:27 -0700
Subject: RFR 8131078: typos in ghash cpu message
Message-ID: <55A4034B.2090005@oracle.com>

Hi,

I need a quick review of the typos Andreas Kohn saw.

http://cr.openjdk.java.net/~ascarpino/8131078/webrev/

Tony

From goetz.lindenmaier at sap.com  Mon Jul 13 18:41:20 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 13 Jul 2015 18:41:20 +0000
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <55A4034B.2090005@oracle.com>
References: <55A4034B.2090005@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap>

Hi Tony,

I would use singular for 'instruction', as at the other three occurrences of the same string.

Besides that: Reviewed.

Best regards,
Goetz.

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Anthony Scarpino
Sent: Monday, July 13, 2015 8:28 PM
To: hotspot-compiler-dev at openjdk.java.net compiler
Subject: RFR 8131078: typos in ghash cpu message

Hi,

I need a quick review of the typos Andreas Kohn saw.

http://cr.openjdk.java.net/~ascarpino/8131078/webrev/

Tony

From anthony.scarpino at oracle.com  Mon Jul 13 18:48:23 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Jul 2015 11:48:23 -0700
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap>
References: <55A4034B.2090005@oracle.com> <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap>
Message-ID: <55A407F7.7010908@oracle.com>

Sounds reasonable. I updated the webrev in place.

Tony

On 07/13/2015 11:41 AM, Lindenmaier, Goetz wrote:
> Hi Tony,
>
> I would use singular for 'instruction', as at the other three
> occurrences of the same string.
>
> Besides that: Reviewed.
>
> Best regards,
> Goetz.

From goetz.lindenmaier at sap.com  Mon Jul 13 18:53:17 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Mon, 13 Jul 2015 18:53:17 +0000
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <55A407F7.7010908@oracle.com>
References: <55A4034B.2090005@oracle.com> <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap> <55A407F7.7010908@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2D005F48@DEWDFEMB12A.global.corp.sap>

That's good, thanks!

Best regards,
Goetz.

-----Original Message-----
From: Anthony Scarpino [mailto:anthony.scarpino at oracle.com]
Sent: Monday, July 13, 2015 8:48 PM
To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler'
Subject: Re: RFR 8131078: typos in ghash cpu message

Sounds reasonable. I updated the webrev in place.

Tony
From vladimir.kozlov at oracle.com  Mon Jul 13 18:56:35 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 13 Jul 2015 11:56:35 -0700
Subject: RFR 8131078: typos in ghash cpu message
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D005F48@DEWDFEMB12A.global.corp.sap>
References: <55A4034B.2090005@oracle.com> <4295855A5C1DE049A61835A1887419CC2D004F22@DEWDFEMB12A.global.corp.sap> <55A407F7.7010908@oracle.com> <4295855A5C1DE049A61835A1887419CC2D005F48@DEWDFEMB12A.global.corp.sap>
Message-ID: <55A409E3.9000006@oracle.com>

Looks good.

Thanks,
Vladimir

On 7/13/15 11:53 AM, Lindenmaier, Goetz wrote:
> That's good, thanks!
>
> Best regards,
> Goetz.

From vladimir.kozlov at oracle.com  Tue Jul 14 03:41:43 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 13 Jul 2015 20:41:43 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A3EBF8.2020208@oracle.com>
References: <55A3EBF8.2020208@oracle.com>
Message-ID: <55A484F7.4010705@oracle.com>

Nice work, Zoltan.

I am worried about the CompilerOracle::has_option_value() call and the name() == vmSymbols:: checks. WB code does a ThreadInVMfromNative transition and calls is_intrinsic_available_for() in VM state. A compiler thread did that only when calling into the CompilerOracle (ciMethod::has_option_value()); otherwise it uses the CI. But with your change the compiler does not make a transition - it calls is_intrinsic_available_for() in Native state. It may cause problems. The compiler should not access VM data directly. And, on the other hand, to access CI data (ciSymbol) we need to set up a ciEnv, but the WB current thread is not a compiler thread and the thread is already in VM state anyway.

I thought about how to solve this problem but nothing simple came up. The ugly solution is to have the has_option_value() call and the name() == vmSymbols:: checks outside is_intrinsic_available_for(). I mean to duplicate them in Compile::make_vm_intrinsic() and C2Compiler::is_intrinsic_available_for(). But I would like to avoid cloning code.

Another approach is that Compile::make_vm_intrinsic() can go into VM state for the is_intrinsic_available_for() call:

bool is_available = false;
{
  VM_ENTRY_MARK;
  methodHandle mh(THREAD, m->get_Method());
  methodHandle ct(THREAD, method()->get_Method());
  is_available = is_intrinsic_available_for(mh, ct, is_virtual);
}

But we usually do that in the CI and not in compiler code, which means we would have to move the method to the CI - but we have Matcher calls.

I would ask John's opinion on this problem.

You can simplify a little too. The methods intrinsic_does_virtual_dispatch_for() and intrinsic_predicates_needed_for() can take just the intrinsic id. Similar for the methods in C1.

There are several method->method_holder()->name() calls which could be done only once.

Use a different name for the 'method' local to avoid needing the Compile:: qualifier:

+ Method* method = m->get_Method();
+ Method* compilation_context = Compile::method()->get_Method();

Please add a comment to the new test explaining why you chose the crc32 intrinsic. Why?

Thanks,
Vladimir

On 7/13/15 9:48 AM, Zoltán Majó wrote:
> Hi,
>
> please review the following patch for JDK-8130832.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8130832 [...]

From vladimir.x.ivanov at oracle.com  Tue Jul 14 08:22:50 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 14 Jul 2015 11:22:50 +0300
Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool
In-Reply-To:
References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> <559EA01E.20608@oracle.com>
Message-ID: <55A4C6DA.70409@oracle.com>

Looks good! I'll push it for you.

Best regards,
Vladimir Ivanov

On 7/13/15 10:04 AM, Michael Haupt wrote:
> Hi Vladimir,
>
> thank you. Does this require another review? If not, could a sponsor
> please step up? :-)
>
> Best,
>
> Michael
>
>> Am 09.07.2015 um 18:23 schrieb Vladimir Kozlov:
>>
>> Very nice work. Thank you for comments you added and new functionality.
>> I think it is good for integration.
>>
>> Thanks,
>> Vladimir
>>
>> On 7/9/15 7:46 AM, Michael Haupt wrote:
>>> Dear all,
>>>
>>> please review and sponsor this change.
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-6900757
>>> Webrev: http://cr.openjdk.java.net/~mhaupt/6900757/webrev.00 [...]

From michael.haupt at oracle.com  Tue Jul 14 08:36:56 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 14 Jul 2015 10:36:56 +0200
Subject: RFR(L): 6900757: minor bug fixes to LogCompilation tool
In-Reply-To: <55A4C6DA.70409@oracle.com>
References: <2C7A8387-3043-4A31-A178-F86E9C143D26@oracle.com> <559EA01E.20608@oracle.com> <55A4C6DA.70409@oracle.com>
Message-ID: <81928C50-8011-4D09-88D5-1CDA59ECD95D@oracle.com>

Hi Vladimir,

thank you. I'll send the export your way.

Best,

Michael

> Am 14.07.2015 um 10:22 schrieb Vladimir Ivanov:
>
> Looks good! I'll push it for you.
>
> Best regards,
> Vladimir Ivanov

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From zoltan.majo at oracle.com  Tue Jul 14 16:27:14 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 14 Jul 2015 18:27:14 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A484F7.4010705@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com>
Message-ID: <55A53862.7040308@oracle.com>

Hi Vladimir,

On 07/14/2015 05:41 AM, Vladimir Kozlov wrote:
> Nice work, Zoltan.

thank you!

> I am worried about the CompilerOracle::has_option_value() call and the
> name() == vmSymbols:: checks. WB code does a ThreadInVMfromNative
> transition and calls is_intrinsic_available_for() in VM state. [...]
> It may cause problems.

thank you for catching that problem, I did not think of it before.

> I thought about how to solve this problem but nothing simple came up.
> The ugly solution is to have the has_option_value() call and the
> name() == vmSymbols:: checks outside is_intrinsic_available_for().
> I mean to duplicate them in Compile::make_vm_intrinsic() and
> C2Compiler::is_intrinsic_available_for(). But I would like to avoid
> cloning code.

I agree with you.

> Another approach is that Compile::make_vm_intrinsic() can go into VM
> state for the is_intrinsic_available_for() call: [...]
> But we usually do that in the CI and not in compiler code, which means
> we would have to move the method to the CI - but we have Matcher calls.
>
> I would ask John's opinion on this problem.

I would prefer to keep is_intrinsic_available in compiler code (instead of moving it to the CI) because the compiler itself is in a better position than the CI to determine which method it can intrinsify under which conditions. But I'll ask John as well what he thinks and then we can decide how to proceed.

Until then, I've updated Compile::make_vm_intrinsic() to go into VM state as you've suggested (please see the webrev below).

> You can simplify a little too. The methods
> intrinsic_does_virtual_dispatch_for() and
> intrinsic_predicates_needed_for() can take just the intrinsic id.
> Similar for the methods in C1.

I changed that for both C1 and C2.

> There are several method->method_holder()->name() calls which could be
> done only once.

I changed that as well.

> Use a different name for the 'method' local to avoid needing the
> Compile:: qualifier: [...]

I've updated the variable name.

> Please add a comment to the new test explaining why you chose the
> crc32 intrinsic. Why?

I hoped that using a single test method for all tested compilation levels would keep the test simple. The crc32 intrinsic is available with both C1 and C2, and both intrinsics can be controlled with a single flag, UseCRC32Intrinsics, in both product and fastdebug builds. I updated the test to clarify that.

Here is the updated webrev (for hotspot):
http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.01/

I tested with
- the hotspot testset with JPRT
- all hotspot/compiler JTREG tests

All tests pass.

Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com  Tue Jul 14 17:11:44 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Jul 2015 10:11:44 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A53862.7040308@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com>
Message-ID: <55A542D0.50100@oracle.com>

Pass methodHandle parameters to Compile::is_intrinsic_available_for() in C2. CompilerOracle::has_option_value() has a methodHandle parameter. And it is safer to use a methodHandle.

And you forgot 'hg add' for the new test - it is not in the webrev.

Thanks,
Vladimir

On 7/14/15 9:27 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
> On 07/14/2015 05:41 AM, Vladimir Kozlov wrote:
>> Nice work, Zoltan. [...]

From thomas.schatzl at oracle.com  Wed Jul 15 14:19:42 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 15 Jul 2015 16:19:42 +0200
Subject: RFR (XXS): 8131344: Missing klass.inline.hpp include in compiler files
Message-ID: <1436969982.2282.41.camel@oracle.com>

Hi all,

while working on changes for PLAB handling improvements I found that with these changes compilation was no longer successful, because some compiler files were missing includes of oops/klass.inline.hpp, which they need for the Klass::encode_klass() method.

This change adds the missing includes to these files.

I intend to push this through the hs-rt tree since all the other changes will be pushed through that tree, but it changes compiler files only, so I have it out for review here.

CR:
https://bugs.openjdk.java.net/browse/JDK-8131344

Webrev:
http://cr.openjdk.java.net/~tschatzl/8131344/webrev/

Testing:
jprt

Thanks,
Thomas

From zoltan.majo at oracle.com  Wed Jul 15 14:54:19 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 15 Jul 2015 16:54:19 +0200
Subject: [9] RFR(S): 8131326: Enable CheckIntrinsics in all types of builds
Message-ID: <55A6741B.7050607@oracle.com>

Hi,

please review the following patch for JDK-8131326.

Bug: https://bugs.openjdk.java.net/browse/JDK-8131326

Problem: The CheckIntrinsics flag added by JDK-8076112 is currently enabled only in debug builds. As a result, users of product builds might more easily overlook potential mismatches between VM-level and classfile-level intrinsics.

Solution: This enhancement enables the flag in all types of builds (incl. product builds). The check for orphan methods in src/share/vm/classfile/classFileParser.cpp is also controlled by the CheckIntrinsics flag. To limit the impact of that potentially expensive check on our product builds, this enhancement proposes to include that check only in debug builds.

Webrev: http://cr.openjdk.java.net/~zmajo/8131326/webrev.00/

Testing: JPRT run using the hotspot testset; all tests pass.
Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com  Wed Jul 15 15:36:37 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 08:36:37 -0700
Subject: [9] RFR(S): 8131326: Enable CheckIntrinsics in all types of builds
In-Reply-To: <55A6741B.7050607@oracle.com>
References: <55A6741B.7050607@oracle.com>
Message-ID: <55A67E05.5080400@oracle.com>

Looks good.

Thanks,
Vladimir

On 7/15/15 7:54 AM, Zoltán Majó wrote:
> Hi,
>
> please review the following patch for JDK-8131326.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8131326 [...]

From vladimir.kozlov at oracle.com  Wed Jul 15 15:39:29 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 08:39:29 -0700
Subject: RFR (XXS): 8131344: Missing klass.inline.hpp include in compiler files
In-Reply-To: <1436969982.2282.41.camel@oracle.com>
References: <1436969982.2282.41.camel@oracle.com>
Message-ID: <55A67EB1.7030203@oracle.com>

Looks fine to me.

Thanks,
Vladimir

On 7/15/15 7:19 AM, Thomas Schatzl wrote:
> Hi all,
>
> while working on changes for PLAB handling improvements I found that
> with these changes compilation was no longer successful, because some
> compiler files were missing includes of oops/klass.inline.hpp. [...]
>
> Thanks,
> Thomas

From edward.nevill at gmail.com  Wed Jul 15 16:18:17 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 15 Jul 2015 17:18:17 +0100
Subject: RFR: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
Message-ID: <1436977097.31596.7.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8131358/webrev/

fixes a typo in the match rule in vsub2f which causes an assertion failure in the jtreg/hotspot test compiler/loopopts/superword/ProdRed_Float.java.

Basically, vsub2f was matching AddVF instead of SubVF.

Thanks for the review,
Ed.
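For context, the shape of the rule is roughly this (a heavily abridged sketch; the operand and encoding details are illustrative only - see the webrev for the actual aarch64.ad text):

instruct vsub2F(vecD dst, vecD src1, vecD src2) %{
  match(Set dst (AddVF src1 src2));   // the typo: AddVF ...
  ins_encode %{
    __ fsub(...);                     // ... paired with a subtract encoding
  %}
%}

The fix simply changes the match line to

  match(Set dst (SubVF src1 src2));

so the node selected for vector subtracts is the one that actually emits fsub.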
From aph at redhat.com  Wed Jul 15 16:36:45 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 15 Jul 2015 17:36:45 +0100
Subject: [aarch64-port-dev ] RFR: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
In-Reply-To: <1436977097.31596.7.camel@mylittlepony.linaroharston>
References: <1436977097.31596.7.camel@mylittlepony.linaroharston>
Message-ID: <55A68C1D.9050006@redhat.com>

On 07/15/2015 05:18 PM, Edward Nevill wrote:
> http://cr.openjdk.java.net/~enevill/8131358/webrev/
>
> fixes a typo in the match rule in vsub2f which causes an assertion failure in the jtreg/hotspot test compiler/loopopts/superword/ProdRed_Float.java.
>
> Basically, vsub2f was matching AddVF instead of SubVF.

Yes, thanks.

Andrew.

From vladimir.kozlov at oracle.com  Wed Jul 15 16:47:54 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 09:47:54 -0700
Subject: RFR: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
In-Reply-To: <1436977097.31596.7.camel@mylittlepony.linaroharston>
References: <1436977097.31596.7.camel@mylittlepony.linaroharston>
Message-ID: <55A68EBA.9080708@oracle.com>

Looks good.

Thanks,
Vladimir

PS: for small changes like this you need only one review.

On 7/15/15 9:18 AM, Edward Nevill wrote:
> Hi,
>
> http://cr.openjdk.java.net/~enevill/8131358/webrev/ [...]

From zoltan.majo at oracle.com  Wed Jul 15 18:04:53 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 15 Jul 2015 20:04:53 +0200
Subject: [9] RFR(S): 8131326: Enable CheckIntrinsics in all types of builds
In-Reply-To: <55A67E05.5080400@oracle.com>
References: <55A6741B.7050607@oracle.com> <55A67E05.5080400@oracle.com>
Message-ID: <55A6A0C5.5050708@oracle.com>

Thank you, Vladimir, for the review!

Best regards,

Zoltan

On 07/15/2015 05:36 PM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir [...]

From zoltan.majo at oracle.com  Wed Jul 15 18:15:28 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 15 Jul 2015 20:15:28 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A542D0.50100@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com>
Message-ID: <55A6A340.3050204@oracle.com>

Hi Vladimir,

On 07/14/2015 07:11 PM, Vladimir Kozlov wrote:
> Pass methodHandle parameters to Compile::is_intrinsic_available_for()
> in C2. CompilerOracle::has_option_value() has a methodHandle parameter.
>
> And it is safer to use a methodHandle.

OK, I updated the method.

> And you forgot 'hg add' for the new test - it is not in the webrev.

Sorry for that. The test is included in the newest webrev:
- top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.02/
- hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.02/

I ran all JPRT tests from the hotspot testset, all pass.

Thank you and best regards,

Zoltan

From vladimir.kozlov at oracle.com  Wed Jul 15 18:44:59 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2015 11:44:59 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A6A340.3050204@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A6A340.3050204@oracle.com>
Message-ID: <55A6AA2B.5030005@oracle.com>

Looks good to me. We still need John's opinion about the state transition.

Thanks,
Vladimir

On 7/15/15 11:15 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
> OK, I updated the method. [...]
>
> Sorry for that. The test is included in the newest webrev:
> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.02/
> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.02/

From john.r.rose at oracle.com  Wed Jul 15 19:11:51 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 15 Jul 2015 12:11:51 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55A6AA2B.5030005@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com>
Message-ID: <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com>

On Jul 15, 2015, at 11:44 AM, Vladimir Kozlov wrote:
>
> Looks good to me. We still need John's opinion about the state transition.

Just sent a 1-1 reply; here it is FTR.

On Jul 14, 2015, at 9:42 AM, Zoltán Majó wrote:
>
> So far, I tried Vladimir's solution of going into VM state in Compile::make_vm_intrinsic() [2] and it works well.
>
> What we could also do is
> - (1) list in the switch statement in is_intrinsic_available_for() the intrinsic ids of all methods of interest (similarly to the way we do it for C1); that would eliminate the need to make checks based on a method's holder;
> - (2) for the DisableIntrinsic checks (that need to call CompilerOracle::has_option_value()) we could define a separate method that is called directly from a WhiteBox context and through the CI from make_vm_intrinsic.

This is going to be a good cleanup. But it is hard to follow, so please regard my comments as tentative.

Some comments:

I think the term "_for" is a noise word as deprecated in:
https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-NamingNaming

I agree with the tendency to factor stuff (when possible) away from the guts of the compilers.

Suggest Compile::intrinsic_does_virtual_dispatch_for be moved to vmIntrinsics::does_virtual_dispatch. It's really part of the vmIntrinsics contract. Same for can_trap (or whatever it is called). If it can't be wedged into vmSymbols.cpp, then at least consider abstractCompiler.cpp.

Similar comment about is_intrinsic_available[_for].
Because of the dependency on the compiler tier, it has to be virtual, of course.

Suggest a static vmIntrinsics::is_disabled_by_flags, to check for compiler-independent disabling logic.

Method::is_intrinsic_disabled is a good thought, but I would suggest making it a static method on vmIntrinsics, because the Method* pointer is just a wrapper around the intrinsic_id. Stripping the Method* would let you avoid a VM_ENTRY_MARK in ciMethod::* if the context argument is null (true for C1?).

The "flag soup" logic in C2 is frustrating, and may defeat an attempt to factor the -XX flag checking into vmIntrinsics, but I encourage you to try. The Matcher calls can be layered on separately, in C2-specific code. The vm_version checks can go in the same C1/C2-specific layer as the C2 matcher checks. (Or perhaps factored into abstractCompiler, but that may be overkill.)

Regarding your original question: I would prefer that the VM_ENTRY logic be confined to the CI, but there is no functional reason the compiler itself can't do a native-to-VM transition.

-- John

From martijnverburg at gmail.com  Thu Jul 16 08:36:19 2015
From: martijnverburg at gmail.com (Martijn Verburg)
Date: Thu, 16 Jul 2015 09:36:19 +0100
Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV
In-Reply-To:
References:
Message-ID:

Hi all,

Is there a reviewer who can take a look at the webrev? I guess you could combine the 2nd else / nested if, but I think it reads OK this way.

Cheers,
Martijn

On 12 July 2015 at 13:07, Serkan Özal wrote:

> Hi Martijn,
>
> Thanks for your interest and for your comment - it makes this thread a little bit hotter.
>
> From my previous message (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html):
>
> I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>
> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
> }
>
> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
> }
>
> So I ran the test by calculating the address as:
> - "int * long" (int is the index and long is 8L)
> - "long * long" (the first long is the index and the second long is 8L)
> - "int * int" (the first int is the index and the second int is 8)
>
> Here are the logs:
>
> int * long:
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>
> long * long:
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>
> int * int:
> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>
> As you can see, in the problematic runs ("int * long" and "long * long") there are two scalings:
> one for "Unsafe.put" and the other one for "Unsafe.get", and these instructions point to the
> same "base" and "index" instructions. This means that the address is scaled one more time than it
> should be, because there should be only one scale.
>
> With this fix (or attempt, since I am not 100% sure it is the perfect/optimum way), I prevent
> multiple scalings of the same index instruction.
>
> Also, one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html)
> shows that the index is scaled multiple times, and once it is scaled more than once it points
> somewhere else - anywhere - in memory.
>
> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg wrote:
>
>> Non reviewer here, but I'd add to the comment *why* you don't want to
>> scale again.
>>
>> Cheers,
>> Martijn
>>
>> On 12 July 2015 at 11:29, Serkan Özal wrote:
>>
>>> Hi all,
>>>
>>> I have created a webrev for review including the patch and shared it for
>>> public access here:
>>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html
>>>
>>> Regards.
>>>
>>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal wrote:
>>>
>>>> Hi,
>>>>
>>>> I have added some logs to show that the problem is caused by double scaling
>>>> of the offset (index).
>>>>
>>>> Here is my updated (log messages added) reproducer code:
>>>>
>>>> int count = 100000;
>>>> long size = count * 8L;
>>>> long baseAddress = unsafe.allocateMemory(size);
>>>> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>>>>                    ", End address: " + Long.toHexString(baseAddress + size));
>>>>
>>>> for (int i = 0; i < count; i++) {
>>>>     long address = baseAddress + (i * 8L);
>>>>     System.out.println(
>>>>         "Normal: " + Long.toHexString(address) + ", " +
>>>>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>>>>     long expected = i;
>>>>     unsafe.putLong(address, expected);
>>>>     unsafe.getLong(address);
>>>> }
>>>>
>>>> After some time it crashes as
>>>>
>>>> ...
>>>> Current thread (0x0000000002068800): JavaThread "main"
>>>> [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>>>>
>>>> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
>>>> ...
>>>>
>>>> And here is the output of the execution until the crash:
>>>>
>>>> Start address: 58bbcfa0, End address: 58c804a0
>>>> Normal: 58bbcfa0, If double scaled: 58bbcfa0
>>>> Normal: 58bbcfa8, If double scaled: 58bbcfe0
>>>> Normal: 58bbcfb0, If double scaled: 58bbd020
>>>> ...
>>>> Normal: 58c517b0, If double scaled: 59061020
>>>>
>>>> As seen from the logs and the crash dump, the double-scaled version of the
>>>> target address (If double scaled: 59061020) is the same as the problematic
>>>> address (siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020)
>>>> that causes the crash when accessed.
>>>>
>>>> So I think it is obvious that the crash is caused by wrong optimization of the
>>>> index value, since the index is scaled two times (for Unsafe::put and
>>>> Unsafe::get) instead of only one time. The double-scaled index then points to
>>>> an invalid memory address.
>>>>
>>>> Regards.
>>>>
>>>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I had dived into the issue with the JDK HotSpot commits and
>>>>> the issue arose after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>>>>
>>>>> Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>>>>> [same logging code and "int * long" / "long * long" / "int * int" logs as quoted at the top of this message]
>>>>>
>>>>> As you can see, in the problematic runs ("int * long" and "long * long") there are two scalings:
>>>>> one for "Unsafe.put" and the other one for "Unsafe.get", and these instructions point to the
>>>>> same "base" and "index" instructions. This means that the address is scaled one more time than
>>>>> it should be, because there should be only one scale.
>>>>>
>>>>> When I debugged the non-problematic run ("int * int"),
>>>>> I saw that "instr->as_ArithmeticOp()" always returns "null", and then the
>>>>> "match_index_and_scale" method always returns "false". So there is no scaling.
>>>>>
>>>>> static bool match_index_and_scale(Instruction* instr,
>>>>>                                   Instruction** index,
>>>>>                                   int* log2_scale) {
>>>>>   ...
>>>>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>>>>   if (arith != NULL) {
>>>>>     ...
>>>>>   }
>>>>>   return false;
>>>>> }
>>>>>
>>>>> Then I added my fix attempt to prevent multiple scalings for Unsafe instructions
>>>>> that point to the same index instruction, like this:
>>>>>
>>>>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>>>>   Instruction* base = NULL;
>>>>>   Instruction* index = NULL;
>>>>>   int log2_scale;
>>>>>
>>>>>   if (match(x, &base, &index, &log2_scale)) {
>>>>>     x->set_base(base);
>>>>>     x->set_index(index);
>>>>>     // The fix attempt here //
>>>>>     /////////////////////////////
>>>>>     if (index != NULL) {
>>>>>       if (index->is_pinned()) {
>>>>>         log2_scale = 0;
>>>>>       } else {
>>>>>         if (log2_scale != 0) {
>>>>>           index->pin();
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>     /////////////////////////////
>>>>>     x->set_log2_scale(log2_scale);
>>>>>     if (PrintUnsafeOptimization) {
>>>>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>>>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin the
>>>>> index instruction of that instruction, and on the following calls, if the index
>>>>> instruction is pinned, I assume that there is already a scaling, so there is no
>>>>> need for another one.
>>>>>
>>>>> After this fix, I reran the problematic test ("int * long") and it works, with these logs:
>>>>>
>>>>> int * long (after fix):
>>>>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>>>>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>>>>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>>>>
>>>>> I am not sure my fix attempt is really a fix, or maybe there are better fixes.
>>>>>
>>>>> Regards.
>>>>> >>>>> -- >>>>> >>>>> Serkan ?ZAL >>>>> >>>>> >>>>>> Btw, (thanks to one my colleagues), when address calculation in the loop is >>>>>> converted to >>>>>> long address = baseAddress + (i * 8) >>>>>> test passes. Only difference is next long pointer is calculated using >>>>>> integer 8 instead of long 8. >>>>>> ``` >>>>>> for (int i = 0; i < count; i++) { >>>>>> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >>>>>> of long 8 >>>>>> long expected = i; >>>>>> unsafe.putLong(address, expected); >>>>>> long actual = unsafe.getLong(address); >>>>>> if (expected != actual) { >>>>>> throw new AssertionError("Expected: " + expected + ", Actual: " + >>>>>> actual); >>>>>> } >>>>>> } >>>>>> ``` >>>>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >>>>>> >* Hi all, >>>>>> *> >>>>>> >* While I was testing my app using java 8, I encountered the previously >>>>>> *>* reported sun.misc.Unsafe issue. >>>>>> *> >>>>>> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >>>>>> *> >>>>>> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>>>>> *> >>>>>> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >>>>>> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >>>>>> *>* "1.9.0-ea-b67". >>>>>> *> >>>>>> >* Test is very simple: >>>>>> *> >>>>>> >* ``` >>>>>> *>* public static void main(String[] args) throws Exception { >>>>>> *>* Unsafe unsafe = findUnsafe(); >>>>>> *>* // 10000 pass >>>>>> *>* // 100000 jvm crash >>>>>> *>* // 1000000 fail >>>>>> *>* int count = 100000; >>>>>> *>* long size = count * 8L; >>>>>> *>* long baseAddress = unsafe.allocateMemory(size); >>>>>> *> >>>>>> >* try { >>>>>> *>* for (int i = 0; i < count; i++) { >>>>>> *>* long address = baseAddress + (i * 8L); >>>>>> *> >>>>>> >* long expected = i; >>>>>> *>* unsafe.putLong(address, expected); >>>>>> *> >>>>>> >* long actual = unsafe.getLong(address); >>>>>> *> >>>>>> >* if (expected != actual) { >>>>>> *>* throw new AssertionError("Expected: " + expected + ", >>>>>> *>* Actual: " + actual); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* } finally { >>>>>> *>* unsafe.freeMemory(baseAddress); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* ``` >>>>>> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >>>>>> *>* failing constantly. >>>>>> *> >>>>>> >* - With iteration count 10000, test is passing. >>>>>> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >>>>>> *>* - With iteration count 1000000, test is failing with AssertionError. >>>>>> *> >>>>>> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >>>>>> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >>>>>> *>* failing at all. >>>>>> *> >>>>>> >* I tested on platforms: >>>>>> *>* - Centos-7/openjdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0.40 >>>>>> *>* - OSX/oraclejdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >>>>>> *>* - OSX/oraclejdk-1.9.0-ea-b67 >>>>>> *> >>>>>> >* Previous issue comment ( >>>>>> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >>>>>> *>* says "Cannot reproduce based on the latest version". I hope that latest >>>>>> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >>>>>> *>* both are failing. >>>>>> *> >>>>>> >* I'm looking forward to hearing from you. 
>>>>>> *> >* Thanks,
>>>>>> *>* -Mehmet Dogan-
>>>>>> *>* --
>>>>>> *>
>>>>>> >* @mmdogan
>>>>>> *>
>>>>>
>>>>> --
>>>>> Serkan ÖZAL
>>>>> Remotest Software Engineer
>>>>> GSM: +90 542 680 39 18
>>>>> Twitter: @serkan_ozal
>>>>
>>>> --
>>>> Serkan ÖZAL
>>>> Remotest Software Engineer
>>>> GSM: +90 542 680 39 18
>>>> Twitter: @serkan_ozal
>>>
>>> --
>>> Serkan ÖZAL
>>> Remotest Software Engineer
>>> GSM: +90 542 680 39 18
>>> Twitter: @serkan_ozal
>>
>
> --
> Serkan ÖZAL
> Remotest Software Engineer
> GSM: +90 542 680 39 18
> Twitter: @serkan_ozal

From thomas.schatzl at oracle.com  Thu Jul 16 08:41:29 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 16 Jul 2015 10:41:29 +0200
Subject: RFR (XXS): 8131344: Missing klass.inline.hpp include in compiler files
In-Reply-To: <55A67EB1.7030203@oracle.com>
References: <1436969982.2282.41.camel@oracle.com> <55A67EB1.7030203@oracle.com>
Message-ID: <1437036089.2361.9.camel@oracle.com>

Hi Vladimir,

On Wed, 2015-07-15 at 08:39 -0700, Vladimir Kozlov wrote:
> Looks fine to me.
>
> Thanks,
> Vladimir

thanks for the quick review.

Thanks,
Thomas

From edward.nevill at gmail.com  Thu Jul 16 08:46:14 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 16 Jul 2015 09:46:14 +0100
Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets
Message-ID: <1437036374.31596.18.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8131362/webrev.01

Provides support for large spill offsets in C2 on aarch64.

In general the stack offset is limited to (1<<12) * sizeof(type). This is generally sufficient.

However for 128 bit vectors the limit can be as little as +256 bytes. This is because 128 bit vectors may not be 128 bit aligned, therefore they have to use a different form of load/store which only has a 9 bit signed offset instead of the 12 bit unsigned scaled offset.

This webrev fixes this by allowing stack offsets up to 1<<24 in all cases.

Tested before and after with jtreg hotspot & langtools. In both cases (before and after) the results were:-

Hotspot: passed: 876; failed: 3; error: 7
Langtools: Test results: passed: 3,246; error: 2

I have also tested to ensure that the code sequence for large offsets is correct by artificially reducing the limit at which the large code sequence is triggered.

The spill calculation is now done in spill_address in macroAssembler_aarch64.cpp and this is called in all cases.

Address MacroAssembler::spill_address(int size, int offset)
{
  assert(offset >= 0, "spill to negative address?");
  // Offset reachable ?
  //   Not aligned - 9 bits signed offset
  //   Aligned - 12 bits unsigned offset shifted
  Register base = sp;
  if ((offset & (size-1)) && offset >= (1<<8)) {
    add(rscratch2, base, offset & ((1<<12)-1));
    base = rscratch2;
    offset &= ~((1<<12)-1);
  }

  if (offset >= (1<<12) * size) {
    add(rscratch2, base, offset & (((1<<12)-1)<<12));
    base = rscratch2;
    offset &= ~(((1<<12)-1)<<12);
  }

  return Address(base, offset);
}

This can generate up to two additional instructions in the most degenerate cases (an unaligned offset larger than (1<<12) * size).

Thanks for the review,
Ed.
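To make the worst case concrete, the offset-splitting in spill_address above can be modelled in isolation. The sketch below is illustrative only: spill_address_model and its register-name strings are made up for the illustration, and the real code emits instructions through the macro assembler rather than printing them.

```cpp
#include <cstdio>

// Model of the decomposition in spill_address: first peel off the low
// 12 bits if the offset is unaligned (and too big for a 9-bit signed
// offset), then the next 12 bits if the remainder is still out of range
// for a scaled 12-bit unsigned offset.
void spill_address_model(int size, int offset) {
  const char* base = "sp";
  if ((offset & (size - 1)) && offset >= (1 << 8)) {
    std::printf("add rscratch2, %s, #0x%x\n", base, offset & ((1 << 12) - 1));
    base = "rscratch2";
    offset &= ~((1 << 12) - 1);
  }
  if (offset >= (1 << 12) * size) {
    std::printf("add rscratch2, %s, #0x%x\n", base, offset & (((1 << 12) - 1) << 12));
    base = "rscratch2";
    offset &= ~(((1 << 12) - 1) << 12);
  }
  std::printf("ldr/str ..., [%s, #0x%x]\n", base, offset);
}

int main() {
  spill_address_model(16, 0x100);     // small aligned offset: no extra instructions
  spill_address_model(16, 0x123456);  // unaligned and large: the two-add worst case
  return 0;
}
```

For a 16-byte spill at the unaligned offset 0x123456 this prints the two extra adds (#0x456 and then #0x123000) before the final access, matching the "up to two additional instructions" claim.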
From edward.nevill at gmail.com Thu Jul 16 09:25:30 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 16 Jul 2015 10:25:30 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions Message-ID: <1437038730.31596.26.camel@mylittlepony.linaroharston> Hi, http://cr.openjdk.java.net/~enevill/8131483/webrev Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. The instruction in question is stlxr(rscratch1, end, rscratch1) According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. The relevant section from the ARM ARM is:- STLXR For a description of this instruction and the encoding, see STLXR on page C6-702. CONSTRAINED UNPREDICTABLE behavior If s == t || (pair && s == t2), then one of the following behaviors can occur: The instruction is UNDEFINED. The instruction executes as a NOP. The instruction performs the store to the specified address, but the value stored is UNKNOWN. If s == n && n != 31 then one of the following behaviors can occur: The instruction is UNDEFINED. The instruction executes as a NOP. The instruction performs the store to an UNKNOWN address. Thanks for the review, Ed. From aph at redhat.com Thu Jul 16 09:45:28 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2015 10:45:28 +0100 Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <1437036374.31596.18.camel@mylittlepony.linaroharston> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> Message-ID: <55A77D38.7090106@redhat.com> On 16/07/15 09:46, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131362/webrev.01 > > Provides support for large spill offsets in C2 on aarch64. Thanks. This is a clear improvement over what we have now. A few minor things. ~((1<<12)-1) is just -1<<12 I don't like the way that spill_address silently clobbers rscratch2 and callers of spill_address clobber rscratch1. This makes me rather nervous. We have had recent bugs which were caused by macros assuming they had exclusive use of scratch registers. Please consider passing a destination register down to spill_address and spill_copy128. I think if you do that the register usage will be much clearer to the reader. Andrew. From aph at redhat.com Thu Jul 16 09:49:37 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2015 10:49:37 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437038730.31596.26.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> Message-ID: <55A77E31.5070003@redhat.com> On 16/07/15 10:25, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131483/webrev > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > The instruction in question is > > stlxr(rscratch1, end, rscratch1) > > According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. OK. Please assert this in Assembler::stlxr. Andrew. 
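As a sketch of the guard Andrew is asking for (illustrative only: the Register struct here is a stand-in and the real Assembler::stlxr signature and macro structure may differ; the two checks simply mirror the CONSTRAINED UNPREDICTABLE cases quoted from the ARM ARM above):

```cpp
#include <cassert>

// Toy stand-in for HotSpot's Register type, just to make the check runnable.
struct Register { int enc; };
inline bool operator!=(Register a, Register b) { return a.enc != b.enc; }

void stlxr(Register Rs, Register Rt, Register Rn) {
  // s == t is CONSTRAINED UNPREDICTABLE per the ARM ARM section above.
  assert(Rs != Rt && "unpredictable instruction: Rs == Rt");
  // s == n is CONSTRAINED UNPREDICTABLE unless n is 31 (the SP encoding).
  assert((Rs != Rn || Rn.enc == 31) && "unpredictable instruction: Rs == Rn");
  // ... emit the store-exclusive encoding here ...
}

int main() {
  Register r0{0}, r1{1}, r2{2};
  stlxr(r0, r1, r2);    // fine
  // stlxr(r0, r1, r0); // the shape of the reported bug; would trip the assert
  return 0;
}
```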
From adinn at redhat.com Thu Jul 16 09:50:59 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 16 Jul 2015 10:50:59 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437038730.31596.26.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> Message-ID: <55A77E83.7030401@redhat.com> The fix looks ok to me (I checked places where this code gets called and it is safe to clobber rscratch2). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) On 16/07/15 10:25, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131483/webrev > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > The instruction in question is > > stlxr(rscratch1, end, rscratch1) > > According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. The relevant section from the ARM ARM is:- > > STLXR > > For a description of this instruction and the encoding, see STLXR on page C6-702. > > CONSTRAINED UNPREDICTABLE behavior > > If s == t || (pair && s == t2), then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. > The instruction performs the store to the specified address, but the value stored is UNKNOWN. > > If s == n && n != 31 then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. > The instruction performs the store to an UNKNOWN address. > > Thanks for the review, > Ed. From adinn at redhat.com Thu Jul 16 10:03:03 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 16 Jul 2015 11:03:03 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437038730.31596.26.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> Message-ID: <55A78157.2080101@redhat.com> The fix looks ok to me. I checked places where this code gets called and rscratch2 is safe to use. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) On 16/07/15 10:25, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131483/webrev > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > The instruction in question is > > stlxr(rscratch1, end, rscratch1) > > According to the ARM ARM this is unpredictable and an implementation may treat this as undefined which our partners HW does. The relevant section from the ARM ARM is:- > > STLXR > > For a description of this instruction and the encoding, see STLXR on page C6-702. > > CONSTRAINED UNPREDICTABLE behavior > > If s == t || (pair && s == t2), then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. > The instruction performs the store to the specified address, but the value stored is UNKNOWN. > > If s == n && n != 31 then one of the following behaviors can occur: > > The instruction is UNDEFINED. > The instruction executes as a NOP. 
> The instruction performs the store to an UNKNOWN address. > > Thanks for the review, > Ed. From goetz.lindenmaier at sap.com Thu Jul 16 12:25:40 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 16 Jul 2015 12:25:40 +0000 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. Message-ID: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> Hi, A new warning kills the build with gcc 4.2. Could I please get a review and a sponsor for this tiny change? http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ Best regards, Goetz. -------------- next part -------------- An HTML attachment was scrubbed... URL: From edward.nevill at gmail.com Thu Jul 16 13:52:55 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 16 Jul 2015 14:52:55 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <55A77E31.5070003@redhat.com> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> Message-ID: <1437054775.18306.4.camel@mylittlepony.linaroharston> On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote: > On 16/07/15 10:25, Edward Nevill wrote: > > > > Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. > > > > The instruction in question is > > > > stlxr(rscratch1, end, rscratch1) > > > > > > Please assert this in Assembler::stlxr. OK. New webrev @ http://cr.openjdk.java.net/~enevill/8131483/webrev.01 Thanks, Ed. From aph at redhat.com Thu Jul 16 13:58:58 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2015 14:58:58 +0100 Subject: RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <1437054775.18306.4.camel@mylittlepony.linaroharston> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> <1437054775.18306.4.camel@mylittlepony.linaroharston> Message-ID: <55A7B8A2.3020402@redhat.com> On 07/16/2015 02:52 PM, Edward Nevill wrote: > On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote: >> On 16/07/15 10:25, Edward Nevill wrote: >>> >>> Fixes an issue reported by one of our partners where the aarch64 port is generating illegal instructions on their HW. >>> >>> The instruction in question is >>> >>> stlxr(rscratch1, end, rscratch1) >>> >>> >> >> Please assert this in Assembler::stlxr. > > OK. New webrev @ > > http://cr.openjdk.java.net/~enevill/8131483/webrev.01 + assert(Rs != Rn, "unpredicatable instruction"); \ "unpredicatable"? "unpredictable," surely? :-) The fix is ok with that spelling change. Thanks, Andrew. From edward.nevill at gmail.com Thu Jul 16 14:23:24 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 16 Jul 2015 15:23:24 +0100 Subject: [aarch64-port-dev ] RFR: 8131483 : aarch64: illegal stlxr instructions In-Reply-To: <55A7B8A2.3020402@redhat.com> References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> <1437054775.18306.4.camel@mylittlepony.linaroharston> <55A7B8A2.3020402@redhat.com> Message-ID: <1437056604.18306.7.camel@mylittlepony.linaroharston> On Thu, 2015-07-16 at 14:58 +0100, Andrew Haley wrote: > On 07/16/2015 02:52 PM, Edward Nevill wrote: > > On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote: > >> On 16/07/15 10:25, Edward Nevill wrote: > >>> > + assert(Rs != Rn, "unpredicatable instruction"); \ > > "unpredicatable"? "unpredictable," surely? :-) Oh, what a predictament! 
New webrev

http://cr.openjdk.java.net/~enevill/8131483/webrev.02

No need to respond if you are happy with this, but I do need a formal
*R*eviewer please,

Thanks,
Ed.

From koutheir at gmail.com  Thu Jul 16 14:28:43 2015
From: koutheir at gmail.com (Koutheir Attouchi)
Date: Thu, 16 Jul 2015 16:28:43 +0200
Subject: Ahead-Of-Time (AOT) compiler in Hotspot JVM
Message-ID:

Hi,

As far as I know, the Hotspot JVM does not have Ahead-Of-Time (AOT) compiler support. Are there any plans to implement this feature?

Thank you.

--
Koutheir Attouchi.

From vladimir.kozlov at oracle.com  Thu Jul 16 14:55:20 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Jul 2015 07:55:20 -0700
Subject: RFR: 8131483 : aarch64: illegal stlxr instructions
In-Reply-To: <1437056604.18306.7.camel@mylittlepony.linaroharston>
References: <1437038730.31596.26.camel@mylittlepony.linaroharston> <55A77E31.5070003@redhat.com> <1437054775.18306.4.camel@mylittlepony.linaroharston> <55A7B8A2.3020402@redhat.com> <1437056604.18306.7.camel@mylittlepony.linaroharston>
Message-ID: <55A7C5D8.5090907@oracle.com>

Okay.

Thanks,
Vladimir

On 7/16/15 7:23 AM, Edward Nevill wrote:
> On Thu, 2015-07-16 at 14:58 +0100, Andrew Haley wrote:
>> On 07/16/2015 02:52 PM, Edward Nevill wrote:
>>> On Thu, 2015-07-16 at 10:49 +0100, Andrew Haley wrote:
>>>> On 16/07/15 10:25, Edward Nevill wrote:
>>>>>
>> + assert(Rs != Rn, "unpredicatable instruction"); \
>>
>> "unpredicatable"? "unpredictable," surely? :-)
>
> Oh, what a predictament!
>
> New webrev
>
> http://cr.openjdk.java.net/~enevill/8131483/webrev.02
>
> No need to respond if you are happy with this, but I do need a formal
> *R*eviewer please,
>
> Thanks,
> Ed.
>

From edward.nevill at gmail.com  Thu Jul 16 15:13:14 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 16 Jul 2015 16:13:14 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <55A77D38.7090106@redhat.com>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A77D38.7090106@redhat.com>
Message-ID: <1437059594.18306.20.camel@mylittlepony.linaroharston>

On Thu, 2015-07-16 at 10:45 +0100, Andrew Haley wrote:
> On 16/07/15 09:46, Edward Nevill wrote:
> > Hi,
> >
> >
> > Provides support for large spill offsets in C2 on aarch64.
> A few minor things.
>
> ~((1<<12)-1) is just -1<<12

Fixed.

> I don't like the way that spill_address silently clobbers rscratch2
> and callers of spill_address clobber rscratch1. This makes me rather
> nervous. We have had recent bugs which were caused by macros assuming
> they had exclusive use of scratch registers. Please consider passing
> a destination register down to spill_address and spill_copy128. I
> think if you do that the register usage will be much clearer to the
> reader.

OK. So what I have done is changed the declaration to have a 'tmp' Register which defaults to rscratch2 as follows:-

Address spill_address(int size, int offset, Register tmp=rscratch2);

That way people can see from the header that it needs a tmp which defaults to rscratch2.

Similarly for spill_copy128 we now have

void spill_copy128(int src_offset, int dst_offset,
                   Register tmp1=rscratch1, Register tmp2=rscratch2)

Is this OK? Or do you want to force people to name the tmp registers on every call?

New webrev.

http://cr.openjdk.java.net/~enevill/8131362/webrev.02

Regards,
Ed.
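A toy model of the API shape Ed describes may make the trade-off clearer (illustrative names only, not HotSpot code; the real declarations live in the aarch64 macro assembler, and this model merely records which scratch register would be clobbered):

```cpp
#include <cstdio>

// Toy model: the scratch register is a defaulted parameter, so existing
// call sites keep working while the clobber is visible in the signature
// and can be overridden where rscratch2 is already live.
struct Register { const char* name; };
static Register rscratch1{"rscratch1"}, rscratch2{"rscratch2"};

struct Address { Register base; int offset; };

Address spill_address(int size, int offset, Register tmp = rscratch2) {
  // A large or unaligned offset would be materialized into 'tmp' here;
  // this model only reports which register that would be.
  std::printf("spill size %d, offset %d, tmp %s\n", size, offset, tmp.name);
  return Address{tmp, offset};
}

int main() {
  spill_address(16, 1 << 16);             // default: clobbers rscratch2
  spill_address(16, 1 << 16, rscratch1);  // explicit: clobber named at the call site
  return 0;
}
```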
From aph at redhat.com  Thu Jul 16 15:46:30 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 16 Jul 2015 16:46:30 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437059594.18306.20.camel@mylittlepony.linaroharston>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A77D38.7090106@redhat.com> <1437059594.18306.20.camel@mylittlepony.linaroharston>
Message-ID: <55A7D1D6.60508@redhat.com>

On 07/16/2015 04:13 PM, Edward Nevill wrote:
> OK. So what I have done is changed the declaration to have a 'tmp' Register which defaults to rscratch2 as follows:-
>
> Address spill_address(int size, int offset, Register tmp=rscratch2);
>
> That way people can see from the header that it needs a tmp which
> defaults to rscratch2.
>
> Similarly for spill_copy128 we now have
>
> void spill_copy128(int src_offset, int dst_offset,
>                    Register tmp1=rscratch1, Register tmp2=rscratch2)
>
> Is this OK? Or do you want to force people to name the tmp registers
> on every call?

OK, I can live with that.

Andrew.

From vladimir.kozlov at oracle.com  Thu Jul 16 18:35:28 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Jul 2015 11:35:28 -0700
Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932.
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap>
References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap>
Message-ID: <55A7F970.3020303@oracle.com>

Hi Goetz,

Looks good.

Do you see also next problem?:

hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant is too large for 'long' type
hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant is too large for 'long' type

Can you fix and test it too? Use CONST64:

#define CPU_AVX512VL CONST64(0x100000000)

On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote:
> Hi,
>
> A new warning kills the build with gcc 4.2.
>
> Could I please get a review and a sponsor for this tiny change?
>
> http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/
>
> Best regards,
>
> Goetz.
>

From vladimir.kozlov at oracle.com  Thu Jul 16 18:49:05 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Jul 2015 11:49:05 -0700
Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437036374.31596.18.camel@mylittlepony.linaroharston>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston>
Message-ID: <55A7FCA1.6010008@oracle.com>

Hi Ed,

Should it be +8 instead of +4? Or these offsets are not in bytes?:

+ unspill(rscratch1, true, src_offset);
+ spill(rscratch1, true, dst_offset);
+ unspill(rscratch1, true, src_offset+4);
+ spill(rscratch1, true, dst_offset+4);

The size of each move is 8 bytes since you specified is64 = true.

> Hotspot: passed: 876; failed: 3; error: 7
> Langtools: Test results: passed: 3,246; error: 2

Can you add -ignore:quiet to jtreg commands so that tests which are marked @ignore are not treated as errors:
http://openjdk.java.net/jtreg/command-help.html

Thanks,
Vladimir

On 7/16/15 1:46 AM, Edward Nevill wrote:
> Hi,
>
> http://cr.openjdk.java.net/~enevill/8131362/webrev.01
>
> Provides support for large spill offsets in C2 on aarch64.
>
> In general the stack offset is limited to (1<<12) * sizeof(type). This is generally sufficient.
>
> However for 128 bit vectors the limit can be as little as +256 bytes.
This is because 128 bit vectors may not be 128 bit aligned, therefore they have to use a different form of load/store which only has a 9 bit signed offset instead of the 12 bit unsigned scaled offset. > > This webrev fixes this by allowing stack offsets up to 1<<24 in all cases. > > Tested before and after with jtreg hotspot & langtools. In both cases (before and after) the results were:- > > Hotspot: passed: 876; failed: 3; error: 7 > Langtools: Test results: passed: 3,246; error: 2 > > I have also tested to ensure that code sequence for large offsets is correct by artificially reducing the limit at which the large code sequence is triggered. > > The spill calculation is now done in spill_address in macroAssembler_aarch64.cpp and this is called in all cases. > > Address MacroAssembler::spill_address(int size, int offset) > { > assert(offset >= 0, "spill to negative address?"); > // Offset reachable ? > // Not aligned - 9 bits signed offset > // Aligned - 12 bits unsigned offset shifted > Register base = sp; > if ((offset & (size-1)) && offset >= (1<<8)) { > add(rscratch2, base, offset & ((1<<12)-1)); > base = rscratch2; > offset &= ~((1<<12)-1); > } > > if (offset >= (1<<12) * size) { > add(rscratch2, base, offset & (((1<<12)-1)<<12)); > base = rscratch2; > offset &= ~(((1<<12)-1)<<12); > } > > return Address(base, offset); > } > > This can generate up to two additional instructions in the most degenerate cases (an unaligned offset larger than (1<<12) * size). > > Thanks for the review, > Ed. > > From edward.nevill at gmail.com Fri Jul 17 08:52:39 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 17 Jul 2015 09:52:39 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <55A7FCA1.6010008@oracle.com> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> Message-ID: <1437123159.29276.16.camel@mint> On Thu, 2015-07-16 at 11:49 -0700, Vladimir Kozlov wrote: > Hi Ed, > > Should it be +8 instead of +4? Or these offsets are not in bytes?: > > + unspill(rscratch1, true, src_offset); > + spill(rscratch1, true, dst_offset); > + unspill(rscratch1, true, src_offset+4); > + spill(rscratch1, true, dst_offset+4); Ouch! Good catch. New webrev. http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > > Hotspot: passed: 876; failed: 3; error: 7 > > Langtools: Test results: passed: 3,246; error: 2 > > Can you add -ignore:quiet to jtreg commands so that tests which are > marked @ignore are not treated as error: Yes. I am using the -ignore:quiet option. Here is the command I am using for the hotspot run. 
/home/ed/images/jdk-spill2/bin/java -jar lib/jtreg.jar -nr -conc:48 -timeout:3 -othervm -jdk:/home/ed/images/jdk-spill2 -v1 -a -ignore:quiet /home/ed/jdk9-dev/hs-comp/hotspot/test The hotspot failures and errors are FAILED: compiler/intrinsics/classcast/NullCheckDroppingsTest.java FAILED: compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java FAILED: serviceability/sa/jmap-hashcode/Test8028623.java Error: native_sanity/JniVersion.java Error: runtime/classFileParserBug/AnnotationTag.java Error: runtime/handlerInTry/LoadHandlerInTry.java Error: runtime/jni/8033445/DefaultMethods.java Error: runtime/jni/8025979/UninitializedStrings.java Error: runtime/jni/ToStringInInterfaceTest/ToStringTest.java Error: runtime/stackMapCheck/StackMapCheck.java and the langtools errors Error: tools/javac/annotations/typeAnnotations/classfile/T8010015.java Error: tools/javac/lambda/LambdaParserTest.java In both cases the set of errors/failures is the same before and after the patch. So yes, it is not ideal that we are seeing these. The only ideal number for a regression suite is 0. However it is a separate issue and is on my list of things to look at. All the best, Ed. From goetz.lindenmaier at sap.com Fri Jul 17 08:54:47 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 17 Jul 2015 08:54:47 +0000 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. In-Reply-To: <55A7F970.3020303@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> <55A7F970.3020303@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> Hi Vladimir, Thanks for the review! I can't reproduce the problem in vm_version_x86.hpp. I tried on a row of 32 and 64 bit machines with different compilers ... I fixed it anyways: http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.02/ Best regards, Goetz -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Donnerstag, 16. Juli 2015 20:35 To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. Hi Goetz, Looks good. Do you see also next problem?: hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant is too large for 'long' type hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant is too large for 'long' type Can you fix and test it too? Use CONST64: #define CPU_AVX512VL CONST64(0x100000000) On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote: > Hi, > > A new warning kills the build with gcc 4.2. > > Could I please get a review and a sponsor for this tiny change? > > http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ > > Best regards, > > Goetz. > From aph at redhat.com Fri Jul 17 09:01:17 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 17 Jul 2015 10:01:17 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <1437123159.29276.16.camel@mint> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> Message-ID: <55A8C45D.2070009@redhat.com> On 17/07/15 09:52, Edward Nevill wrote: >> > Should it be +8 instead of +4? 
Or these offsets are not in bytes?: >> > >> > + unspill(rscratch1, true, src_offset); >> > + spill(rscratch1, true, dst_offset); >> > + unspill(rscratch1, true, src_offset+4); >> > + spill(rscratch1, true, dst_offset+4); > Ouch! Good catch. > > New webrev. > > http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ I'm a bit more concerned that this did not fail in testing. I guess there were no tests at all for stack-stack spills. Andrew. From edward.nevill at gmail.com Fri Jul 17 09:12:39 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Fri, 17 Jul 2015 10:12:39 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <55A8C45D.2070009@redhat.com> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> Message-ID: <1437124359.29276.18.camel@mint> On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote: > On 17/07/15 09:52, Edward Nevill wrote: > >> > Should it be +8 instead of +4? Or these offsets are not in bytes?: > >> > > >> > + unspill(rscratch1, true, src_offset); > >> > + spill(rscratch1, true, dst_offset); > >> > + unspill(rscratch1, true, src_offset+4); > >> > + spill(rscratch1, true, dst_offset+4); > > Ouch! Good catch. > > > > New webrev. > > > > http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > > I'm a bit more concerned that this did not fail in testing. I guess > there were no tests at all for stack-stack spills. Correct. And it would have to be a 128 bit vector stack-stack spill with an offset >= 512. How would you even provoke such a thing. Ed. From aph at redhat.com Fri Jul 17 09:29:25 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 17 Jul 2015 10:29:25 +0100 Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <1437124359.29276.18.camel@mint> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> Message-ID: <55A8CAF5.5060601@redhat.com> On 17/07/15 10:12, Edward Nevill wrote: > On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote: >> On 17/07/15 09:52, Edward Nevill wrote: >>>>> Should it be +8 instead of +4? Or these offsets are not in bytes?: >>>>> >>>>> + unspill(rscratch1, true, src_offset); >>>>> + spill(rscratch1, true, dst_offset); >>>>> + unspill(rscratch1, true, src_offset+4); >>>>> + spill(rscratch1, true, dst_offset+4); >>> Ouch! Good catch. >>> >>> New webrev. >>> >>> http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ >> >> I'm a bit more concerned that this did not fail in testing. I guess >> there were no tests at all for stack-stack spills. > > Correct. And it would have to be a 128 bit vector stack-stack spill with > an offset >= 512. How would you even provoke such a thing. With a highly-vectorizable test case with a zillion temporaries, I guess. But I don't know why HotSpot would ever do stack-stack spills. The very idea of stack-stack spilling makes no sense to me. Andrew. 
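For what it's worth, the kind of test case Andrew describes might look like the sketch below. This is hypothetical code, not from any webrev in this thread: many independent superword-friendly streams so that, after unrolling, more 128-bit vector temporaries are live than the 32 AArch64 SIMD registers. Whether a particular build actually emits a vector stack-to-stack spill from it is not guaranteed.

```java
// Hypothetical stress test: each statement in the loop body vectorizes
// independently, so vector register pressure grows with the number of
// streams times the unroll factor.
public class VectorSpillStress {
    static final int N = 1 << 16;
    static final int[] b = new int[N];
    static final int[] a0 = new int[N], a1 = new int[N], a2 = new int[N],
                       a3 = new int[N], a4 = new int[N], a5 = new int[N],
                       a6 = new int[N], a7 = new int[N], a8 = new int[N],
                       a9 = new int[N], a10 = new int[N], a11 = new int[N];

    static void kernel() {
        for (int i = 0; i < N; i++) {
            a0[i] += b[i]; a1[i] += b[i]; a2[i]  += b[i]; a3[i]  += b[i];
            a4[i] += b[i]; a5[i] += b[i]; a6[i]  += b[i]; a7[i]  += b[i];
            a8[i] += b[i]; a9[i] += b[i]; a10[i] += b[i]; a11[i] += b[i];
        }
    }

    public static void main(String[] args) {
        for (int iter = 0; iter < 20_000; iter++) {
            kernel();  // enough iterations for C2 to compile and OSR
        }
        System.out.println(a0[0] + a11[N - 1]);
    }
}
```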
From aph at redhat.com  Fri Jul 17 09:43:24 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 17 Jul 2015 10:43:24 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <55A8CAF5.5060601@redhat.com>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> <55A8CAF5.5060601@redhat.com>
Message-ID: <55A8CE3C.1050203@redhat.com>

On 17/07/15 10:29, Andrew Haley wrote:
> On 17/07/15 10:12, Edward Nevill wrote:
>> On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote:
>>> On 17/07/15 09:52, Edward Nevill wrote:
>>>>>> Should it be +8 instead of +4? Or these offsets are not in bytes?:
>>>>>>
>>>>>> + unspill(rscratch1, true, src_offset);
>>>>>> + spill(rscratch1, true, dst_offset);
>>>>>> + unspill(rscratch1, true, src_offset+4);
>>>>>> + spill(rscratch1, true, dst_offset+4);
>>>> Ouch! Good catch.
>>>>
>>>> New webrev.
>>>>
>>>> http://cr.openjdk.java.net/~enevill/8131362/webrev.03/
>>>
>>> I'm a bit more concerned that this did not fail in testing. I guess
>>> there were no tests at all for stack-stack spills.
>>
>> Correct. And it would have to be a 128 bit vector stack-stack spill with
>> an offset >= 512. How would you even provoke such a thing.
>
> With a highly-vectorizable test case with a zillion temporaries, I guess.

Thinking some more: I think I'd add some special code to test it all once, then delete the special code. If that's what it takes, there isn't much choice.

Andrew.

From aleksey.shipilev at oracle.com  Fri Jul 17 13:29:32 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 17 Jul 2015 16:29:32 +0300
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
Message-ID: <55A9033C.2030302@oracle.com>

Hi there,

C1 is not very good at inlining and intrinsifying methods, and hence the call performance is important there. One nit that we can see in the generated code on x86 is that C1 uses single-byte nops, even for long nop strides. This improvement fixes that:
https://bugs.openjdk.java.net/browse/JDK-8131682
http://cr.openjdk.java.net/~shade/8131682/webrev.00/

Testing:
- JPRT -testset hotspot on open platforms
- eyeballing the generated assembly with -XX:TieredStopAtLevel=1

(I understand the symmetric change is going to be needed in closed parts, but let's polish the open part first).

Thanks,
-Aleksey

From vladimir.kozlov at oracle.com  Fri Jul 17 15:21:04 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2015 08:21:04 -0700
Subject: RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437123159.29276.16.camel@mint>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint>
Message-ID: <55A91D60.5090107@oracle.com>

Looks good. So it is a real failure in testing. Thank you for letting us know.

Vladimir

On 7/17/15 1:52 AM, Edward Nevill wrote:
> On Thu, 2015-07-16 at 11:49 -0700, Vladimir Kozlov wrote:
>> Hi Ed,
>>
>> Should it be +8 instead of +4? Or these offsets are not in bytes?:
>>
>> + unspill(rscratch1, true, src_offset);
>> + spill(rscratch1, true, dst_offset);
>> + unspill(rscratch1, true, src_offset+4);
>> + spill(rscratch1, true, dst_offset+4);
>
> Ouch!
Good catch. > > New webrev. > > http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > >> > Hotspot: passed: 876; failed: 3; error: 7 >> > Langtools: Test results: passed: 3,246; error: 2 >> >> Can you add -ignore:quiet to jtreg commands so that tests which are >> marked @ignore are not treated as error: > > Yes. I am using the -ignore:quiet option. Here is the command I am using for the hotspot run. > > /home/ed/images/jdk-spill2/bin/java -jar lib/jtreg.jar -nr -conc:48 -timeout:3 -othervm -jdk:/home/ed/images/jdk-spill2 -v1 -a -ignore:quiet /home/ed/jdk9-dev/hs-comp/hotspot/test > > The hotspot failures and errors are > > FAILED: compiler/intrinsics/classcast/NullCheckDroppingsTest.java > FAILED: compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > FAILED: serviceability/sa/jmap-hashcode/Test8028623.java > Error: native_sanity/JniVersion.java > Error: runtime/classFileParserBug/AnnotationTag.java > Error: runtime/handlerInTry/LoadHandlerInTry.java > Error: runtime/jni/8033445/DefaultMethods.java > Error: runtime/jni/8025979/UninitializedStrings.java > Error: runtime/jni/ToStringInInterfaceTest/ToStringTest.java > Error: runtime/stackMapCheck/StackMapCheck.java > > and the langtools errors > > Error: tools/javac/annotations/typeAnnotations/classfile/T8010015.java > Error: tools/javac/lambda/LambdaParserTest.java > > In both cases the set of errors/failures is the same before and after the patch. > > So yes, it is not ideal that we are seeing these. The only ideal number for a regression suite is 0. However it is a separate issue and is on my list of things to look at. > > All the best, > Ed. > > From vladimir.kozlov at oracle.com Fri Jul 17 15:23:59 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jul 2015 08:23:59 -0700 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> <55A7F970.3020303@oracle.com> <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> Message-ID: <55A91E0F.2030303@oracle.com> Looks good. Thank you for fixing second issue. I will push it. Vladimir On 7/17/15 1:54 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > Thanks for the review! > > I can't reproduce the problem in vm_version_x86.hpp. I tried on a row of > 32 and 64 bit machines with different compilers ... > I fixed it anyways: > http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.02/ > > Best regards, > Goetz > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 16. Juli 2015 20:35 > To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' > Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. > > Hi Goetz, > > Looks good. > > Do you see also next problem?: > > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant > is too large for 'long' type > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant > is too large for 'long' type > > Can you fix and test it too? Use CONST64: > > #define CPU_AVX512VL CONST64(0x100000000) > > > On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> A new warning kills the build with gcc 4.2. >> >> Could I please get a review and a sponsor for this tiny change? 
>> >> http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ >> >> Best regards, >> >> Goetz. >> From goetz.lindenmaier at sap.com Fri Jul 17 15:24:40 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 17 Jul 2015 15:24:40 +0000 Subject: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. In-Reply-To: <55A91E0F.2030303@oracle.com> References: <4295855A5C1DE049A61835A1887419CC2D00691B@DEWDFEMB12A.global.corp.sap> <55A7F970.3020303@oracle.com> <4295855A5C1DE049A61835A1887419CC2D006BD9@DEWDFEMB12A.global.corp.sap> <55A91E0F.2030303@oracle.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D006E52@DEWDFEMB12A.global.corp.sap> Thanks! Best regards, Goetz. -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Freitag, 17. Juli 2015 17:24 To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. Looks good. Thank you for fixing second issue. I will push it. Vladimir On 7/17/15 1:54 AM, Lindenmaier, Goetz wrote: > Hi Vladimir, > > Thanks for the review! > > I can't reproduce the problem in vm_version_x86.hpp. I tried on a row of > 32 and 64 bit machines with different compilers ... > I fixed it anyways: > http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.02/ > > Best regards, > Goetz > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 16. Juli 2015 20:35 > To: Lindenmaier, Goetz; 'hotspot-compiler-dev at openjdk.java.net compiler' > Subject: Re: RFR(XS): 8131676: Fix warning 'negative int converted to unsigned' after 8085932. > > Hi Goetz, > > Looks good. > > Do you see also next problem?: > > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:485: error: integer constant > is too large for 'long' type > hotspot/src/cpu/x86/vm/vm_version_x86.hpp:705: error: integer constant > is too large for 'long' type > > Can you fix and test it too? Use CONST64: > > #define CPU_AVX512VL CONST64(0x100000000) > > > On 7/16/15 5:25 AM, Lindenmaier, Goetz wrote: >> Hi, >> >> A new warning kills the build with gcc 4.2. >> >> Could I please get a review and a sponsor for this tiny change? >> >> http://cr.openjdk.java.net/~goetz/webrevs/8131676-negUInt/webrev.01/ >> >> Best regards, >> >> Goetz. >> From john.r.rose at oracle.com Fri Jul 17 19:31:04 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 17 Jul 2015 12:31:04 -0700 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: Message-ID: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> Thanks Serkan and Martijn for reporting and analyzing this. We had a very similar bug reported internally, and we just integrated a fix: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 Would you mind checking if it fixes your problem also? Best wishes, ? John On Jul 12, 2015, at 5:07 AM, Serkan ?zal wrote: > > Hi Martjin, > > Thanks for your interest and comment for making this thread a little bit more hot. 
> > > From my previous message (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html ): > > I added some additional logs to "vm/c1/c1_Canonicalizer.cpp": > > > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > > So I run the test by calculating address as: > - "int * long" (int is index and long is 8l) > - "long * long" (the first long is index and the second long is 8l) > - "int * int" (the first int is index and the second int is 8) > > Here are the logs: > > > int * long: > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > > long * long: > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > > int * int: > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs ("int * long" and "long * long") there are two scaling. > One for "Unsafe.put" and the other one is for "Unsafe.get" and these instructions points to > same "base" and "index" instructions. This means that address is scaled one more time because there should be only one scale. > > > With this fix (or attempt since I am not %100 sure if it is perfect/optimum way or not), I prevent multiple scaling on the same index instruction. > > Also one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html ) shows that there are multiple scaling on the index so when it scaled multiple, anymore it shows somewhere or anywhere in the memory. 
> > On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg > wrote: > Non reviewer here, but I'd add to the comment *why* you don't want to scale again. > > Cheers, > Martijn > > On 12 July 2015 at 11:29, Serkan ?zal > wrote: > Hi all, > > I have created a webrev for review including the patch and shared for public access from here: https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html > > Regards. > > On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal > wrote: > Hi, > > I have added some logs to show that problem is caused by double scaling of offset (index) > > Here is my updated (log messages added) reproducer code: > > > int count = 100000; > long size = count * 8L; > long baseAddress = unsafe.allocateMemory(size); > System.out.println("Start address: " + Long.toHexString(baseAddress) + > ", End address: " + Long.toHexString(baseAddress + size)); > > for (int i = 0; i < count; i++) { > long address = baseAddress + (i * 8L); > System.out.println( > "Normal: " + Long.toHexString(address) + ", " + > "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L))); > long expected = i; > unsafe.putLong(address, expected); > unsafe.getLong(address); > } > > > After sometime it crashes as > > > ... > Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)] > > siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020 > ... > ... > > > And here is output of the execution until crash: > > Start address: 58bbcfa0, End address: 58c804a0 > Normal: 58bbcfa0, If double scaled: 58bbcfa0 > Normal: 58bbcfa8, If double scaled: 58bbcfe0 > Normal: 58bbcfb0, If double scaled: 58bbd020 > ... > ... > Normal: 58c517b0, If double scaled: 59061020 > > > As seen from the logs and crash dump, double scaled version of target address (If double scaled: 59061020) is the same with the problematic address (siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020) that causes to crash while accessing it. > > So I think, it is obvious that the crash is caused by wrong optimization of index value since index is scaled two times (for Unsafe::put and Unsafe::get) instead of only one time. Then double scaled index points to invalid memory address. > > Regards. 
> > On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal > wrote: > Hi all, > > I had dived into the issue with JDK-HotSpot commits and > the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a > > Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp": > > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > if (OptimizeUnsafes) do_UnsafeRawOp(x); > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > > > > So I run the test by calculating address as > - "int * long" (int is index and long is 8l) > - "long * long" (the first long is index and the second long is 8l) > - "int * int" (the first int is index and the second int is 8) > > Here are the logs: > > int * long: > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > > long * long: > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > > int * int: > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs ("int * long" and "long * long") there are two scaling. > One for "Unsafe.put" and the other one is for "Unsafe.get" and these instructions points to > same "base" and "index" instructions. > This means that address is scaled one more time because there should be only one scale. > > > > When I debugged the non-problematic run ("int * int"), > I saw that "instr->as_ArithmeticOp();" is always returns "null" then "match_index_and_scale" method returns "false" always. > So there is no scaling. > > static bool match_index_and_scale(Instruction* instr, > Instruction** index, > int* log2_scale) { > ... > > ArithmeticOp* arith = instr->as_ArithmeticOp(); > if (arith != NULL) { > ... 
> } > > return false; > } > > > > Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: > > void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { > Instruction* base = NULL; > Instruction* index = NULL; > int log2_scale; > > if (match(x, &base, &index, &log2_scale)) { > x->set_base(base); > x->set_index(index); > // The fix attempt here > // ///////////////////////////// > if (index != NULL) { > if (index->is_pinned()) { > log2_scale = 0; > } else { > if (log2_scale != 0) { > index->pin(); > } > } > } > // ///////////////////////////// > x->set_log2_scale(log2_scale); > if (PrintUnsafeOptimization) { > tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > } > } > } > > In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction > and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. > > After this fix, I rerun the problematic test ("int * long") and it works with these logs: > > int * long (after fix): > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 > Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 > Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 > > I am not sure my fix attempt is a really fix or maybe there are better fixes. > > Regards. > > -- > > Serkan ?ZAL > > Btw, (thanks to one my colleagues), when address calculation in the loop is > converted to > long address = baseAddress + (i * 8) > test passes. Only difference is next long pointer is calculated using > integer 8 instead of long 8. > ``` > for (int i = 0; i < count; i++) { > long address = baseAddress + (i * 8); // <--- here, integer 8 instead > of long 8 > long expected = i; > unsafe.putLong(address, expected); > long actual = unsafe.getLong(address); > if (expected != actual) { > throw new AssertionError("Expected: " + expected + ", Actual: " + > actual); > } > } > ``` > On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: > > Hi all, > > > > While I was testing my app using java 8, I encountered the previously > > reported sun.misc.Unsafe issue. > > > > https://bugs.openjdk.java.net/browse/JDK-8076445 > > > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html > > > > Issue status says it's resolved with resolution "Cannot Reproduce". But > > unfortunately it's still reproducible using "1.8.0_60-ea-b18" and > > "1.9.0-ea-b67". 
> > > > Test is very simple: > > > > ``` > > public static void main(String[] args) throws Exception { > > Unsafe unsafe = findUnsafe(); > > // 10000 pass > > // 100000 jvm crash > > // 1000000 fail > > int count = 100000; > > long size = count * 8L; > > long baseAddress = unsafe.allocateMemory(size); > > > > try { > > for (int i = 0; i < count; i++) { > > long address = baseAddress + (i * 8L); > > > > long expected = i; > > unsafe.putLong(address, expected); > > > > long actual = unsafe.getLong(address); > > > > if (expected != actual) { > > throw new AssertionError("Expected: " + expected + ", > > Actual: " + actual); > > } > > } > > } finally { > > unsafe.freeMemory(baseAddress); > > } > > } > > ``` > > It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is > > failing constantly. > > > > - With iteration count 10000, test is passing. > > - With iteration count 100000, jvm is crashing with SIGSEGV. > > - With iteration count 1000000, test is failing with AssertionError. > > > > When one of compilation (-Xint) or inlining (-XX:-Inline) or > > on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not > > failing at all. > > > > I tested on platforms: > > - Centos-7/openjdk-1.8.0.45 > > - OSX/oraclejdk-1.8.0.40 > > - OSX/oraclejdk-1.8.0.45 > > - OSX/oraclejdk-1.8.0_60-ea-b18 > > - OSX/oraclejdk-1.9.0-ea-b67 > > > > Previous issue comment ( > > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) > > says "Cannot reproduce based on the latest version". I hope that latest > > version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because > > both are failing. > > > > I'm looking forward to hearing from you. > > > > Thanks, > > -Mehmet Dogan- > > -- > > > > @mmdogan > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From serkan at hazelcast.com Fri Jul 17 19:49:38 2015 From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=) Date: Fri, 17 Jul 2015 22:49:38 +0300 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> References: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> Message-ID: Hi John, Yes, I have applied your fix and it works. Thanks! Since which JDK version this patch will be there? Regards. On Fri, Jul 17, 2015 at 10:31 PM, John Rose wrote: > Thanks Serkan and Martijn for reporting and analyzing this. > > We had a very similar bug reported internally, and we just integrated a > fix: > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 > > Would you mind checking if it fixes your problem also? > > Best wishes, > ? John > > On Jul 12, 2015, at 5:07 AM, Serkan ?zal wrote: > > > Hi Martjin, > > Thanks for your interest and comment for making this thread a little bit > more hot. 
> > > From my previous message ( > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html > ): > > I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: > > > void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { > > if (OptimizeUnsafes) do_UnsafeRawOp(x); > > tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", > > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > > } > > > void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { > > if (OptimizeUnsafes) do_UnsafeRawOp(x); > > tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", > > x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); > > } > > > > So I run the test by calculating address as: > > - *"int * long"* (int is index and long is 8l) > > - *"long * long"* (the first long is index and the second long is 8l) > > - *"int * int"* (the first int is index and the second int is 8) > > Here are the logs: > > > *int * long:* > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 > > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 > > *long * long:* > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 > > Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 > > *int * int:* > > Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 > > Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 > > Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 > > As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. > > One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to > > same *"base"* and *"index"* instructions. This means that address is scaled one more time because there should be only one scale. > > > > With this fix (or attempt since I am not %100 sure if it is perfect/optimum way or not), I prevent multiple scaling on the same index instruction. > > Also one of my previous messages (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html) shows that there are multiple scaling on the index so when it scaled multiple, anymore it shows somewhere or anywhere in the memory. 
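[Editorial aside: the double-scaling effect Serkan describes can be made concrete with a small standalone sketch. This is illustrative C++, not HotSpot code; the base address and the log2_scale value are taken from the logs quoted elsewhere in this thread.]

```cpp
#include <cstdint>
#include <cstdio>

// Standalone illustration (not HotSpot code): an Unsafe raw access is
// canonicalized to base + (index << log2_scale). If the put and the get,
// which share the same index node, each fold the scale in, the effective
// address becomes base + (index << 3 << 3), i.e. base + index * 64
// instead of base + index * 8.
int main() {
  const int64_t base = 0x58bbcfa0LL;  // sample base address from the logs in this thread
  const int log2_scale = 3;           // 8-byte longs
  for (int64_t i = 0; i < 4; i++) {
    int64_t scaled_once  = base + (i << log2_scale);
    int64_t scaled_twice = base + (i << log2_scale << log2_scale);
    printf("i=%lld  once=0x%llx  twice=0x%llx\n",
           (long long)i, (long long)scaled_once, (long long)scaled_twice);
  }
  // For i=1 this prints once=0x58bbcfa8, twice=0x58bbcfe0,
  // matching the "Normal" vs. "If double scaled" addresses quoted below.
  return 0;
}
```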
> > > On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg > wrote: > >> Non reviewer here, but I'd add to the comment *why* you don't want to >> scale again. >> >> Cheers, >> Martijn >> >> On 12 July 2015 at 11:29, Serkan ?zal wrote: >> >>> Hi all, >>> >>> I have created a webrev for review including the patch and shared for >>> public access from here: >>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html >>> >>> Regards. >>> >>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal >>> wrote: >>> >>>> Hi, >>>> >>>> I have added some logs to show that problem is caused by double scaling >>>> of offset (index) >>>> >>>> Here is my updated (log messages added) reproducer code: >>>> >>>> >>>> int count = 100000; >>>> long size = count * 8L; >>>> long baseAddress = unsafe.allocateMemory(size); >>>> System.out.println("Start address: " + Long.toHexString(baseAddress) + >>>> ", End address: " + Long.toHexString(baseAddress + >>>> size)); >>>> >>>> for (int i = 0; i < count; i++) { >>>> long address = baseAddress + (i * 8L); >>>> System.out.println( >>>> "Normal: " + Long.toHexString(address) + ", " + >>>> "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * >>>> 8L))); >>>> long expected = i; >>>> unsafe.putLong(address, expected); >>>> unsafe.getLong(address); >>>> } >>>> >>>> >>>> After sometime it crashes as >>>> >>>> >>>> ... >>>> Current thread (0x0000000002068800): JavaThread "main" >>>> [_thread_in_Java, id=10412, stack(0x00000000023f0000,0x00000000024f0000)] >>>> >>>> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020 >>>> ... >>>> ... >>>> >>>> >>>> And here is output of the execution until crash: >>>> >>>> Start address: 58bbcfa0, End address: 58c804a0 >>>> Normal: 58bbcfa0, If double scaled: 58bbcfa0 >>>> Normal: 58bbcfa8, If double scaled: 58bbcfe0 >>>> Normal: 58bbcfb0, If double scaled: 58bbd020 >>>> ... >>>> ... >>>> Normal: 58c517b0, If double scaled: 59061020 >>>> >>>> >>>> As seen from the logs and crash dump, double scaled version of target >>>> address (*If double scaled: 59061020*) is the same with the >>>> problematic address (*siginfo: ExceptionCode=0xc0000005, reading >>>> address 0x0000000059061020*) that causes to crash while accessing it. >>>> >>>> So I think, it is obvious that the crash is caused by wrong >>>> optimization of index value since index is scaled two times (for >>>> *Unsafe::put* and *Unsafe::get*) instead of only one time. Then double >>>> scaled index points to invalid memory address. >>>> >>>> Regards. 
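[Editorial aside: the matching step referenced in this analysis, match_index_and_scale() in c1_Canonicalizer.cpp, recognizes address expressions of the form base + index * 2^k. The following is a simplified, self-contained sketch of that shape; the Expr tree and names are hypothetical and stand in for C1's Instruction graph, they are not the HotSpot sources.]

```cpp
#include <cstdint>

// Hypothetical Expr tree standing in for C1's Instruction graph.
// Mul and Shl nodes always have both children; Const uses 'value'.
struct Expr {
  enum Kind { Const, Var, Mul, Shl } kind;
  int64_t value;   // meaningful for Const
  Expr* left;
  Expr* right;
};

// Recognize (index * 2^k) or (index << k), returning k as the log2 scale.
// This mirrors the *shape* of C1's match_index_and_scale(), not its code.
static bool match_scale(Expr* e, Expr** index, int* log2_scale) {
  if (e->kind == Expr::Shl && e->right->kind == Expr::Const) {
    *index = e->left;
    *log2_scale = (int)e->right->value;
    return true;
  }
  if (e->kind == Expr::Mul && e->right->kind == Expr::Const) {
    int64_t c = e->right->value;
    if (c > 0 && (c & (c - 1)) == 0) {   // constant is a power of two
      int k = 0;
      while ((c >> k) > 1) k++;          // portable log2 for a power of two
      *index = e->left;
      *log2_scale = k;
      return true;
    }
  }
  return false;
}
```

[The bug discussed in this thread is not in the matching itself: it is that the canonicalizations of Unsafe.put and Unsafe.get each fold the recognized scale into ops that share the same index node, so the scale ends up applied twice.]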
>>>> >>>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I had dived into the issue with JDK-HotSpot commits and >>>>> the issue arised after this commit: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a >>>>> >>>>> Then I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >>>>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >>>>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>>>> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d", >>>>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>>>> } >>>>> >>>>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >>>>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>>>> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d", >>>>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>>>> } >>>>> >>>>> >>>>> So I run the test by calculating address as >>>>> - *"int * long"* (int is index and long is 8l) >>>>> - *"long * long"* (the first long is index and the second long is 8l) >>>>> - *"int * int"* (the first int is index and the second int is 8) >>>>> >>>>> Here are the logs: >>>>> *int * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3 >>>>> *long * long:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3 >>>>> *int * int:*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0 >>>>> >>>>> As you can see, at the problematic runs (*"int * long"* and *"long * long"*) there are two scaling. >>>>> One for *"Unsafe.put"* and the other one is for* "Unsafe.get"* and these instructions points to >>>>> same *"base"* and *"index"* instructions. >>>>> This means that address is scaled one more time because there should be only one scale. 
>>>>> >>>>> >>>>> When I debugged the non-problematic run (*"int * int"*), >>>>> I saw that *"instr->as_ArithmeticOp();"* is always returns *"null" *then *"match_index_and_scale"* method returns* "false"* always. >>>>> So there is no scaling. >>>>> static bool match_index_and_scale(Instruction* instr, >>>>> Instruction** index, >>>>> int* log2_scale) { >>>>> ... >>>>> >>>>> ArithmeticOp* arith = instr->as_ArithmeticOp(); >>>>> if (arith != NULL) { >>>>> ... >>>>> } >>>>> >>>>> return false; >>>>> } >>>>> >>>>> >>>>> Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions points to same index instruction like this: >>>>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { >>>>> Instruction* base = NULL; >>>>> Instruction* index = NULL; >>>>> int log2_scale; >>>>> >>>>> if (match(x, &base, &index, &log2_scale)) { >>>>> x->set_base(base); >>>>> x->set_index(index); // The fix attempt here // ///////////////////////////// >>>>> if (index != NULL) { >>>>> if (index->is_pinned()) { >>>>> log2_scale = 0; >>>>> } else { >>>>> if (log2_scale != 0) { >>>>> index->pin(); >>>>> } >>>>> } >>>>> } // ///////////////////////////// >>>>> x->set_log2_scale(log2_scale); >>>>> if (PrintUnsafeOptimization) { >>>>> tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d", >>>>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>>>> } >>>>> } >>>>> } >>>>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin index instruction of that instruction >>>>> and at next calls, if the index instruction is pinned, I assummed that there is already scaling so no need to another scaling. >>>>> >>>>> After this fix, I rerun the problematic test (*"int * long"*) and it works with these logs: >>>>> *int * long (after fix):*Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0 >>>>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0 >>>>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3 >>>>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0 >>>>> >>>>> I am not sure my fix attempt is a really fix or maybe there are better fixes. >>>>> >>>>> Regards. >>>>> >>>>> -- >>>>> >>>>> Serkan ?ZAL >>>>> >>>>> >>>>>> Btw, (thanks to one my colleagues), when address calculation in the loop is >>>>>> converted to >>>>>> long address = baseAddress + (i * 8) >>>>>> test passes. Only difference is next long pointer is calculated using >>>>>> integer 8 instead of long 8. 
>>>>>> ``` >>>>>> for (int i = 0; i < count; i++) { >>>>>> long address = baseAddress + (i * 8); // <--- here, integer 8 instead >>>>>> of long 8 >>>>>> long expected = i; >>>>>> unsafe.putLong(address, expected); >>>>>> long actual = unsafe.getLong(address); >>>>>> if (expected != actual) { >>>>>> throw new AssertionError("Expected: " + expected + ", Actual: " + >>>>>> actual); >>>>>> } >>>>>> } >>>>>> ``` >>>>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan > wrote: >>>>>> >* Hi all, >>>>>> *> >>>>>> >* While I was testing my app using java 8, I encountered the previously >>>>>> *>* reported sun.misc.Unsafe issue. >>>>>> *> >>>>>> >* https://bugs.openjdk.java.net/browse/JDK-8076445 >>>>>> *> >>>>>> >* http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>>>>> *> >>>>>> >* Issue status says it's resolved with resolution "Cannot Reproduce". But >>>>>> *>* unfortunately it's still reproducible using "1.8.0_60-ea-b18" and >>>>>> *>* "1.9.0-ea-b67". >>>>>> *> >>>>>> >* Test is very simple: >>>>>> *> >>>>>> >* ``` >>>>>> *>* public static void main(String[] args) throws Exception { >>>>>> *>* Unsafe unsafe = findUnsafe(); >>>>>> *>* // 10000 pass >>>>>> *>* // 100000 jvm crash >>>>>> *>* // 1000000 fail >>>>>> *>* int count = 100000; >>>>>> *>* long size = count * 8L; >>>>>> *>* long baseAddress = unsafe.allocateMemory(size); >>>>>> *> >>>>>> >* try { >>>>>> *>* for (int i = 0; i < count; i++) { >>>>>> *>* long address = baseAddress + (i * 8L); >>>>>> *> >>>>>> >* long expected = i; >>>>>> *>* unsafe.putLong(address, expected); >>>>>> *> >>>>>> >* long actual = unsafe.getLong(address); >>>>>> *> >>>>>> >* if (expected != actual) { >>>>>> *>* throw new AssertionError("Expected: " + expected + ", >>>>>> *>* Actual: " + actual); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* } finally { >>>>>> *>* unsafe.freeMemory(baseAddress); >>>>>> *>* } >>>>>> *>* } >>>>>> *>* ``` >>>>>> *>* It's not failing up to version 1.8.0.31, by starting 1.8.0.40 test is >>>>>> *>* failing constantly. >>>>>> *> >>>>>> >* - With iteration count 10000, test is passing. >>>>>> *>* - With iteration count 100000, jvm is crashing with SIGSEGV. >>>>>> *>* - With iteration count 1000000, test is failing with AssertionError. >>>>>> *> >>>>>> >* When one of compilation (-Xint) or inlining (-XX:-Inline) or >>>>>> *>* on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, test is not >>>>>> *>* failing at all. >>>>>> *> >>>>>> >* I tested on platforms: >>>>>> *>* - Centos-7/openjdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0.40 >>>>>> *>* - OSX/oraclejdk-1.8.0.45 >>>>>> *>* - OSX/oraclejdk-1.8.0_60-ea-b18 >>>>>> *>* - OSX/oraclejdk-1.9.0-ea-b67 >>>>>> *> >>>>>> >* Previous issue comment ( >>>>>> *>* https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 ) >>>>>> *>* says "Cannot reproduce based on the latest version". I hope that latest >>>>>> *>* version is not mentioning to '1.8.0_60-ea-b18' or '1.9.0-ea-b67'. Because >>>>>> *>* both are failing. >>>>>> *> >>>>>> >* I'm looking forward to hearing from you. 
>>>>>> *> >>>>>> >* Thanks, >>>>>> *>* -Mehmet Dogan- >>>>>> *>* -- >>>>>> *> >>>>>> >* @mmdogan >>>>>> *> >>>>> >>>>> >>>>> -- >>>>> Serkan ?ZAL >>>>> Remotest Software Engineer >>>>> GSM: +90 542 680 39 18 >>>>> Twitter: @serkan_ozal >>>>> >>>> >>>> >>>> >>>> -- >>>> Serkan ?ZAL >>>> Remotest Software Engineer >>>> GSM: +90 542 680 39 18 >>>> Twitter: @serkan_ozal >>>> >>> >>> >>> >>> -- >>> Serkan ?ZAL >>> Remotest Software Engineer >>> GSM: +90 542 680 39 18 >>> Twitter: @serkan_ozal >>> >> >> > > > -- > Serkan ?ZAL > Remotest Software Engineer > GSM: +90 542 680 39 18 > Twitter: @serkan_ozal > > > -- Serkan ?ZAL Remotest Software Engineer GSM: +90 542 680 39 18 Twitter: @serkan_ozal -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Jul 17 20:27:31 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jul 2015 13:27:31 -0700 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: References: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> Message-ID: <55A96533.2050703@oracle.com> It is in released few days ago JDK 8u51: http://www.oracle.com/technetwork/java/javase/8u51-relnotes-2587590.html Regards, Vladimir On 7/17/15 12:49 PM, Serkan ?zal wrote: > Hi John, > > Yes, I have applied your fix and it works. > Thanks! > > Since which JDK version this patch will be there? > > Regards. > > On Fri, Jul 17, 2015 at 10:31 PM, John Rose > wrote: > > Thanks Serkan and Martijn for reporting and analyzing this. > > We had a very similar bug reported internally, and we just > integrated a fix: > http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 > > Would you mind checking if it fixes your problem also? > > Best wishes, > ? John > > On Jul 12, 2015, at 5:07 AM, Serkan ?zal > wrote: >> >> Hi Martjin, >> >> Thanks for your interest and comment for making this thread a >> little bit more hot. 
>> >> >> From my previous message >> (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html): >> >> I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >> >> >> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >> >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> >> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id >> %d, index = id %d, log2_scale = %d", >> >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> >> } >> >> >> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >> >> if (OptimizeUnsafes) do_UnsafeRawOp(x); >> >> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id >> %d, index = id %d, log2_scale = %d", >> >> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >> >> } >> >> >> >> So I run the test by calculating address as: >> >> - *"int * long"* (int is index and long is 8l) >> >> - *"long * long"* (the first long is index and the second long >> is 8l) >> >> - *"int * int"* (the first int is index and the second int is 8) >> >> Here are the logs: >> >> >> *int * long:* >> >> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >> 17, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >> 19, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >> 21, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >> 23, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >> 27, log2_scale = 3 >> >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >> 27, log2_scale = 3 >> >> *long * long:* >> >> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >> 17, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >> 19, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >> 21, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >> 23, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id >> 14, log2_scale = 3 >> >> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id >> 14, log2_scale = 3 >> >> *int * int:* >> >> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >> 17, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >> 19, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >> 21, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >> 23, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >> 29, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >> 29, log2_scale = 0 >> >> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id >> 15, log2_scale = 0 >> >> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id >> 15, log2_scale = 0 >> >> As you can see, at the problematic runs (*"int * long"* and >> *"long * long"*) there are two scaling. >> >> One for *"Unsafe.put"* and the other one is for*"Unsafe.get"* >> and these instructions points to >> >> same *"base"* and *"index"* instructions. This means that >> address is scaled one more time because there should be only >> one scale. >> >> >> >> With this fix (or attempt since I am not %100 sure if it is >> perfect/optimum way or not), I prevent multiple scaling on the >> same index instruction. 
>> >> Also one of my previous messages >> (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html) >> shows that there are multiple scaling on the index so when it >> scaled multiple, anymore it shows somewhere or anywhere in the memory. >> >> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg >> > wrote: >> >> Non reviewer here, but I'd add to the comment *why* you don't >> want to scale again. >> >> Cheers, >> Martijn >> >> On 12 July 2015 at 11:29, Serkan ?zal > > wrote: >> >> Hi all, >> >> I have created a webrev for review including the patch and >> shared for public access from here: >> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html >> >> Regards. >> >> On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal >> > wrote: >> >> Hi, >> >> I have added some logs to show that problem is caused >> by double scaling of offset (index) >> >> Here is my updated (log messages added) reproducer code: >> >> >> int count = 100000; >> long size = count * 8L; >> long baseAddress = unsafe.allocateMemory(size); >> System.out.println("Start address: " + >> Long.toHexString(baseAddress) + >> ", End address: " + >> Long.toHexString(baseAddress + size)); >> >> for (int i = 0; i < count; i++) { >> long address = baseAddress + (i * 8L); >> System.out.println( >> "Normal: " + Long.toHexString(address) + ", " + >> "If double scaled: " + >> Long.toHexString(baseAddress + (i * 8L * 8L))); >> long expected = i; >> unsafe.putLong(address, expected); >> unsafe.getLong(address); >> } >> >> >> After sometime it crashes as >> >> >> ... >> Current thread (0x0000000002068800): JavaThread >> "main" [_thread_in_Java, id=10412, >> stack(0x00000000023f0000,0x00000000024f0000)] >> >> siginfo: ExceptionCode=0xc0000005, reading address >> 0x0000000059061020 >> ... >> ... >> >> >> And here is output of the execution until crash: >> >> Start address: 58bbcfa0, End address: 58c804a0 >> Normal: 58bbcfa0, If double scaled: 58bbcfa0 >> Normal: 58bbcfa8, If double scaled: 58bbcfe0 >> Normal: 58bbcfb0, If double scaled: 58bbd020 >> ... >> ... >> Normal: 58c517b0, If double scaled: 59061020 >> >> >> As seen from the logs and crash dump, double scaled >> version of target address (*If double scaled: >> 59061020*) is the same with the problematic address >> (*siginfo: ExceptionCode=0xc0000005, reading address >> 0x0000000059061020*) that causes to crash while >> accessing it. >> >> So I think, it is obvious that the crash is caused by >> wrong optimization of index value since index is >> scaled two times (for *Unsafe::put* and *Unsafe::get*) >> instead of only one time. Then double scaled index >> points to invalid memory address. >> >> Regards. 
>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal wrote:
>>
>> Hi all,
>>
>> I had dived into the issue with JDK-HotSpot commits and
>> the issue arised after this commit:
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>
>> Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>>
>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> So I run the test by calculating address as
>> - "int * long" (int is index and long is 8l)
>> - "long * long" (the first long is index and the second long is 8l)
>> - "int * int" (the first int is index and the second int is 8)
>>
>> Here are the logs:
>>
>> int * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>
>> long * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>
>> int * int:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>
>> As you can see, at the problematic runs ("int * long" and "long * long") there are two scalings:
>> one for "Unsafe.put" and the other one for "Unsafe.get", and these instructions point to the
>> same "base" and "index" instructions. This means that the address is scaled one more time,
>> because there should be only one scale.
>>
>> When I debugged the non-problematic run ("int * int"), I saw that
>> "instr->as_ArithmeticOp();" always returns "null", so the "match_index_and_scale"
>> method always returns "false". So there is no scaling.
>>
>> static bool match_index_and_scale(Instruction* instr,
>>                                   Instruction** index,
>>                                   int* log2_scale) {
>>   ...
>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>   if (arith != NULL) {
>>     ...
>>   }
>>   return false;
>> }
>>
>> Then I have added my fix attempt to prevent multiple scaling for Unsafe instructions
>> pointing to the same index instruction, like this:
>>
>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>   Instruction* base = NULL;
>>   Instruction* index = NULL;
>>   int log2_scale;
>>
>>   if (match(x, &base, &index, &log2_scale)) {
>>     x->set_base(base);
>>     x->set_index(index);
>>     // The fix attempt here
>>     // /////////////////////////////
>>     if (index != NULL) {
>>       if (index->is_pinned()) {
>>         log2_scale = 0;
>>       } else {
>>         if (log2_scale != 0) {
>>           index->pin();
>>         }
>>       }
>>     }
>>     // /////////////////////////////
>>     x->set_log2_scale(log2_scale);
>>     if (PrintUnsafeOptimization) {
>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>     }
>>   }
>> }
>>
>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin the index
>> instruction of that instruction, and at the next calls, if the index instruction is pinned,
>> I assume that there is already scaling, so there is no need for another scaling.
>>
>> After this fix, I reran the problematic test ("int * long") and it works, with these logs:
>>
>> int * long (after fix):
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>
>> I am not sure my fix attempt is a real fix; maybe there are better fixes.
>>
>> Regards.
>>
>> --
>> Serkan ÖZAL
>>
>>> Btw, (thanks to one of my colleagues), when the address calculation in the loop is
>>> converted to
>>>   long address = baseAddress + (i * 8)
>>> the test passes. The only difference is that the next long pointer is calculated
>>> using integer 8 instead of long 8.
>>> ```
>>> for (int i = 0; i < count; i++) {
>>>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>>     long expected = i;
>>>     unsafe.putLong(address, expected);
>>>     long actual = unsafe.getLong(address);
>>>     if (expected != actual) {
>>>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>     }
>>> }
>>> ```
>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan wrote:
>>> > Hi all,
>>> >
>>> > While I was testing my app using java 8, I encountered the previously
>>> > reported sun.misc.Unsafe issue.
>>> >
>>> > https://bugs.openjdk.java.net/browse/JDK-8076445
>>> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>> >
>>> > Issue status says it's resolved with resolution "Cannot Reproduce". But
>>> > unfortunately it's still reproducible using "1.8.0_60-ea-b18" and
>>> > "1.9.0-ea-b67".
>>> >
>>> > Test is very simple:
>>> >
>>> > ```
>>> > public static void main(String[] args) throws Exception {
>>> >     Unsafe unsafe = findUnsafe();
>>> >     // 10000 pass
>>> >     // 100000 jvm crash
>>> >     // 1000000 fail
>>> >     int count = 100000;
>>> >     long size = count * 8L;
>>> >     long baseAddress = unsafe.allocateMemory(size);
>>> >
>>> >     try {
>>> >         for (int i = 0; i < count; i++) {
>>> >             long address = baseAddress + (i * 8L);
>>> >             long expected = i;
>>> >             unsafe.putLong(address, expected);
>>> >             long actual = unsafe.getLong(address);
>>> >             if (expected != actual) {
>>> >                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>> >             }
>>> >         }
>>> >     } finally {
>>> >         unsafe.freeMemory(baseAddress);
>>> >     }
>>> > }
>>> > ```
>>> > It's not failing up to version 1.8.0.31; starting with 1.8.0.40 the test is
>>> > failing constantly.
>>> >
>>> > - With iteration count 10000, the test is passing.
>>> > - With iteration count 100000, the JVM is crashing with SIGSEGV.
>>> > - With iteration count 1000000, the test is failing with AssertionError.
>>> >
>>> > When one of compilation (-Xint) or inlining (-XX:-Inline) or
>>> > on-stack-replacement (-XX:-UseOnStackReplacement) is disabled, the test is
>>> > not failing at all.
>>> >
>>> > I tested on platforms:
>>> > - Centos-7/openjdk-1.8.0.45
>>> > - OSX/oraclejdk-1.8.0.40
>>> > - OSX/oraclejdk-1.8.0.45
>>> > - OSX/oraclejdk-1.8.0_60-ea-b18
>>> > - OSX/oraclejdk-1.9.0-ea-b67
>>> >
>>> > Previous issue comment (
>>> > https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 )
>>> > says "Cannot reproduce based on the latest version". I hope that latest
>>> > version is not referring to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>> > both are failing.
>>> >
>>> > I'm looking forward to hearing from you.
>>> >
>>> > Thanks,
>>> > -Mehmet Dogan-
>>> > --
>>> > @mmdogan
>>
>> --
>> Serkan ÖZAL
>> Remotest Software Engineer
>> GSM: +90 542 680 39 18
>> Twitter: @serkan_ozal

From vlad.ureche at gmail.com  Fri Jul 17 22:47:15 2015
From: vlad.ureche at gmail.com (Vlad Ureche)
Date: Sat, 18 Jul 2015 00:47:15 +0200
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of
	Compile::unique() in appropriate places
Message-ID:

Hi,

Please review the following patch for JDK-8011858. Big thanks to
Vladimir Kozlov for his patient guidance while working on this!

*Bug:* https://bugs.openjdk.java.net/browse/JDK-8011858

*Problem:* Throughout C2, local stacks are used to prevent recursive
calls from blowing up the system stack. These are sized based on the
total number of nodes in the compilation run (e.g. C->unique()).
Instead, they should be sized based on the live node count
(C->live_nodes()).

Now, with the increased difference between live_nodes (limited at
LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
up to 240K), it is important to not over-estimate the size of stacks.

*Solution:* This patch mirrors a patch written by Vladimir Kozlov for
JDK8u. It replaces the initial sizes from C->unique() to
C->live_nodes(), preserving any shifts (divisions) and offsets.
For example, in the compile.cpp patch:

- Node_Stack nstack(unique() >> 1);
+ Node_Stack nstack(live_nodes() >> 1);

There is an issue described at
https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
workaround from Vladimir's patch.

*Webrev:* http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
http://vladureche.ro/webrev/8011858
(updated, includes a link to bug 8131702)

*Tests:* Running jtreg with the compiler, runtime and gc tests on the
dev branch shows the same status before and after the patch: 808 tests
passed, 16 failed and 6 errors. What would be a stable point where all
tests are expected to pass, so I can test the patch there? Maybe jdk9?

Thanks,
Vlad

From vladimir.kozlov at oracle.com  Sat Jul 18 01:24:06 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2015 18:24:06 -0700
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of
	Compile::unique() in appropriate places
In-Reply-To:
References:
Message-ID: <55A9AAB6.50505@oracle.com>

Thank you, Vlad

It looks good. We usually don't put bug id into comments. So your
previous version on cr.openjdk is fine.

A second reviewer should look at it and sponsor it, with you listed as
contributor (I see you signed OCA already).

Thanks,
Vladimir

On 7/17/15 3:47 PM, Vlad Ureche wrote:
> Hi,
>
> Please review the following patch for JDK-8011858. Big thanks to
> Vladimir Kozlov for his patient guidance while working on this!
>
> *Bug:* https://bugs.openjdk.java.net/browse/JDK-8011858
>
> *Problem:* Throughout C2, local stacks are used to prevent recursive
> calls from blowing up the system stack. These are sized based on the
> total number of nodes in the compilation run (e.g. C->unique()).
> Instead, they should be sized based on the live node count
> (C->live_nodes()).
>
> Now, with the increased difference between live_nodes (limited at
> LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
> up to 240K), it is important to not over-estimate the size of stacks.
>
> *Solution:* This patch mirrors a patch written by Vladimir Kozlov for
> JDK8u. It replaces the initial sizes from C->unique() to
> C->live_nodes(), preserving any shifts (divisions) and offsets. For
> example, in the compile.cpp patch:
>
> - Node_Stack nstack(unique() >> 1);
> + Node_Stack nstack(live_nodes() >> 1);
>
> There is an issue described at
> https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
> workaround from Vladimir's patch.
>
> *Webrev:* http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
> http://vladureche.ro/webrev/8011858
> (updated, includes a link to bug 8131702)
>
> *Tests:* Running jtreg with the compiler, runtime and gc tests on the
> dev branch shows the same status before and after the patch: 808 tests
> passed, 16 failed and 6 errors. What would be a stable point where all
> tests are expected to pass, so I can test the patch there? Maybe jdk9?
> > Thanks, > Vlad > From martijnverburg at gmail.com Sat Jul 18 06:43:39 2015 From: martijnverburg at gmail.com (Martijn Verburg) Date: Sat, 18 Jul 2015 07:43:39 +0100 Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV In-Reply-To: <55A96533.2050703@oracle.com> References: <46530DA3-EF02-434A-9642-7046CF92729B@oracle.com> <55A96533.2050703@oracle.com> Message-ID: Fix works for me as well - thanks for following up, appreciate this was an obscure one in an officially unsupported API On Friday, 17 July 2015, Vladimir Kozlov wrote: > It is in released few days ago JDK 8u51: > > http://www.oracle.com/technetwork/java/javase/8u51-relnotes-2587590.html > > Regards, > Vladimir > > On 7/17/15 12:49 PM, Serkan ?zal wrote: > >> Hi John, >> >> Yes, I have applied your fix and it works. >> Thanks! >> >> Since which JDK version this patch will be there? >> >> Regards. >> >> On Fri, Jul 17, 2015 at 10:31 PM, John Rose > > wrote: >> >> Thanks Serkan and Martijn for reporting and analyzing this. >> >> We had a very similar bug reported internally, and we just >> integrated a fix: >> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/3816de51b5e7 >> >> Would you mind checking if it fixes your problem also? >> >> Best wishes, >> ? John >> >> On Jul 12, 2015, at 5:07 AM, Serkan ?zal > > wrote: >> >>> >>> Hi Martjin, >>> >>> Thanks for your interest and comment for making this thread a >>> little bit more hot. >>> >>> >>> From my previous message >>> ( >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018221.html >>> ): >>> >>> I added some additional logs to *"vm/c1/c1_Canonicalizer.cpp"*: >>> >>> >>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >>> >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> >>> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id >>> %d, index = id %d, log2_scale = %d", >>> >>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>> >>> } >>> >>> >>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >>> >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> >>> tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id >>> %d, index = id %d, log2_scale = %d", >>> >>> x->id(), x->base()->id(), x->index()->id(), x->log2_scale()); >>> >>> } >>> >>> >>> >>> So I run the test by calculating address as: >>> >>> - *"int * long"* (int is index and long is 8l) >>> >>> - *"long * long"* (the first long is index and the second long >>> is 8l) >>> >>> - *"int * int"* (the first int is index and the second int is 8) >>> >>> Here are the logs: >>> >>> >>> *int * long:* >>> >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >>> 27, log2_scale = 3 >>> >>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >>> 27, log2_scale = 3 >>> >>> *long * long:* >>> >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 >>> 
>>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id >>> 14, log2_scale = 3 >>> >>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id >>> 14, log2_scale = 3 >>> >>> *int * int:* >>> >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id >>> 29, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id >>> 29, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id >>> 15, log2_scale = 0 >>> >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id >>> 15, log2_scale = 0 >>> >>> As you can see, at the problematic runs (*"int * long"* and >>> *"long * long"*) there are two scaling. >>> >>> One for *"Unsafe.put"* and the other one is for*"Unsafe.get"* >>> and these instructions points to >>> >>> same *"base"* and *"index"* instructions. This means that >>> address is scaled one more time because there should be only >>> one scale. >>> >>> >>> >>> With this fix (or attempt since I am not %100 sure if it is >>> perfect/optimum way or not), I prevent multiple scaling on the >>> same index instruction. >>> >>> Also one of my previous messages >>> ( >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-July/018383.html >>> ) >>> shows that there are multiple scaling on the index so when it >>> scaled multiple, anymore it shows somewhere or anywhere in the >>> memory. >>> >>> On Sun, Jul 12, 2015 at 2:54 PM, Martijn Verburg >>> > wrote: >>> >>> Non reviewer here, but I'd add to the comment *why* you don't >>> want to scale again. >>> >>> Cheers, >>> Martijn >>> >>> On 12 July 2015 at 11:29, Serkan ?zal >> > wrote: >>> >>> Hi all, >>> >>> I have created a webrev for review including the patch and >>> shared for public access from here: >>> https://s3.amazonaws.com/jdk-8087134/webrev.00/index.html >>> >>> Regards. >>> >>> On Sat, Jul 4, 2015 at 9:06 PM, Serkan ?zal >>> > wrote: >>> >>> Hi, >>> >>> I have added some logs to show that problem is caused >>> by double scaling of offset (index) >>> >>> Here is my updated (log messages added) reproducer code: >>> >>> >>> int count = 100000; >>> long size = count * 8L; >>> long baseAddress = unsafe.allocateMemory(size); >>> System.out.println("Start address: " + >>> Long.toHexString(baseAddress) + >>> ", End address: " + >>> Long.toHexString(baseAddress + size)); >>> >>> for (int i = 0; i < count; i++) { >>> long address = baseAddress + (i * 8L); >>> System.out.println( >>> "Normal: " + Long.toHexString(address) + ", " + >>> "If double scaled: " + >>> Long.toHexString(baseAddress + (i * 8L * 8L))); >>> long expected = i; >>> unsafe.putLong(address, expected); >>> unsafe.getLong(address); >>> } >>> >>> >>> After sometime it crashes as >>> >>> >>> ... >>> Current thread (0x0000000002068800): JavaThread >>> "main" [_thread_in_Java, id=10412, >>> stack(0x00000000023f0000,0x00000000024f0000)] >>> >>> siginfo: ExceptionCode=0xc0000005, reading address >>> 0x0000000059061020 >>> ... >>> ... 
>>> >>> >>> And here is output of the execution until crash: >>> >>> Start address: 58bbcfa0, End address: 58c804a0 >>> Normal: 58bbcfa0, If double scaled: 58bbcfa0 >>> Normal: 58bbcfa8, If double scaled: 58bbcfe0 >>> Normal: 58bbcfb0, If double scaled: 58bbd020 >>> ... >>> ... >>> Normal: 58c517b0, If double scaled: 59061020 >>> >>> >>> As seen from the logs and crash dump, double scaled >>> version of target address (*If double scaled: >>> 59061020*) is the same with the problematic address >>> (*siginfo: ExceptionCode=0xc0000005, reading address >>> 0x0000000059061020*) that causes to crash while >>> accessing it. >>> >>> So I think, it is obvious that the crash is caused by >>> wrong optimization of index value since index is >>> scaled two times (for *Unsafe::put* and *Unsafe::get*) >>> instead of only one time. Then double scaled index >>> points to invalid memory address. >>> >>> Regards. >>> >>> On Sun, Jun 14, 2015 at 2:39 PM, Serkan ?zal >>> > >>> wrote: >>> >>> Hi all, I had dived into the issue with >>> JDK-HotSpot commits and the issue arised after >>> this commit: >>> >>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a >>> Then I added some additional logs to >>> *"vm/c1/c1_Canonicalizer.cpp"*: void >>> Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) { >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> tty->print_cr("Canonicalizer: do_UnsafeGetRaw id >>> %d: base = id %d, index = id %d, log2_scale = %d", >>> x->id(), x->base()->id(), x->index()->id(), >>> x->log2_scale()); } void >>> Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) { >>> if (OptimizeUnsafes) do_UnsafeRawOp(x); >>> tty->print_cr("Canonicalizer: do_UnsafePutRaw id >>> %d: base = id %d, index = id %d, log2_scale = %d", >>> x->id(), x->base()->id(), x->index()->id(), >>> x->log2_scale()); } >>> >>> So I run the test by calculating address as - >>> *"int * long"* (int is index and long is 8l) - >>> *"long * long"* (the first long is index and the >>> second long is 8l) - *"int * int"* (the first int >>> is index and the second int is 8) Here are the >>> logs: *int * long:* Canonicalizer: do_UnsafeGetRaw >>> id 18: base = id 16, index = id 17, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 20: base = id >>> 16, index = id 19, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 22: base = id 16, index = id >>> 21, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 24: base = id 16, index = id 23, log2_scale = 0 >>> Canonicalizer: do_UnsafePutRaw id 33: base = id >>> 13, index = id 27, log2_scale = 3 Canonicalizer: >>> do_UnsafeGetRaw id 36: base = id 13, index = id >>> 27, log2_scale = 3*long * long:* Canonicalizer: >>> do_UnsafeGetRaw id 18: base = id 16, index = id >>> 17, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 20: base = id 16, index = id 19, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 22: base = id >>> 16, index = id 21, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 24: base = id 16, index = id >>> 23, log2_scale = 0 Canonicalizer: do_UnsafePutRaw >>> id 35: base = id 13, index = id 14, log2_scale = 3 >>> Canonicalizer: do_UnsafeGetRaw id 37: base = id >>> 13, index = id 14, log2_scale = 3*int * int:* >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id >>> 16, index = id 17, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 22: base = id 16, index = id 21, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id >>> 16, index = id 23, log2_scale = 0 
Canonicalizer: >>> do_UnsafePutRaw id 33: base = id 13, index = id >>> 29, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 36: base = id 13, index = id 29, log2_scale = 0 >>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, >>> index = id 15, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 22: base = id 8, index = id 15, >>> log2_scale = 0As you can see, at the problematic >>> runs (*"int * long"* and *"long * long"*) there >>> are two scaling. One for *"Unsafe.put"* and the >>> other one is for*"Unsafe.get"* and these >>> instructions points to same *"base"* and *"index"* >>> instructions. This means that address is scaled >>> one more time because there should be only one scale. >>> >>> When I debugged the non-problematic run (*"int * >>> int"*), I saw that *"instr->as_ArithmeticOp();"* >>> is always returns *"null" *then >>> *"match_index_and_scale"* method returns*"false"* >>> always. So there is no scaling. static bool >>> match_index_and_scale(Instruction* instr, >>> Instruction** index, int* log2_scale) { ... >>> ArithmeticOp* arith = instr->as_ArithmeticOp(); if >>> (arith != NULL) { ... } return false; } >>> >>> Then I have added my fix attempt to prevent >>> multiple scaling for Unsafe instructions points to >>> same index instruction like this: void >>> Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) { >>> Instruction* base = NULL; Instruction* index = >>> NULL; int log2_scale; if (match(x, &base, &index, >>> &log2_scale)) { x->set_base(base); >>> x->set_index(index); // The fix attempt here // >>> ///////////////////////////// if (index != NULL) { >>> if (index->is_pinned()) { log2_scale = 0; } else { >>> if (log2_scale != 0) { index->pin(); } } } // >>> ///////////////////////////// >>> x->set_log2_scale(log2_scale); if >>> (PrintUnsafeOptimization) { >>> tty->print_cr("Canonicalizer: UnsafeRawOp id %d: >>> base = id %d, index = id %d, log2_scale = %d", >>> x->id(), x->base()->id(), x->index()->id(), >>> x->log2_scale()); } } } In this fix attempt, if >>> there is a scaling for the Unsafe instruction, I >>> pin index instruction of that instruction and at >>> next calls, if the index instruction is pinned, I >>> assummed that there is already scaling so no need >>> to another scaling. After this fix, I rerun the >>> problematic test (*"int * long"*) and it works >>> with these logs: *int * long (after fix):* >>> Canonicalizer: do_UnsafeGetRaw id 18: base = id >>> 16, index = id 17, log2_scale = 0 Canonicalizer: >>> do_UnsafeGetRaw id 20: base = id 16, index = id >>> 19, log2_scale = 0 Canonicalizer: do_UnsafeGetRaw >>> id 22: base = id 16, index = id 21, log2_scale = 0 >>> Canonicalizer: do_UnsafeGetRaw id 24: base = id >>> 16, index = id 23, log2_scale = 0 Canonicalizer: >>> do_UnsafePutRaw id 35: base = id 13, index = id >>> 14, log2_scale = 3 Canonicalizer: do_UnsafeGetRaw >>> id 37: base = id 13, index = id 14, log2_scale = 0 >>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, >>> index = id 11, log2_scale = 3 Canonicalizer: >>> do_UnsafeGetRaw id 23: base = id 8, index = id 11, >>> log2_scale = 0I am not sure my fix attempt is a >>> really fix or maybe there are better fixes. >>> Regards. -- Serkan ?ZAL >>> >>> Btw, (thanks to one my colleagues), when >>> address calculation in the loop is >>> converted to long address = baseAddress + (i * >>> 8) test passes. Only difference is next long >>> pointer is calculated using >>> integer 8 instead of long 8. 
``` >>> for (int i = 0; i < count; i++) { >>> long address = baseAddress + (i * 8); // <--- >>> here, integer 8 instead >>> of long 8 long expected = i; >>> unsafe.putLong(address, expected); long actual >>> = unsafe.getLong(address); if (expected != >>> actual) { >>> throw new AssertionError("Expected: " + >>> expected + ", Actual: " + >>> actual); >>> } >>> } >>> ``` On Tue, Jun 9, 2015 at 1:07 PM Mehmet >>> Dogan >> < >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-compiler-dev>> >>> wrote: >/Hi all, /> >>> >/While I was testing my app using java 8, I >>> encountered the previously />/reported >>> sun.misc.Unsafe issue. /> >>> >/ >>> https://bugs.openjdk.java.net/browse/JDK-8076445 >>> /> >>> >/ >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html >>> /> >>> >/Issue status says it's resolved with >>> resolution "Cannot Reproduce". But >>> />/unfortunately it's still reproducible using >>> "1.8.0_60-ea-b18" and />/"1.9.0-ea-b67". /> >>> >/Test is very simple: /> >>> >/``` />/public static void main(String[] >>> args) throws Exception { />/Unsafe unsafe = >>> findUnsafe(); />/// 10000 pass />/// 100000 >>> jvm crash />/// 1000000 fail />/int count = >>> 100000; />/long size = count * 8L; />/long >>> baseAddress = unsafe.allocateMemory(size); /> >>> >/try { />/for (int i = 0; i < count; i++) { >>> />/long address = baseAddress + (i * 8L); /> >>> >/long expected = i; >>> />/unsafe.putLong(address, expected); /> >>> >/long actual = unsafe.getLong(address); /> >>> >/if (expected != actual) { />/throw new >>> AssertionError("Expected: " + expected + ", >>> />/Actual: " + actual); />/} />/} />/} finally >>> { />/unsafe.freeMemory(baseAddress); />/} />/} >>> />/``` />/It's not failing up to version >>> 1.8.0.31, by starting 1.8.0.40 test is >>> />/failing constantly. /> >>> >/- With iteration count 10000, test is >>> passing. />/- With iteration count 100000, jvm >>> is crashing with SIGSEGV. />/- With iteration >>> count 1000000, test is failing with >>> AssertionError. /> >>> >/When one of compilation (-Xint) or inlining >>> (-XX:-Inline) or />/on-stack-replacement >>> (-XX:-UseOnStackReplacement) is disabled, test >>> is not />/failing at all. /> >>> >/I tested on platforms: />/- >>> Centos-7/openjdk-1.8.0.45 />/- >>> OSX/oraclejdk-1.8.0.40 />/- >>> OSX/oraclejdk-1.8.0.45 />/- >>> OSX/oraclejdk-1.8.0_60-ea-b18 />/- >>> OSX/oraclejdk-1.9.0-ea-b67 /> >>> >/Previous issue comment ( >>> />/ >>> https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043 >>> ) >>> />/says "Cannot reproduce based on the latest >>> version". I hope that latest />/version is not >>> mentioning to '1.8.0_60-ea-b18' or >>> '1.9.0-ea-b67'. Because />/both are failing. /> >>> >/I'm looking forward to hearing from you. 
>>> > Thanks,
>>> > -Mehmet Dogan-
>>> > --
>>> > @mmdogan
>>>
>>> --
>>> Serkan ÖZAL
>>> Remotest Software Engineer
>>> GSM: +90 542 680 39 18
>>> Twitter: @serkan_ozal
>>
>> --
>> Serkan ÖZAL
>> Remotest Software Engineer
>> GSM: +90 542 680 39 18
>> Twitter: @serkan_ozal

--
Cheers,
Martijn (Sent from Gmail Mobile)

From dean.long at oracle.com  Sat Jul 18 07:51:03 2015
From: dean.long at oracle.com (Dean Long)
Date: Sat, 18 Jul 2015 00:51:03 -0700
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55A9033C.2030302@oracle.com>
References: <55A9033C.2030302@oracle.com>
Message-ID: <55AA0567.6070602@oracle.com>

I think we should distinguish the different uses and treat them accordingly:

1) padding nops for patching, executed

We need to be careful about inserting a fat nop here: if later patching
overwrites only part of the fat nop, the result is an illegal instruction.

2) padding nops for patching, never executed

It should be safe to insert a fat nop here, but there's no point if the
nops are not reachable and never executed.

3) alignment nops, never patched, executed

Fat nops are fine, but on some CPUs branching may be even better, so I
suggest using align() for this, and letting align() decide what to
generate. The change in check_icache() could use a version of align
that takes the target offset as an argument:

 348     align(CodeEntryAlignment, __ offset() + ic_cmp_size);

4) alignment nops, never patched, never executed

Doesn't matter what we emit here, but we might as well make it
understandable by humans using a debugger.

I believe the patching nops in c1_CodeStubs_x86.cpp and
c1_LIRAssembler.cpp are patched concurrently while the code is running,
not at a safepoint, so it's not clear to me if it's safe to use fat nops
on x86. I would consider those changes unsafe on x86 without further
analysis of what happens during patching.

dl

On 7/17/2015 6:29 AM, Aleksey Shipilev wrote:
> Hi there,
>
> C1 is not very good at inlining and intrisifying methods, and hence the
> call performance is important there. One nit that we can see in the
> generated code on x86 is that C1 uses the single-byte nops, even for
> long nop strides.
>
> This improvement fixes that:
>   https://bugs.openjdk.java.net/browse/JDK-8131682
>   http://cr.openjdk.java.net/~shade/8131682/webrev.00/
>
> Testing:
>   - JPRT -testset hotspot on open platforms
>   - eyeballing the generated assembly with -XX:TieredStopAtLevel=1
>
> (I understand the symmetric change is going to be needed in closed
> parts, but let's polish the open part first).
>
> Thanks,
> -Aleksey
>

From roland.westrelin at oracle.com  Mon Jul 20 10:05:09 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 20 Jul 2015 12:05:09 +0200
Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more
In-Reply-To: <559E9CA1.2050505@oracle.com>
References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> <559E9CA1.2050505@oracle.com>
Message-ID:

Thanks for the reviews, Vladimir & Vladimir.

Roland.
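A footnote on the multi-byte nops discussed in Dean's message above: the sequences in question are the usual recommended x86 long-NOP encodings (the bytes below are shown for illustration only; the exact sequences HotSpot emits are in the webrev):

```
1 byte : 90                         nop
2 bytes: 66 90                      osize-prefixed nop
3 bytes: 0F 1F 00                   nop DWORD PTR [eax]
4 bytes: 0F 1F 40 00                nop DWORD PTR [eax+0]
5 bytes: 0F 1F 44 00 00             nop DWORD PTR [eax+eax*1+0]
6 bytes: 66 0F 1F 44 00 00          osize nop WORD PTR [eax+eax*1+0]
7 bytes: 0F 1F 80 00 00 00 00       nop DWORD PTR [eax+0]
8 bytes: 0F 1F 84 00 00 00 00 00    nop DWORD PTR [eax+eax*1+0]
```

They also make Dean's caution concrete: patching over only a prefix of one of these leaves the tail bytes to be decoded as the start of a fresh, possibly illegal, instruction.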
From adinn at redhat.com  Mon Jul 20 13:22:28 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 20 Jul 2015 14:22:28 +0100
Subject: Query regarding ordering in G1 post-write barrier
Message-ID: <55ACF614.5020103@redhat.com>

$SUBJECT is in relation to the following code in method
GraphKit::g1_mark_card()

  . . .
  __ storeCM(__ ctrl(), card_adr, zero, oop_store, oop_alias_idx,
             card_bt, Compile::AliasIdxRaw);

  // Now do the queue work
  __ if_then(index, BoolTest::ne, zeroX); {

    Node* next_index = _gvn.transform(new SubXNode(index, __ ConX(sizeof(intptr_t))));
    Node* log_addr = __ AddP(no_base, buffer, next_index);

    // Order, see storeCM.
    __ store(__ ctrl(), log_addr, card_adr, T_ADDRESS, Compile::AliasIdxRaw, MemNode::unordered);
    __ store(__ ctrl(), index_adr, next_index, TypeX_X->basic_type(), Compile::AliasIdxRaw, MemNode::unordered);

  } __ else_(); {

    __ make_leaf_call(tf, CAST_FROM_FN_PTR(address, SharedRuntime::g1_wb_post),
                      "g1_wb_post", card_adr, __ thread());

  } __ end_if();
  . . .

The 3 stores StoreCM -> StoreP -> StoreX which mark the card and then
push the card address into the dirty queue appear to end up being
emitted in that order -- I assume by virtue of the memory links between
them (output of StoreCM is the mem input of StoreP, output of StoreP is
the mem input of StoreX). That order makes sense if the queue were to be
observed concurrently, i.e. mark the card to ensure the write is flagged
before it is performed, write the value, then decrement the index to
make the value write visible. So, that's ok on x86 where TSO means this
is also the order of visibility.

The StoreCM is 'ordered' i.e. it is flagged with mem ordering type =
mo_release. However, the latter pair of instructions are 'unordered'. I
looked at the G1 code which processes dirty queues and could not make
head nor tail (oops, apologies for any undercurrent of a pun) of when it
gets run.

So, the question is this: does dirty queue processing only happen in GC
threads when mutators cannot be writing them? Or is there a need on
non-TSO architectures to maintain some sort of consistency in the queue
update by serializing these writes?

Hmm, ok that seems to be two questions. Let's make that a starter for 10
and a bonus for whoever buzzes fastest.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From thomas.schatzl at oracle.com  Mon Jul 20 13:44:47 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 20 Jul 2015 15:44:47 +0200
Subject: Query regarding ordering in G1 post-write barrier
In-Reply-To: <55ACF614.5020103@redhat.com>
References: <55ACF614.5020103@redhat.com>
Message-ID: <1437399887.2272.76.camel@oracle.com>

Hi,

On Mon, 2015-07-20 at 14:22 +0100, Andrew Dinn wrote:
> $SUBJECT is in relation to the following code in method
> GraphKit::g1_mark_card()
[...]
> The StoreCM is 'ordered' i.e. it is flagged with mem ordering type =
> mo_release. However, the latter pair of instructions are 'unordered'. I
> looked at the G1 code which processes dirty queues and could not make
> head nor tail (oops, apologies for any undercurrent of a pun) of when it
> gets run.
>
> So, the question is this: does dirty queue processing only happen in GC
> threads when mutators cannot be writing them? Or is there a need on
> non-TSO architectures to maintain some sort of consistency in the queue
> update by serializing these writes?
Mutator threads and refinement threads are running concurrently. While GC threads are working on the queues, mutators never run. While refinement threads may set and clear the mark on the card table concurrently because they were processing that card concurrently, there is afaik sufficient synchronization between the card mark and the reference write on that in the code (in HeapRegion::oops_on_card_seq_iterate_careful()). The refinement threads do not modify or assume any value of the buffer's members, so there does not seem to be a need in the write barrier for further synchronization. Refinement threads and mutator threads take a lock to push and pop a new buffer (in SharedRuntime::g1_wb_post) to and from the buffer queue, which means that there is at least one cmpxchg/synchronization between the thread pushing the buffer and the refinement thread popping it. Which should be sufficient that the buffer's members are visible to all participants of this exchange. Note that GC threads do access the buffers, but there is a safepointing operation between the accesses. That should cover the eventualities. You can now tell us where I am wrong :) > Hmm, ok that seems to be two questions. Let's make that a starter for 10 > and a bonus for whoever buzzes fastest. Not sure if that answers your question. So what did I win? :-) Thanks, Thomas From aleksey.shipilev at oracle.com Mon Jul 20 13:52:15 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 20 Jul 2015 16:52:15 +0300 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final Message-ID: <55ACFD0F.30207@oracle.com> Hi, On the road from Unsafe to VarHandles lies a small deficiency in C1 Class.cast/isInstance optimization: the canonicalizer folds constant class perfectly when it is coming from "inlined" constant, but not from static final, because the constant "shapes" are different: https://bugs.openjdk.java.net/browse/JDK-8131782 And here is the fix: http://cr.openjdk.java.net/~shade/8131782/webrev.01/ (There is another ClassConstant shape, which is, AFAIU, the constant "receiver" class as discovered within the static method. It does not seem to appear in current java.lang.Class, and there seems to be no way for users to stumble upon such a pattern. Which also means I can't do a targeted test for it.) Testing: * JPRT -testset hotspot on all open platforms; * Targeted benchmarks (see JIRA ticket) Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Mon Jul 20 14:14:42 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 20 Jul 2015 15:14:42 +0100 Subject: Query regarding ordering in G1 post-write barrier In-Reply-To: <1437399887.2272.76.camel@oracle.com> References: <55ACF614.5020103@redhat.com> <1437399887.2272.76.camel@oracle.com> Message-ID: <55AD0252.3040007@redhat.com> On 20/07/15 14:44, Thomas Schatzl wrote: > > . . . > > Refinement threads and mutator threads take a lock to push and pop a new > buffer (in SharedRuntime::g1_wb_post) to and from the buffer queue, > which means that there is at least one cmpxchg/synchronization between > the thread pushing the buffer and the refinement thread popping it. > > . . . > > Note that GC threads do access the buffers, but there is a safepointing > operation between the accesses. > > That should cover the eventualities. 
You can now tell us where I am > wrong :) Thanks for the explanation. I don't doubt that you (and the code) are right -- but I wanted to understand why. >> Hmm, ok that seems to be two questions. Let's make that a starter for 10 >> and a bonus for whoever buzzes fastest. > > Not sure if that answers your question. So what did I win? :-) Why glory***, naturally. regards, Andrew Dinn ----------- *** n.b. to find out what 'glory' means see Through The Looking Glass From Alexander.Alexeev at caviumnetworks.com Mon Jul 20 14:38:31 2015 From: Alexander.Alexeev at caviumnetworks.com (Alexeev, Alexander) Date: Mon, 20 Jul 2015 14:38:31 +0000 Subject: RFR: aarch64: Typo in SHA intrinsics flags handling code for aarch64 Message-ID: Hello Please review provided patch and sponsor if approved. Problem: SHA flags verification code checks condition for UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. The patch: http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ Regards, Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From aleksey.shipilev at oracle.com Mon Jul 20 14:51:12 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 20 Jul 2015 17:51:12 +0300 Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55AA0567.6070602@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> Message-ID: <55AD0AE0.3060803@oracle.com> Hi Dean, Thanks for taking a look! Silly me, I should have left the call patching cases intact, because you're right, we should be able to patch the nops partially while still producing the correct instruction stream. Therefore, I reverted the cases where we do nop-ing for *instruction* patching, and added the comment there. Other places seem to use the nop sequences to provide the alignment, not for the general patching. Especially interesting for us is the case of aligning the patcheable immediate in the existing call. C2 does the nops in these cases. New webrev: http://cr.openjdk.java.net/~shade/8131682/webrev.01/ Testing: * JPRT -testset hotspot on open platforms; * Targeted benchmarks, plus eyeballing the assembly; Thanks, -Aleksey On 18.07.2015 10:51, Dean Long wrote: > I think we should distinguish the different uses and treat them > accordingly: > > 1) padding nops for patching, executed > > We need to be careful about inserting a fat nop here, if later patching > overwrites only part of the fat nop, resulting in an illegal intruction. > > 2) padding nops for patching, never executed > > It should be safe insert a fat nop here, but there's no point if the > nops are not reachable and never executed. > > > 3) alignment nops, never patched, executed > > Fat nops are fine, but on some CPUs branching may be even better, so I > suggest using align() for this, and letting align() decide what to > generate. The change in check_icache() could use a version of align > that takes the target offset as an argument: > > 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); > > 4) alignment nops, never patched, never executed > > Doesn't matter what we emit here, but we might as well make it > understandable by humans using a debugger. > > > I believe the patching nops in c1_CodeStubs_x86.cpp and > c1_LIRAssembler.cpp are patched concurrently while the code is running, > not at a safepoint, so it's not clear to me if it's safe to use fat nops > on x86. I would consider those changes unsafe on x86 without further > analysis of what happens during patching. 
> > dl > > On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >> Hi there, >> >> C1 is not very good at inlining and intrisifying methods, and hence the >> call performance is important there. One nit that we can see in the >> generated code on x86 is that C1 uses the single-byte nops, even for >> long nop strides. >> >> This improvement fixes that: >> https://bugs.openjdk.java.net/browse/JDK-8131682 >> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >> >> Testing: >> - JPRT -testset hotspot on open platforms >> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >> >> (I understand the symmetric change is going to be needed in closed >> parts, but let's polish the open part first). >> >> Thanks, >> -Aleksey >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From john.r.rose at oracle.com Mon Jul 20 21:14:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 20 Jul 2015 14:14:59 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55ACFD0F.30207@oracle.com> References: <55ACFD0F.30207@oracle.com> Message-ID: On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote: > > Hi, > > On the road from Unsafe to VarHandles lies a small deficiency in C1 > Class.cast/isInstance optimization: the canonicalizer folds constant > class perfectly when it is coming from "inlined" constant, but not from > static final, because the constant "shapes" are different: > https://bugs.openjdk.java.net/browse/JDK-8131782 > > And here is the fix: > http://cr.openjdk.java.net/~shade/8131782/webrev.01/ > > (There is another ClassConstant shape, which is, AFAIU, the constant > "receiver" class as discovered within the static method. It does not > seem to appear in current java.lang.Class, and there seems to be no way > for users to stumble upon such a pattern. Which also means I can't do a > targeted test for it.) > > Testing: > * JPRT -testset hotspot on all open platforms; > * Targeted benchmarks (see JIRA ticket) I suggest a deeper fix, to the factory that produces the oddly formatted constant. That may help with other, similar constant folding problems. ? 
John diff --git a/src/share/vm/c1/c1_ValueType.cpp b/src/share/vm/c1/c1_ValueType.cpp --- a/src/share/vm/c1/c1_ValueType.cpp +++ b/src/share/vm/c1/c1_ValueType.cpp @@ -153,7 +153,19 @@ case T_FLOAT : return new FloatConstant (value.as_float ()); case T_DOUBLE : return new DoubleConstant(value.as_double()); case T_ARRAY : // fall through (ciConstant doesn't have an array accessor) - case T_OBJECT : return new ObjectConstant(value.as_object()); + case T_OBJECT : { + // FIXME: use common code with GraphBuilder::load_constant + ciObject* obj = value.as_object(); + if (obj->is_null_object()) + return objectNull; + if (obj->is_loaded()) { + if (obj->is_array()) + return new ArrayConstant(obj->as_array()); + else if (obj->is_instance()) + return new InstanceConstant(obj->as_instance()); + } + return new ObjectConstant(obj); + } } ShouldNotReachHere(); return illegalType; From john.r.rose at oracle.com Mon Jul 20 23:05:07 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 20 Jul 2015 16:05:07 -0700 Subject: On constant folding of final field loads In-Reply-To: <5592E75A.1000500@oracle.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> Message-ID: On Jun 30, 2015, at 12:00 PM, Vladimir Ivanov wrote: > > Aleksey, > >>>> Big picture question: do we actually care about propagating final field >>>> values once the object escaped (and in this sense, available to be >>>> introspected by the compiler)? >>>> >>>> Java memory model does not guarantee the final field visibility when the >>>> object had escaped. The very reason why deserialization works is because >>>> the deserialized object had not yet been published. >>>> >>>> That is, are we in line with the spec and general expectations by >>>> folding the final values, *and* not deoptimizing on the store? >>> Can you elaborate on your point and interaction with JMM a bit? >>> >>> Are you talking about not tracking constant folded final field values at >>> all, since there are no guarantees by JMM such updates are visible? >> >> Yup. AFAIU the JMM, there is no guarantees you would see the updated >> value for final field after the object had leaked. So, spec-wise you may >> just use the final field values as constants. I think the only reason >> you have to do the dependency tracking is when constant folding depends >> on instance identity. >> >> So, my question is, do we knowingly make a goodwill call to deopt on >> final field store, even though it is not required by spec? I am not >> opposing the change, but I'd like us to understand the implications better. > That's a good question. I believe that the JMM doesn't give users any hope for changing the value of a final field, apart from objects under initialization, and the specific cases of System.setErr and its two evil twins. (Did I miss a third case? There aren't many.) Rather than wait for the unthinkable and throw a de-opt, I would prefer to make more positive checks against final field changing, and throw a suitable exception when (if ever) an application sets a final field not in a scenario envisioned by the JMM. The value of this would be that whatever context markers or annotations we use to make these checks will also help guide the *suppression* of the final field folding optimization. > I consider it more like a quality of implementation aspect. 
Neither Reflection nor Unsafe APIs are part of JVM/JLS spec, so I don't think possibility of final field updates should be taken into account there. Reflection allows apps. to emulate the source semantics of Java programs, and (independently) it provides access to some run-time metadata. Whatever it does with final should correspond (within reason) to source semantics. Unsafe is whatever we want it to be, as a simple, well-factored set of building blocks to implement low-level JVM operations and (independently) provide access to some run-time features of the hardware platform. Therefore, Unsafe and Reflection are partially coupled to final semantics. With that said, I think it may be undesirable to push final-bit checking into the Unsafe API. Unsafe loads and stores should map to single memory instructions (with doubly-indexed, unscaled addresses). If we add extra "tag" bits to (say) offsets, we will have to "untag" those offsets when the instruction executes (if the offsets are not JIT-time constants); that is an extra instruction. > In order to avoid surprises and inconsistencies (old value vs new value depending on execution path) which are *very* hard to track down, VM should either completely forbid final field changes or keep track of them and adapt accordingly. I like the "forbid" option, also known as "fail fast". I think (in general) we should (where we can) remove indeterminate behavior from the JVM specification, such as "what happens when I store a new value to a final at an unexpected time". We have enough bits in the object header to encode frozen-ness. This is an opposite property: slushiness-of-finals. We could require that the newInstance operation used by deserialization would create slushy objects. (The normal new/ sequence doesn't need this.) Ideally, we would want the deserializer to issue an explicit "publish" operation, which would clear the slushy flag. JITs would consult that flag to gate final-folding. Reflection (and other users of Unsafe) would consult the flag and throw a fail-fast error if it failed. There would have to be some way to limit the time an object is in the slushy state, ideally by enforcing an error on deserializers who neglect to publish (or discard) a slushy object. For example, we could require an annotation on deserialization methods, as we do today on caller-sensitive methods. That's the sort of thing I would prefer to see, to remove indeterminate behavior. > >> For example, I can see the change gives rise to some interesting >> low-level coding idioms, like: >> >> final boolean running = true; >> Field runningField = resolve(...); // reflective >> >> // run stuff for minutes >> void m() { >> while (running) { // compiler hoists, turns into while(true) >> // do stuff >> } >> } >> >> void hammerTime() { >> runningField.set(this, false); // deopt, break the loop! >> } >> >> Once we allow users to go crazy like that, it would be cruel to >> retract/break/change this behavior. You can simulate this (very interesting) pattern using the "target" variable of a MutableCallSite. I.e., the "fold then deopt" use case is supported by MutableCallSite.setTarget. Those variable semantics should *not* be overloaded on final. That pattern, if driven by a special variable, deserves a new kind of variable. The key parameters would be 1) allowed state transitions between blank, set, reset, dead, and 2) expected frequency of various transitions. The frequencies are guesses, not user contracts, and the JVM would have to measure and retune to cope with surprises. 
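For reference, the supported way to get exactly that fold-then-deopt behavior today is a MutableCallSite over a constant method handle. A minimal sketch (class and member names are illustrative, not from any webrev):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

class Worker {
    // stands in for the mutable 'running' flag of the idiom above
    static final MutableCallSite RUNNING =
            new MutableCallSite(MethodHandles.constant(boolean.class, true));
    static final MethodHandle RUNNING_TEST = RUNNING.dynamicInvoker();

    void m() throws Throwable {
        // the JIT folds the call site's current target to 'true' and
        // hoists the test, just like the hand-rolled final-field version
        while ((boolean) RUNNING_TEST.invokeExact()) {
            // do stuff
        }
    }

    void hammerTime() {
        RUNNING.setTarget(MethodHandles.constant(boolean.class, false));
        // forces dependent compiled code to notice the new target promptly --
        // the "deopt, break the loop!" step
        MutableCallSite.syncAll(new MutableCallSite[] { RUNNING });
    }
}
```

Note that setTarget alone only guarantees eventual visibility; syncAll is what forces threads to discard any cached (constant-folded) target.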
(One thing I wonder about: What could a "volatile final" be? My best suggestion is a moderate extension of blank finals: http://cr.openjdk.java.net/~jrose/draft/lazy-final.html ) >> But I speculate those cases are not pervasive. By and large, people care >> about final ops to jump through the barriers. For example, the final >> load can be commonned through the acquires / control flow. See e.g.: >> http://psy-lob-saw.blogspot.ru/2014/02/when-i-say-final-i-mean-final.html > > >>>>> Regarding alternative approaches to track the finality, an offset bitmap >>>>> on per-class basis can be used (containing locations of final fields). >>>>> Possible downsides are: (1) memory footprint (1/8th of instance size per >>>>> class); and (2) more complex checking logic (load a relevant piece of a >>>>> bitmap from a klass, instead of checking locally available offset >>>>> cookie). The advantage is that it is completely transparent to a user: >>>>> it doesn't change offset translation scheme. >>>> >>>> I like this one. Paying with slightly larger memory footprint for API >>>> compatibility sounds reasonable to me. >>> >>> I don't care about cases when Unsafe API is abused (e.g. raw memory >>> writes on absolute address or arbitrary offset in an object). In the >>> end, it's unsafe API, right? :-) Today's abuse = tomorrow's use. Whatever we might want to do with a memory instruction is a possible valid use for Unsafe. For Project Panama I expect we will be using the managed heap to store temporary native values. The envelope will be something like a new long[2], but the layout (after the envelope header = array base) will *not* be something the JVM knows about in detail; it will be sliced up by Unsafe operations into native bits and bytes. And likewise with malloc-buffers (where the whole VA is stuffed in the offset). >> Yeah, but with millions of users, we are in a bit of a (implicit) >> compatibility bind here ;) > > That's why I deliberately tried to omit compatibility aspect discussion for now :-) > > Unsafe is unique: it's not a supported API, but nonetheless many people rely on it. It means we can't throw it away (even in a major release), but still we are not as limited as with official public API. > > As part of Project Jigsaw there's already an attempt to do an incompatible change for Unsafe API. Depending on how it goes, we can get some insights how to address compatibility concerns (e.g. preserve original behavior in Java 8 compatibility mode). > > What I'm trying to understand right now, before diving into compatibility details, is whether Unsafe API allows offset encoding scheme change itself and what can be done to make it happen. The decision to make offsets opaque was mine; the idea was to hide more details of object layout, for example in case the JVM ever used non-flat layouts for objects. (It never has. It might for objects containing larger value types; we don't know yet.) At least some offsets want to be occur in arithmetic sequences (arrays and now misaligned accesses). Of course 64 bits can encode a lot of stuff, so it would be possible to mix together both symbolic information (type tags, finality and other mode tags) with pure offset or address information. Over 32 bits of arithmetic sequence range can co-exist with this, by putting the tags at either end of the word. But (back to my earlier comment) this makes it hard to compile Unsafe ops as single instructions. 
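For concreteness, one hypothetical high-tag layout (nothing like this is implemented; the shift and mask below are invented for illustration):

```java
// hypothetical: 16 tag bits at the top, 48 bits of raw byte offset below;
// arithmetic on the low bits stays linear, but every access pays an untag step
static final int  TAG_SHIFT   = 48;
static final long OFFSET_MASK = (1L << TAG_SHIFT) - 1;

static long makeCookie(long rawOffset, int tag) {
    return ((long) tag << TAG_SHIFT) | (rawOffset & OFFSET_MASK);
}
static long rawOffset(long cookie) {
    return cookie & OFFSET_MASK;           // the extra untag instruction
}
static int tagOf(long cookie) {
    return (int) (cookie >>> TAG_SHIFT);   // e.g. finality or type bits
}
```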
Folding tags into offsets will make it harder for Panama-type APIs to perform address arithmetic (they will have to work around the tags). The Unsafe API would have to expose operations like offsetAdd(long o, int delta) and offsetDifference(long o1, long o2). > Though offset value is explicitly described in API as an opaque offset cookie, I spotted 2 inconsistencies in the API itself: > > * Unsafe.get/set*Unaligned() require absolute offsets; > These methods were added in 9, so haven't leaked into public yet. Yep. That seems to push for a high-tag (color bits in the MSB of the offset), or (my preference) no tag or separate tag. You could also copy the alignment bits into the LSB to co-exist with a tag. (The "separate tag" option means something like having a query for the "tag" as well as the base and offset of a variable. The operations getInt, etc., would take an optional third argument, which would be the tag associated with the base and offset. This would allow address arithmetic to remain trivial, at the expense of retooling uses of Unsafe that need to be sensitive to tagging concerns.) > Andrew, can you comment on why you decided to stick with absolute offsets and not preserving Unsafe.getInt() addressing scheme? (The outcome is that the unaligned guys have the same signatures as the aligned ones.) > * Unsafe.copyMemory() > Source and destination addressing operate on offset cookies, but amount of copied data is expressed in bytes. In order to do bulk copies of consecutive memory blocks, the user should be able to convert offset cookies to byte offset and vice versa. There's no way to do that with current API. Right. > Are you aware of any other use cases when people rely on absolute offsets? > > I thought about VarHandles a bit and it seems they aren't a silver bullet - they should be based on Unsafe (or stripped Unsafe equivalent) anyway. > > Unsafe.fireDepChange is a viable option for Reflection and MethodHandles. I'll consider it during further explorations. The downside is that it puts responsibility of tracking final field changes on a user, which is error-prone. There are places in JDK where Unsafe is used directly and they should be analyzed whether a final field is updated or not on a case-by-case basis. Idea: If we go with a three-argument version of getInt, the legacy two-argument version could do a more laborious check. The best way to motivate users of Unsafe to refresh their code (probably) is to improve performance. Recovering lost performance (due to increased safety) is a tactic we can use too, although it is less enjoyable all around. I wish we had value types already; we could make a lot of this clearer if we were able to give cookies their own opaque 64-bit type. (Hacky idea: Use "double" as an envelope type for a second kind of cookie, since "long" is taken. Hacky idea killer: There is an implicit conversion from long to double, which is probably harmful.) > > It's basically opt-in vs opt-out approaches. I'd prefer a cleaner approach, if there's a solution for compatibility issues. > >>> So, my next question is how to proceed. Does changing API and providing >>> 2 set of functions working with absolute and encoded offsets solve the >>> problem? Or leaving Unsafe as is (but clarifying the API) and migrating >>> Reflection/j.l.i to VarHandles solve the problem? That's what I'm trying >>> to understand. >> >> I would think Reflection/j.l.i would eventually migrate to VarHandles >> anyway. Paul? 
The interim solution for encoding final field flags >> shouldn't leak into (even Unsafe) API, or at least should not break the >> existing APIs. >> >> I further think that an interim solution makes auxiliary single >> Unsafe.fireDepChange(Field f / long addr) or something, and uses it >> along with the Unsafe calls in Reflection/j.l.i, when wrappers know they >> are dealing with final fields. In other words, should we try to reuse >> the knowledge those wrappers already have, instead of trying to encode >> the same knowledge into offset cookies? > > >>>>> II. Managing relations between final fields and nmethods >>>>> Another aspect is how expensive dependency checking becomes. >> >>>> Isn't the underlying problem being the dependencies are searched >>>> linearly? At least in ConstantFieldDep, can we compartmentalize the >>>> dependencies by holder class in some sort of hash table? >>> In some cases (when coarse-grained (per-class) tracking is used), linear >>> traversal is fine, since all nmethods will be invalidated. >>> >>> In order to construct a more efficient data structure, you need a way to >>> order or hash oops. The problem with that is oops aren't stable - they >>> can change at any GC. So, either some stable value should be associated >>> with them (System.identityHashCode()?) or dependency tables should be >>> updated on every GC. >> >> Yeah, like Symbol::_identity_hash. > Symbol is an internal VM entity. Oops are different. They are just pointers to Java object (OOP = Ordinary Object Pointer). The only doable way is piggyback on object hash code. I won't dive into details here, but there are many intricate consequences. We sometimes use binary search instead of identity_hash, e.g., in the CI. We could create a data structure which carries a GC generation counter, and re-sort lazily as needed. In principle, the GC could help with re-sorting. The API could look like a fixed-sized table containing a pair of aligned arrays: Object[] key, int[] value. The arrays would have to be encapsulated, since they can change order at any moment (if GC kicks in), but it would be reasonable to work with snapshots of them for bulk queries. The underlying arrays could be ordinary Java arrays, perhaps with blocking to facilitate growth. Native methods or specially-marked non-interruptable methods would perform the required transactions. Access cost would be O(log(N)). This feels like it might be useful for things besides dependencies. >>> Unless existing machinery can be sped up to appropriate level, I >>> wouldn't consider complicating things so much. >> >> Okay. I just can't escape the feeling we keep band-aiding the linear >> searches everywhere in VM on case-to-case basis, instead of providing >> the asymptotic guarantees with better data structures. > Well, class-based dependency contexts have been working pretty well for KlassDeps. They worked pretty well for CallSiteDeps as well, once a more specific context was used (I introduced a specialized CallSite instance-based implementation because it is simpler to maintain). > > It's hard to come up with a narrow enough class context for ConstantFieldDeps, so, probably, it's a good time to consider a different approach to index nmethod dependencies. But assuming final field updates are rare (with the exception of deserialization), it can be not that important. > >>> The 3 optimizations I initially proposed allow to isolate >>> ConstantFieldDep from other kinds of dependencies, so dependency >>> traversal speed will affect only final field writes. 
Which is acceptable >>> IMO. >> >> Except for an overwhelming number of cases where the final field stores >> happen in the course of deserialization. What's particularly bad about >> this scenario is that you wouldn't see the time burned in the VM unless >> you employ the native profiler, as we discovered in Nashorn perf work. > Yes, deserialization is a good example. It's special because it operates on freshly created objects, which, as you noted, haven't escaped yet. It'd be nice if VM can skip dependency checking in such case (either automatically or with explicit hints). Agree. The "slushy bit" might help. Hard part: It would have to co-exist with identityHashCode (because deserialization uses that also IIRC). ? John > In order to diagnose performance problems with excessive dependency checking, VM can monitor it closely (UsePerfData counters + JFR events + tracing should provide enough information to spot issues). > >> Recapping the discussion in this thread, I think we would need to have a >> more thorough performance work for this change, since it touches the >> very core of the platform. I think many people outside the >> hotspot-compiler-dev understand some corner intricacies of the problem >> that we miss. JEP and outcry for public comments, maybe? > Yes, I planned to get quick feedback on the list and then file a JEP as a followup. > > Thanks again for the feedback, Aleksey! > > Best regards, > Vladimir Ivanov From edward.nevill at gmail.com Tue Jul 21 09:18:35 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 21 Jul 2015 10:18:35 +0100 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: References: Message-ID: <1437470315.1575.9.camel@mylittlepony.linaroharston> On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: > Please review provided patch and sponsor if approved. > Problem: SHA flags verification code checks condition for > UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. > The patch: > http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ Hi Alexander, Thanks for fixing this. I will sponsor this patch. Here is the changeset. http://cr.openjdk.java.net/~enevill/8132010/webrev I have tested this before and after with hotspot jtreg Before: Test results: passed: 876; failed: 3; error: 7 After: Test results: passed: 877; failed: 2; error: 7 The 1 test fixed is the test compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java This regression was introduced in the following changeset http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 Could I have an official reviewer for this please. As this is a trivial 1 liner I think one reviewer should be sufficient. All the best, Ed. 
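For readers without the webrev handy, the shape of the one-liner being sponsored above is roughly the following (an illustrative sketch, not the exact source; warning() and FLAG_SET_DEFAULT are the usual HotSpot helpers):

```cpp
// vm_version_aarch64.cpp, SHA flag verification (sketch)
if (UseSHA256Intrinsics) {
  warning("SHA256 intrinsics are not available on this CPU");
  FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);   // the typo: wrong flag is reset
  // fix: FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
}
```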
From aleksey.shipilev at oracle.com Tue Jul 21 10:05:38 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 21 Jul 2015 13:05:38 +0300 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: References: <55ACFD0F.30207@oracle.com> Message-ID: <55AE1972.4050106@oracle.com> On 21.07.2015 00:14, John Rose wrote: > On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote: >> On the road from Unsafe to VarHandles lies a small deficiency in C1 >> Class.cast/isInstance optimization: the canonicalizer folds constant >> class perfectly when it is coming from "inlined" constant, but not from >> static final, because the constant "shapes" are different: >> https://bugs.openjdk.java.net/browse/JDK-8131782 > I suggest a deeper fix, to the factory that produces the oddly formatted constant. > That may help with other, similar constant folding problems. All right, let's do that! http://cr.openjdk.java.net/~shade/8131782/webrev.02/ I respinned it through JRPT and my targeted benchmarks, and it performs the same as previous patch. Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From aph at redhat.com Tue Jul 21 10:53:11 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 21 Jul 2015 11:53:11 +0100 Subject: On constant folding of final field loads In-Reply-To: References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> Message-ID: <55AE2497.8040405@redhat.com> On 21/07/15 00:05, John Rose wrote: >> > Andrew, can you comment on why you decided to stick with absolute >> > offsets and not preserving Unsafe.getInt() addressing scheme? > (The outcome is that the unaligned guys have the same signatures as > the aligned ones.) Indeed it does. I had a look around at the way Unsafe.getXXX(Object, long) is used. One of the most common usages is with arrays. There, the offset is the result of address arithmetic so it cannot be an opaque cookie, and there is no way to make it so without breaking all usages with arrays. Also there is the guarantee that you can use Unsafe.getXXXUnaligned(null, address) to fetch data from an absolute address in memory. To discover that this latter usage is explicitly allowed surprised me, but it does mean that the offset can not be an opaque handle unless we special-case the null form. And I think we don't want to do that. Andrew. From tobias.hartmann at oracle.com Tue Jul 21 13:40:18 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 21 Jul 2015 15:40:18 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space Message-ID: <55AE4BC2.1090104@oracle.com> Hi, please review the following patch. https://bugs.openjdk.java.net/browse/JDK-8130309 http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ Problem: While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. 
More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. Solution: Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. Testing: - Failing test - JPRT Thanks, Tobias From edward.nevill at gmail.com Tue Jul 21 14:27:52 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 21 Jul 2015 15:27:52 +0100 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437470315.1575.9.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> Message-ID: <1437488872.6057.3.camel@mylittlepony.linaroharston> On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: > On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: > > > Please review provided patch and sponsor if approved. > > Problem: SHA flags verification code checks condition for > > UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. > > The patch: > > http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ > > Hi Alexander, > > Thanks for fixing this. I will sponsor this patch. > > Here is the changeset. > > http://cr.openjdk.java.net/~enevill/8132010/webrev Please disregard the above webrev. I had outstanding outgoing changes. Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp http://cr.openjdk.java.net/~enevill/8132010/webrev.01 Sorry for the confusion, working on too many changesets at once. Ed. > > I have tested this before and after with hotspot jtreg > > Before: Test results: passed: 876; failed: 3; error: 7 > After: Test results: passed: 877; failed: 2; error: 7 > > The 1 test fixed is the test > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > > This regression was introduced in the following changeset > > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 > > Could I have an official reviewer for this please. As this is a trivial > 1 liner I think one reviewer should be sufficient. > > All the best, > Ed. > > From zoltan.majo at oracle.com Tue Jul 21 14:32:57 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 21 Jul 2015 16:32:57 +0200 Subject: [aarch64-port-dev ] RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437488872.6057.3.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> Message-ID: <55AE5819.3070506@oracle.com> Hi, the fix looks good to me (I'm not a *R*eviewer). Thank you and best regards, Zoltan On 07/21/2015 04:27 PM, Edward Nevill wrote: > On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: >> On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: >> >>> Please review provided patch and sponsor if approved. 
>>> Problem: SHA flags verification code checks the condition for
>>> UseSHA256Intrinsics, but corrects UseSHA1Intrinsics.
>>> The patch:
>>> http://cr.openjdk.java.net/~aalexeev/1/webrev.00/
>>
>> Hi Alexander,
>>
>> Thanks for fixing this. I will sponsor this patch.
>>
>> Here is the changeset.
>>
>> http://cr.openjdk.java.net/~enevill/8132010/webrev
>
> Please disregard the above webrev. I had outstanding outgoing changes.
>
> Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp
>
> http://cr.openjdk.java.net/~enevill/8132010/webrev.01
>
> Sorry for the confusion, working on too many changesets at once.
>
> Ed.
>
>> I have tested this before and after with hotspot jtreg
>>
>> Before: Test results: passed: 876; failed: 3; error: 7
>> After:  Test results: passed: 877; failed: 2; error: 7
>>
>> The 1 test fixed is the test
>>
>> compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java
>>
>> This regression was introduced in the following changeset
>>
>> http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2
>>
>> Could I have an official reviewer for this please. As this is a trivial
>> 1 liner I think one reviewer should be sufficient.
>>
>> All the best,
>> Ed.

From alian567 at 126.com  Tue Jul 21 14:39:56 2015
From: alian567 at 126.com (=?GBK?B?wO7Rqb78?=)
Date: Tue, 21 Jul 2015 22:39:56 +0800 (CST)
Subject: error when building hotspot in aarch64.
Message-ID: <1ba1f4e5.fb28.14eb10e8755.Coremail.alian567@126.com>

configure cmd: configure --openjdk-target=aarch64 --with-debug-level=slowdebug

The make error happens in frame_aarch64.cpp, in

  frame::frame(....) { init(...); }

init is undefined. Is the init function missing its implementation?

From edward.nevill at gmail.com  Tue Jul 21 15:18:15 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 21 Jul 2015 16:18:15 +0100
Subject: RFR: 8131062: aarch64: add support for GHASH acceleration
Message-ID: <1437491895.6739.17.camel@mylittlepony.linaroharston>

Hi,

http://cr.openjdk.java.net/~enevill/8131062/webrev.0/

adds support for GHASH acceleration on aarch64 using the 128 bit pmull
and pmull2 instructions. This patch was contributed by
alexander.alexeev at caviumnetworks.com

Note that the 128 bit pmull instructions are not supported on all
aarch64 CPUs. The patch uses the HWCAP_PMULL bit from getauxval() to
determine whether the 128 bit pmull is supported.

I have tested this with jtreg / hotspot.

Without patch: Test results: passed: 876; failed: 3; error: 9
With patch:    Test results: passed: 876; failed: 3; error: 9

In both cases the set of failing/error tests is identical.

I have done some performance testing using TestAESMain from the
jtreg/hotspot test suite. Here are the results I get:-

java -XX:-UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain
  encode time = 66945.63635, decode time = 34085.08754

java -XX:+UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain
  encode time = 43469.38244, decode time = 17783.6603

This is an improvement of 54% and 92% respectively.

Alexander has done some benchmarking to measure the raw performance
improvement of GHASH on its own using the following benchmark.

http://cr.openjdk.java.net/~enevill/8131062/GHash.java

Here are the results he gets:-

-XX:-UseGHASHIntrinsics
Benchmark             Mode  Cnt    Score   Error  Units
GHash.calculateGHash  avgt    5  118.688 ± 0.009  us/op

-XX:+UseGHASHIntrinsics
Benchmark             Mode  Cnt    Score   Error  Units
GHash.calculateGHash  avgt    5   21.164 ± 1.763  us/op

This represents a 5.6X speed increase on the raw GHASH performance.

Thanks for your review,
Ed.

From zoltan.majo at oracle.com  Tue Jul 21 15:19:26 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Tue, 21 Jul 2015 17:19:26 +0200
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com>
Message-ID: <55AE62FE.4070502@oracle.com>

Hi John,

thank you for the feedback!

On 07/15/2015 09:11 PM, John Rose wrote:
> On Jul 15, 2015, at 11:44 AM, Vladimir Kozlov wrote:
>>
>> Looks good to me. We still need John's opinion about state transition.
>
> Just sent a 1-1 reply; here it is FTR.
>
> On Jul 14, 2015, at 9:42 AM, Zoltán Majó wrote:
>>
>> So far, I tried Vladimir's solution of going into VM state in
>> Compile::make_vm_intrinsic() [2] and it works well.
>>
>> What we could also do is
>> - (1) list in the switch statement in is_intrinsic_available_for()
>> the intrinsic ids of all methods of interest (similarly to the way we
>> do it for C1); that would eliminate the need to make checks based on
>> a method's holder;
>> - (2) for the DisableIntrinsic checks (that need to call
>> CompilerOracle::has_option_value()) we could define a separate method
>> that is called directly from a WhiteBox context and through the CI
>> from make_vm_intrinsic.
>
> This is going to be a good cleanup. But it is hard to follow, so please
> regard my comments as tentative.
>
> Some comments:
>
> I think the term "_for" is a noise word as deprecated in:
> https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-Naming

I removed the "_for" suffix from relevant method names. It seems that
the style guide does not list "_for" as a noise word. I tried to update
the page, but I don't seem to have the necessary access rights.

> I agree with the tendency to factor stuff (when possible) away from
> the guts of the compilers.
>
> Suggest Compile::intrinsic_does_virtual_dispatch_for be moved to
> vmIntrinsics::does_virtual_dispatch.
> It's really part of the vmIntrinsics contract. Same for can_trap (or
> whatever it is called).
> If it can't be wedged into vmSymbols.cpp, then at least consider
> abstractCompiler.cpp.

I relocated the following methods:

  Compile::intrinsic_does_virtual_dispatch_for -> vmIntrinsics::does_virtual_dispatch
  Compile::intrinsic_predicates_needed_for     -> vmIntrinsics::predicates_needed
  GraphBuilder::intrinsic_can_trap             -> vmIntrinsics::can_trap
  GraphBuilder::intrinsic_preserves_state      -> vmIntrinsics::preserves_state

> Similar comment about is_intrinsic_available[_for]. Because of the
> dependency on the compiler tier, it has to be virtual, of course.

OK, I changed the name of AbstractCompiler::is_intrinsic_available_for
to AbstractCompiler::is_intrinsic_available. The method is virtual and
is overridden by Compiler and by C2Compiler.

> Suggest a static vmIntrinsics::is_disabled_by_flags, to check for
> compiler-independent disabling logic.
> Method::is_intrinsic_disabled is a good thought, but I would suggest
> making it a static method on vmIntrinsic, because the Method* pointer
> is just a wrapper around the intrinsic_id.
Stripping the Method* would let you avoid a > VM_ENTRY_MARK > in ciMethod::* if the context argument if null (true for C1?). I managed to extract most of the flag-disabling logic (the parts common to C1 and C2) to the vmIntrinsics::is_disabled_by_flags static method. The compiler-specific parts are implemented by the hierarchy starting at AbstractCompiler::is_intrinsic_disabled_by_flag. Some of the compiler-specific flag-disabling logic might be also considered as an inconsistency between C1 and C2 (please see below). > > The "flag soup" logic in C2 is frustrating, and may defeat an attempt > to factor > the -XX flag checking into vmIntrinsics, but I encourage you to try. > The Matcher calls can be layered on separately, in C2-specific code. > The vm_version checks can go in the same C1/C2-specific layer > as the C2 matcher checks. (Or perhaps factored into abstractCompiler, > but that may be overkill.) I factored the Matcher calls (for C2) and the vm_version checks (for C1) into the hierarchy starting at AbstractCompiler::is_intrinsic_supported(). Regarding the compiler-specific flag-disabling logic, we can make the following observations: 1) The DisableIntrinsic flag is C2-specific therefore it is currently included in C2Compiler::is_intrinsic_disabled_by_flag. 2) The InlineNatives flag disables most but not all intrinsics. There are some intrinsics (implemented by both C1 and C2) but that -XX:-InlineNatives turns off for C1 but leaves unaffected for C2. 3) The _getClass intrinsic (implemented by both C1 and C2) is turned off by -XX:-InlineClassNatives for C1 and is left unaffected by C2. 4) The _loadfence, _storefence, _fullfence, _compareAndSwapObject, _compareAndSwapLong, and _compareAndSwapInt intrinsics are turned off by -XX:-InlineUnsafeOps for C2 and are unaffected by C1. Compiler-specific functionality related to observations (1)-(4) is currently implemented in the hierarchy starting at AbstractCompiler::is_intrinsic_disabled_by_flag. If we decide to standardize some parts of flag processing, we can move the relevant functionality to vmIntrinsics::is_disabled_by_flag(). > > Regarding your original question: I would prefer that the VM_ENTRY > logic be confined to the CI, but there is no functional reason the > compiler itself can't do a native-to-VM transition. Thank you for clarifying! Currently, C1 can perform all checks without going into VM mode. For C2, only vmIntrinsics::is_disabled_by_flags() can be executed in native mode, it seems that the rest of the checks needed by C2 must be performed in VM mode. The logic in Compile::make_vm_intrinsic reflects these considerations. Here is the newest webrev: - top: http://cr.openjdk.java.net/~zmajo/8130832/top/ - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ Testing: - all JTREG tests run locally (linux-x86_64), all pass; - all JPRT tests (testset hotspot, including the newly added test, compiler/intrinsics/IntrinsicAvailableTest.java), all tests pass; - all tests in hotspot/test/compiler/intrinsics/mathexact on aarch64, all tests pass. Thank you! Best regards, Zoltan > > ? 
John > From vladimir.x.ivanov at oracle.com Tue Jul 21 16:29:08 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 21 Jul 2015 19:29:08 +0300 Subject: On constant folding of final field loads In-Reply-To: <55AE2497.8040405@redhat.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> <55AE2497.8040405@redhat.com> Message-ID: <55AE7354.1040504@oracle.com> >>>> Andrew, can you comment on why you decided to stick with absolute >>>> offsets and not preserving Unsafe.getInt() addressing scheme? > >> (The outcome is that the unaligned guys have the same signatures as >> the aligned ones.) > > Indeed it does. > > I had a look around at the way Unsafe.getXXX(Object, long) is used. > One of the most common usages is with arrays. There, the offset is > the result of address arithmetic so it cannot be an opaque cookie, and > there is no way to make it so without breaking all usages with arrays. Yes, it can't be a completely opaque cookie, but it doesn't mean it should be a raw offset. Array addressing mode is the following: BASE + index * SCALE where both BASE & SCALE are produced by Unsafe. Such addressing scheme permits encodings which are linear in byte offset. > Also there is the guarantee that you can use > Unsafe.getXXXUnaligned(null, address) to fetch data from an absolute > address in memory. To discover that this latter usage is explicitly > allowed surprised me, but it does mean that the offset can not be an > opaque handle unless we special-case the null form. And I think we > don't want to do that. That's a valid argument. 1-1 correspondence between Unsafe methods & machine ops is appealing. I'm curious do you use any additional addressing modes for unaligned variants? getXXX(Ojbect, long) supports: (1) NULL + address, (2) obj + offset, and (3) base + index * scale (for arrays). Anything besides that for unalinged? Best regards, Vladimir Ivanov From aph at redhat.com Tue Jul 21 16:51:43 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 21 Jul 2015 17:51:43 +0100 Subject: On constant folding of final field loads In-Reply-To: <55AE7354.1040504@oracle.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> <55AE2497.8040405@redhat.com> <55AE7354.1040504@oracle.com> Message-ID: <55AE789F.9060902@redhat.com> On 07/21/2015 05:29 PM, Vladimir Ivanov wrote: >>>>> Andrew, can you comment on why you decided to stick with absolute >>>>> offsets and not preserving Unsafe.getInt() addressing scheme? >> >>> (The outcome is that the unaligned guys have the same signatures as >>> the aligned ones.) >> >> Indeed it does. >> >> I had a look around at the way Unsafe.getXXX(Object, long) is used. >> One of the most common usages is with arrays. There, the offset is >> the result of address arithmetic so it cannot be an opaque cookie, and >> there is no way to make it so without breaking all usages with arrays. > Yes, it can't be a completely opaque cookie, but it doesn't mean it > should be a raw offset. > > Array addressing mode is the following: > BASE + index * SCALE > > where both BASE & SCALE are produced by Unsafe. > > Such addressing scheme permits encodings which are linear in byte offset. Mmm, okay. I see what you mean: clever tricks are possible with some encodings. Even tag bits. 
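For reference, the BASE + index * SCALE scheme spelled out with the real sun.misc.Unsafe accessors (a minimal sketch):

```java
import sun.misc.Unsafe;

class ArrayAddressing {
    static long elementAt(Unsafe unsafe, long[] array, int i) {
        long base  = unsafe.arrayBaseOffset(long[].class);   // BASE, opaque in principle
        long scale = unsafe.arrayIndexScale(long[].class);   // SCALE, 8 for long[]
        return unsafe.getLong(array, base + (long) i * scale);
    }
}
```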
However, I can guarantee you that there is code in the JDK which knows that ARRAY_BYTE_INDEX_SCALE == 1. And lots of other places too, I'm sure.

>> Also there is the guarantee that you can use
>> Unsafe.getXXXUnaligned(null, address) to fetch data from an absolute
>> address in memory. To discover that this latter usage is explicitly
>> allowed surprised me, but it does mean that the offset can not be an
>> opaque handle unless we special-case the null form. And I think we
>> don't want to do that.
>
> That's a valid argument. 1-1 correspondence between Unsafe methods &
> machine ops is appealing.
>
> I'm curious whether you use any additional addressing modes for unaligned
> variants? getXXX(Object, long) supports: (1) NULL + address, (2) obj +
> offset, and (3) base + index * scale (for arrays). Anything besides these
> for unaligned?

I don't think that getXXXUnaligned places any restrictions on its callers at all. There aren't any other forms used in the JDK.

Andrew.

From serkan at hazelcast.com  Sat Jul 11 21:58:46 2015
From: serkan at hazelcast.com (=?UTF-8?B?U2Vya2FuIMOWemFs?=)
Date: Sun, 12 Jul 2015 00:58:46 +0300
Subject: Array accesses using sun.misc.Unsafe cause data corruption or SIGSEGV
In-Reply-To:
References:
Message-ID:

Hi all,

I have created a webrev for review including the patch, zipped it, and put it in my Dropbox. Here is its public link:

https://www.dropbox.com/s/9122hbk1vdryvby/JDK-8087134.zip?dl=0

It is also attached to this mail.

Regards.

On Sat, Jul 4, 2015 at 9:06 PM, Serkan Özal wrote:

> Hi,
>
> I have added some logs to show that the problem is caused by double scaling of the offset (index).
>
> Here is my updated (log messages added) reproducer code:
>
> int count = 100000;
> long size = count * 8L;
> long baseAddress = unsafe.allocateMemory(size);
> System.out.println("Start address: " + Long.toHexString(baseAddress) +
>                    ", End address: " + Long.toHexString(baseAddress + size));
>
> for (int i = 0; i < count; i++) {
>     long address = baseAddress + (i * 8L);
>     System.out.println(
>         "Normal: " + Long.toHexString(address) + ", " +
>         "If double scaled: " + Long.toHexString(baseAddress + (i * 8L * 8L)));
>     long expected = i;
>     unsafe.putLong(address, expected);
>     unsafe.getLong(address);
> }
>
> After some time it crashes like this:
>
> ...
> Current thread (0x0000000002068800): JavaThread "main" [_thread_in_Java,
>     id=10412, stack(0x00000000023f0000,0x00000000024f0000)]
>
> siginfo: ExceptionCode=0xc0000005, reading address 0x0000000059061020
> ...
> ...
>
> And here is the output of the execution until the crash:
>
> Start address: 58bbcfa0, End address: 58c804a0
> Normal: 58bbcfa0, If double scaled: 58bbcfa0
> Normal: 58bbcfa8, If double scaled: 58bbcfe0
> Normal: 58bbcfb0, If double scaled: 58bbd020
> ...
> ...
> Normal: 58c517b0, If double scaled: 59061020
>
> As seen from the logs and the crash dump, the double-scaled version of the target
> address (If double scaled: 59061020) is the same as the problematic
> address (siginfo: ExceptionCode=0xc0000005, reading address
> 0x0000000059061020) that causes the crash when it is accessed.
>
> So I think it is obvious that the crash is caused by a wrong optimization
> of the index value, since the index is scaled twice (once for Unsafe::put and
> once for Unsafe::get) instead of only once. The double-scaled index then points
> to an invalid memory address.
>
> Regards.
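As a sanity check on the double scaling, the logged addresses can be reproduced arithmetically; this snippet is not from the original mail, and the index value 76034 is back-computed from the logged data:

```
public class DoubleScalingCheck {
    public static void main(String[] args) {
        long baseAddress  = 0x58bbcfa0L;
        int  i            = 76034;  // inferred from the log, not stated in the mail
        long normal       = baseAddress + (i * 8L);       // one scaling
        long doubleScaled = baseAddress + (i * 8L * 8L);  // two scalings
        System.out.println(Long.toHexString(normal));        // 58c517b0, inside the block
        System.out.println(Long.toHexString(doubleScaled));  // 59061020, the faulting address
    }
}
```

The double-scaled value lands past the end of the 0x58bbcfa0..0x58c804a0 allocation, which is consistent with the SIGSEGV at 0x59061020.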
> > On Sun, Jun 14, 2015 at 2:39 PM, Serkan Özal wrote:
>> Hi all,
>>
>> I dived into the issue through the JDK HotSpot commits, and the issue arose after this commit:
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/a60a1309a03a
>>
>> Then I added some additional logs to "vm/c1/c1_Canonicalizer.cpp":
>>
>> void Canonicalizer::do_UnsafeGetRaw(UnsafeGetRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafeGetRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> void Canonicalizer::do_UnsafePutRaw(UnsafePutRaw* x) {
>>   if (OptimizeUnsafes) do_UnsafeRawOp(x);
>>   tty->print_cr("Canonicalizer: do_UnsafePutRaw id %d: base = id %d, index = id %d, log2_scale = %d",
>>                 x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>> }
>>
>> So I ran the test, calculating the address as
>> - "int * long" (the int is the index and the long is 8L)
>> - "long * long" (the first long is the index and the second long is 8L)
>> - "int * int" (the first int is the index and the second int is 8)
>>
>> Here are the logs:
>>
>> int * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 27, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 27, log2_scale = 3
>>
>> long * long:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 3
>>
>> int * int:
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 33: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 36: base = id 13, index = id 29, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 19: base = id 8, index = id 15, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 8, index = id 15, log2_scale = 0
>>
>> As you can see, in the problematic runs ("int * long" and "long * long") there are two scalings.
>> One is for "Unsafe.put" and the other is for "Unsafe.get", and these instructions point to the
>> same "base" and "index" instructions.
>> This means that the address is scaled one extra time, because there should be only one scaling.
>>
>> When I debugged the non-problematic run ("int * int"),
>> I saw that "instr->as_ArithmeticOp()" always returns "null", so the "match_index_and_scale"
>> method always returns "false".
>> So there is no scaling.
>>
>> static bool match_index_and_scale(Instruction* instr,
>>                                   Instruction** index,
>>                                   int* log2_scale) {
>>   ...
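>>   // (Comment added for orientation; it is not in the original mail.)
>>   // match_index_and_scale recognizes index expressions of the form
>>   // "index * 2^n" or "index << n"; on a match, the multiply/shift is
>>   // stripped from the address computation and n is recorded as log2_scale.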
>>   ArithmeticOp* arith = instr->as_ArithmeticOp();
>>   if (arith != NULL) {
>>     ...
>>   }
>>
>>   return false;
>> }
>>
>> Then I added my fix attempt to prevent multiple scalings for Unsafe instructions that point to the same index instruction, like this:
>>
>> void Canonicalizer::do_UnsafeRawOp(UnsafeRawOp* x) {
>>   Instruction* base = NULL;
>>   Instruction* index = NULL;
>>   int log2_scale;
>>
>>   if (match(x, &base, &index, &log2_scale)) {
>>     x->set_base(base);
>>     x->set_index(index);
>>     // The fix attempt here
>>     // /////////////////////////////
>>     if (index != NULL) {
>>       if (index->is_pinned()) {
>>         log2_scale = 0;
>>       } else {
>>         if (log2_scale != 0) {
>>           index->pin();
>>         }
>>       }
>>     }
>>     // /////////////////////////////
>>     x->set_log2_scale(log2_scale);
>>     if (PrintUnsafeOptimization) {
>>       tty->print_cr("Canonicalizer: UnsafeRawOp id %d: base = id %d, index = id %d, log2_scale = %d",
>>                     x->id(), x->base()->id(), x->index()->id(), x->log2_scale());
>>     }
>>   }
>> }
>>
>> In this fix attempt, if there is a scaling for the Unsafe instruction, I pin the index instruction of that instruction,
>> and on subsequent calls, if the index instruction is pinned, I assume that there is already a scaling, so no further scaling is needed.
>>
>> After this fix, I reran the problematic test ("int * long") and it works, with these logs:
>>
>> int * long (after fix):
>> Canonicalizer: do_UnsafeGetRaw id 18: base = id 16, index = id 17, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 20: base = id 16, index = id 19, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 22: base = id 16, index = id 21, log2_scale = 0
>> Canonicalizer: do_UnsafeGetRaw id 24: base = id 16, index = id 23, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 35: base = id 13, index = id 14, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 37: base = id 13, index = id 14, log2_scale = 0
>> Canonicalizer: do_UnsafePutRaw id 21: base = id 8, index = id 11, log2_scale = 3
>> Canonicalizer: do_UnsafeGetRaw id 23: base = id 8, index = id 11, log2_scale = 0
>>
>> I am not sure whether my fix attempt is a real fix; maybe there are better fixes.
>>
>> Regards.
>>
>> --
>> Serkan ÖZAL
>>
>>> Btw (thanks to one of my colleagues), when the address calculation in the loop is
>>> converted to
>>>     long address = baseAddress + (i * 8)
>>> the test passes. The only difference is that the next long pointer is calculated
>>> using the integer 8 instead of the long 8.
>>> ```
>>> for (int i = 0; i < count; i++) {
>>>     long address = baseAddress + (i * 8); // <--- here, integer 8 instead of long 8
>>>     long expected = i;
>>>     unsafe.putLong(address, expected);
>>>     long actual = unsafe.getLong(address);
>>>     if (expected != actual) {
>>>         throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>     }
>>> }
>>> ```
>>> On Tue, Jun 9, 2015 at 1:07 PM Mehmet Dogan wrote:
>>>> Hi all,
>>>>
>>>> While I was testing my app using java 8, I encountered the previously
>>>> reported sun.misc.Unsafe issue.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8076445
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017685.html
>>>>
>>>> Issue status says it's resolved with resolution "Cannot Reproduce". But
>>>> unfortunately it's still reproducible using "1.8.0_60-ea-b18" and
>>>> "1.9.0-ea-b67".
>>>> Test is very simple:
>>>>
>>>> ```
>>>> public static void main(String[] args) throws Exception {
>>>>     Unsafe unsafe = findUnsafe();
>>>>     // 10000 pass
>>>>     // 100000 jvm crash
>>>>     // 1000000 fail
>>>>     int count = 100000;
>>>>     long size = count * 8L;
>>>>     long baseAddress = unsafe.allocateMemory(size);
>>>>
>>>>     try {
>>>>         for (int i = 0; i < count; i++) {
>>>>             long address = baseAddress + (i * 8L);
>>>>
>>>>             long expected = i;
>>>>             unsafe.putLong(address, expected);
>>>>
>>>>             long actual = unsafe.getLong(address);
>>>>
>>>>             if (expected != actual) {
>>>>                 throw new AssertionError("Expected: " + expected + ", Actual: " + actual);
>>>>             }
>>>>         }
>>>>     } finally {
>>>>         unsafe.freeMemory(baseAddress);
>>>>     }
>>>> }
>>>> ```
>>>> It does not fail up to version 1.8.0.31; starting with 1.8.0.40 the test fails constantly.
>>>>
>>>> - With iteration count 10000, the test passes.
>>>> - With iteration count 100000, the JVM crashes with SIGSEGV.
>>>> - With iteration count 1000000, the test fails with an AssertionError.
>>>>
>>>> When any one of compilation (-Xint), inlining (-XX:-Inline), or
>>>> on-stack replacement (-XX:-UseOnStackReplacement) is disabled, the test
>>>> does not fail at all.
>>>>
>>>> I tested on platforms:
>>>> - Centos-7/openjdk-1.8.0.45
>>>> - OSX/oraclejdk-1.8.0.40
>>>> - OSX/oraclejdk-1.8.0.45
>>>> - OSX/oraclejdk-1.8.0_60-ea-b18
>>>> - OSX/oraclejdk-1.9.0-ea-b67
>>>>
>>>> The previous issue comment
>>>> (https://bugs.openjdk.java.net/browse/JDK-8076445?focusedCommentId=13633043#comment-13633043)
>>>> says "Cannot reproduce based on the latest version". I hope that "latest
>>>> version" does not refer to '1.8.0_60-ea-b18' or '1.9.0-ea-b67', because
>>>> both are failing.
>>>>
>>>> I'm looking forward to hearing from you.
>>>>
>>>> Thanks,
>>>> -Mehmet Dogan-
>>>> --
>>>> @mmdogan

>> --
>> Serkan ÖZAL
>> Remotest Software Engineer
>> GSM: +90 542 680 39 18
>> Twitter: @serkan_ozal

> --
> Serkan ÖZAL
> Remotest Software Engineer
> GSM: +90 542 680 39 18
> Twitter: @serkan_ozal

--
Serkan ÖZAL
Remotest Software Engineer
GSM: +90 542 680 39 18
Twitter: @serkan_ozal

From irogers at google.com  Tue Jul 21 17:24:32 2015
From: irogers at google.com (Ian Rogers)
Date: Tue, 21 Jul 2015 10:24:32 -0700
Subject: On constant folding of final field loads
In-Reply-To: <558DFBEC.6040700@oracle.com>
References: <558DFBEC.6040700@oracle.com>
Message-ID:

Fwiw, Jikes RVM will propagate final field values without any checking of whether reflection or JNI is being used to abuse the meaning of finality. As Jikes RVM is seldom used to run anything other than (elderly) benchmarks, this has only turned up one problem, specifically in jython:

http://bugs.jython.org/issue1611

In the bug there is some (hopefully) interesting discussion on the meaning of finality, notably that the language and VM spec disagree on when final fields may be written. This was why the problem hadn't occurred with Java bytecode but did occur with Jython-generated bytecode.
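To make the abuse concrete, here is a minimal sketch of a reflective write to an instance final field; the Holder class is made up for illustration. Per the Field.set javadoc this is legal for non-static final fields once setAccessible(true) succeeds, which is exactly the case a constant-folding JIT has to worry about:

```
import java.lang.reflect.Field;

public class FinalMutationSketch {
    static final class Holder {
        final int x;
        Holder(int x) { this.x = x; }
    }

    public static void main(String[] args) throws Exception {
        Holder h = new Holder(1);
        Field f = Holder.class.getDeclaredField("x");
        f.setAccessible(true);   // lifts the final/access check for instance fields
        f.setInt(h, 2);          // would throw IllegalAccessException for a static final
        System.out.println(h.x); // prints 2 here; a JIT that folded h.x might still use 1
    }
}
```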
A related topic, maybe, is that perhaps Java finalizers should be allowed to write to final fields too, in particular for the case where they represent some kind of native resource.

Thanks,
Ian Rogers, Google.

On Fri, Jun 26, 2015 at 6:27 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:

> Hi there,
>
> Recently I started looking at constant folding of loads from instance final fields:
> https://bugs.openjdk.java.net/browse/JDK-8058164
>
> I made some progress and wanted to share my findings and initiate a discussion about the problems I spotted.
>
> Current prototype:
> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk
>
> The idea is simple: JIT tracks final field changes and throws away nmethods which are affected.
>
> There are 2 parts of the problem:
> - how to track changes of final fields;
> - how to manage relations between final fields and nmethods.
>
> I. Tracking changes of final fields
>
> There are 4 ways to circumvent runtime limitations and change a final field value:
> - Reflection API (Field.setAccessible())
> - Unsafe
> - JNI
> - java.lang.invoke (MethodHandles)
>
> (It's also possible to write to a final field in a constructor, but I consider it as a corner case and haven't addressed yet. VM can ignore )
>
> Since the Reflection & java.lang.invoke APIs use Unsafe, it ends up with only 2 cases: JNI & Unsafe.
>
> For JNI it's possible to encode field "finality" in the jfieldID and check the corresponding bit in the Set*Field JNI methods before updating a field. There are already some data encoded in the field ID, so extending it to record the final bit as well went pretty smoothly.
>
> For Unsafe it's much more complex.
> I started with a similar approach (mostly implemented in the current prototype) - record the "finality" bit in the offset cookie and check it when performing a write.
>
> Though the Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly states that the returned value is not guaranteed to be a byte offset [1], after following that road I don't see how the offset encoding scheme can be changed.
>
> First of all, there are Unsafe.get*Unaligned methods (added in 9), which require a byte offset (Unsafe.getLong()):
>   "Fetches a value at some byte offset into a given Java object
>   ...
>   The specification of this method is the same as {@link
>   #getLong(Object, long)} except that the offset does not need to
>   have been obtained from {@link #objectFieldOffset} on the
>   {@link java.lang.reflect.Field} of some Java field."
>
> Unsafe.getInt supports 3 addressing modes:
>   (1) NULL + address
>   (2) oop + offset
>   (3) base + index * scale
>
> Since there are no methods in Unsafe to get byte offsets, there's no way to make (3) work with non-byte offsets for the Unaligned versions. Both base and scale should be either byte offsets or offset cookies to make things work. You can get a sense of the problems looking into the Unsafe & java.nio hacks I did to make things somewhat function after switching the offset encoding strategy.
>
> Also, Unsafe.copyMemory() doesn't work well with offset cookies (see the java.nio.Bits changes I did). Though source and destination addressing share the same mode with Unsafe.getInt() et al., the size of the copied region is defined in bytes. So, in order to perform bulk copies of consecutive memory blocks, the user should be able to convert an offset cookie to a byte offset and vice versa. There's no way to solve that with the current API right now.
>
> I don't want to touch compatibility concerns of switching from byte offsets to encoded offsets, but it looks like the Unsafe API needs some overhaul in 9 to make offset encoding viable.
>
> More realistically, since there are external dependencies on the Unsafe API, I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when they are available in 9) all over the JDK. Or temporarily make a private copy (finally :-)) of the field accessors from Unsafe, switch it to encoded offsets, and use it in the Reflection & java.lang.invoke APIs.
>
> Regarding alternative approaches to track the finality, an offset bitmap on a per-class basis can be used (containing the locations of final fields). Possible downsides are: (1) memory footprint (1/8th of instance size per class); and (2) more complex checking logic (load a relevant piece of a bitmap from a klass, instead of checking a locally available offset cookie). The advantage is that it is completely transparent to the user: it doesn't change the offset translation scheme.
>
> II. Managing relations between final fields and nmethods
>
> Nmethod dependencies suit that purpose pretty well, but some enhancements are needed.
>
> I envision 2 types of dependencies: (1) per-class (field holder); and (2) per-instance (value holder). The field holder is used as a context.
>
> Unless a final field is changed, there's no need to track a per-instance dependency. The VM optimistically starts in per-class mode and switches to per-instance mode when it sees a field change. The policy is chosen on a per-class basis. The VM should be pretty conservative, since false positives are expensive - a change of an unrelated field causes recompilation of all nmethods where the same field was inlined (even if the value was taken from a different instance). Aliasing also causes false positives (same instance, but different final field), so fields in the same class should be differentiated as well.
>
> Unlike methods, fields don't have any dedicated metadata associated with them. All data is confined in the holder klass. To be able to identify a field in a dependency, the byte offset can be used. Right now, the dependency management machinery supports only oops and metadata. So, it should be extended to support primitive values in dependencies (haven't done yet).
>
> Byte offsets + per-instance mode completely eliminate false positives.
>
> Another aspect is how expensive dependency checking becomes.
>
> I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle inlining heavily relies on constant folding of instance final fields.
>
>                      Before    After
> checks (#)           420       12,5K
> nmethods checked(#)  3K        1,5M
> total time:          60ms      2s
> deps total           19K       26K
>
> Though the total number of dependencies in the VM didn't change much (+37% = 19K->26K), the total number of checked dependencies (500x: 3K -> 1,5M) and the time spent on dependency checking (30x: 60ms -> 2s) dramatically increased.
> > I don't want to touch compatibility concerns of switching from byte > offsets to encoded offsets, but it looks like Unsafe API needs some > overhaul in 9 to make offset encoding viable. > > More realistically, since there are external dependencies on Unsafe API, > I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when > they are available in 9) all over JDK. Or temporarily make a private copy > (finally :-)) of field accessors from Unsafe, switch it to encoded offsets, > and use it in Reflection & java.lang.invoke API. > > Regarding alternative approaches to track the finality, an offset bitmap > on per-class basis can be used (containing locations of final fields). > Possible downsides are: (1) memory footprint (1/8th of instance size per > class); and (2) more complex checking logic (load a relevant piece of a > bitmap from a klass, instead of checking locally available offset cookie). > The advantage is that it is completely transparent to a user: it doesn't > change offset translation scheme. > > > II. Managing relations between final fields and nmethods > > Nmethods dependencies suits that purpose pretty well, but some > enhancements are needed. > > I envision 2 types of dependencies: (1) per-class (field holder); and (2) > per-instance (value holder). Field holder is used as a context. > > Unless a final field is changed, there's no need to track per-instance > dependency. VM optimistically starts in per-class mode and switch to > per-instance mode when it sees a field change. The policy is chosen on > per-class basis. VM should be pretty conservative, since false positives > are expensive - a change of unrelated field causes recompilation of all > nmethods where the same field was inlined (even if the value was taken from > a different instance). Aliasing also causes false positives (same instance, > but different final field), so fields in the same class should be > differentiated as well. > > Unilke methods, fields don't have any dedicated metadata associated with > them. All data is confined in holder klass. To be able to identify a field > in a dependency, byte offset can be used. Right now, dependency management > machinery supports only oops and metadata. So, it should be extended to > support primitive values in dependencies (haven't done yet). > > Byte offset + per-instance modes completely eliminates false positives. > > Another aspect is how expensive dependency checking becomes. > > I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle > inlining heavily relies on constant folding of instance final fields. > > Before After > checks (#) 420 12,5K > nmethods checked(#) 3K 1,5M > total time: 60ms 2s > deps total 19K 26K > > Though total number of dependencies in VM didn't change much (+37% = > 19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) and time > spent on dependency checking (30x: 60ms -> 2s) dramatically increased. 
> > The reason is that constant field value dependencies created heavily > populated contextes which are regularly checked: > > #1 #2 #3/#4 > Before > KlassDep 254 47/2,632 > CallSiteDep 167 46/ 358 > > After > ConstantFieldDep 11,790 0/1,494,112 > KlassDep 286 41/ 2,769 > CallSiteDep 249 58/ 393 > > (#1 - dependency kind; #2 - total number of unique dependencies; > #3/#4 - invalidated nmethods/checked dependencies) > > I have 3 ideas how to improve performance of dependency checking: > > (1) split dependency context list (nmethodBucket) into 3 independent > lists (Klass, CallSite & ConstantValue); (IMPLEMENTED) > > It trades size for speed - duplicate nmethods are possible, but the lists > should be shorter on average. I already implemented it, but it didn't > improve the benchmark I'm playing with, since the fraction of > CallSite/Klass deps is very small compared to ConstantField. > > (2) group nmethodBucket entries into chunks of k-nmethods; (TODO) > > It should improve nmethod iteration speed in heavily populated contexts. > > (3) iterate only dependencies of appropriate kind; (TODO) > > There are 3 kinds of changes which require dependency checking: changes in > CHA (KlassDepChange), call site target change (CallSiteDepChange), and > constant field value change (ConstantFieldDepChange). Different types of > changes affect disjoint sets of dependencies. So, instead of enumerating > all dependencies in a nmethod, a more focused approach can be used (e.g. > check only call_site_target_value deps for CallSiteDepChange). > > Since dependencies are sorted by type when serialized in a nmethod, it's > possible to compute offsets for 3 disjoint sets and use them in DepStream > to iterate only relevant dependencies. > > I hope it'll significantly reduce dependency checking costs I'm seeing. > > That's all for now. Thanks! > > Best regards, > Vladimir Ivanov > > [1] "Do not expect to perform any sort of arithmetic on this offset; > it is just a cookie which is passed to the unsafe heap memory > accessors." > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Tue Jul 21 17:51:52 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 21 Jul 2015 13:51:52 -0400 Subject: On constant folding of final field loads In-Reply-To: References: <558DFBEC.6040700@oracle.com> Message-ID: That jython bug report seems to be talking about static final fields though, whereas this thread is about instance final fields. Hotspot already considers static final fields as constants, and will propagate that into generated code; unfortunately, it doesn't do this for instance final fields. The problem here is that there are known java libs (and likely numerous internal/non-public ones) that use/rely on mutating final instance fields, and so blindly disregarding reflection/JNI/etc will break them (IMHO, they deserve it though! :)). On Tue, Jul 21, 2015 at 1:24 PM, Ian Rogers wrote: > Fwiw, Jikes RVM will propagate final field values without any checking if > reflection or JNI is being used to abuse the meaning of finality. As Jikes > RVM is seldom used to run anything other than (elderly) benchmarks this has > only turned up one problem, specifically in jython: > > http://bugs.jython.org/issue1611 > > In the bug there is some (hopefully) interesting discussion on the meaning > of finality, notably that the language and VM spec disagree on when final > fields may be written. 
This was why the problem hadn't occurred with Java > bytecode but did occur with Jython generated bytecode. > > A related topic maybe that perhaps Java finalizers should be allowed to > write to final fields too, in particular for the case that they represent > some kind of native resource. > > Thanks, > Ian Rogers, Google. > > > On Fri, Jun 26, 2015 at 6:27 PM, Vladimir Ivanov < > vladimir.x.ivanov at oracle.com> wrote: > >> Hi there, >> >> Recently I started looking at constant folding of loads from instance >> final fields: >> https://bugs.openjdk.java.net/browse/JDK-8058164 >> >> I made some progress and wanted to share my findings and initiate a >> discussion about the problems I spotted. >> >> Current prototype: >> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot >> http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk >> >> The idea is simple: JIT tracks final field changes and throws away >> nmethods which are affected. >> >> There are 2 parts of the problem: >> - how to track changes of final field >> - how to manage relation between final fields and nmethods; >> >> I. Tracking changes of final fields >> >> There are 4 ways to circumvent runtime limitations and change a final >> field value: >> - Reflection API (Field.setAccessible()) >> - Unsafe >> - JNI >> - java.lang.invoke (MethodHandles) >> >> (It's also possible to write to a final field in a constructor, but I >> consider it as a corner case and haven't addressed yet. VM can ignore ) >> >> Since Reflection & java.lang.invoke APIs use Unsafe, it ends up with only >> 2 cases: JNI & Unsafe. >> >> For JNI it's possible to encode field "finality" in jfieldID and check >> corresponding bit in Set*Field JNI methods before updating a field. There >> are already some data encoded in the field ID, so extending it to record >> final bit as well went pretty smooth. >> >> For Unsafe it's much more complex. >> I started with a similar approach (mostly implemented in the current >> prototype) - record "finality" bit in offset cookie and check it when >> performing a write. >> >> Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly >> states that returned value is not guaranteed to be a byte offset [1], after >> following that road I don't see how offset encoding scheme can be changed. >> >> First of all, there are Unsafe.get*Unaligned methods (added in 9), which >> require byte offset (Unsafe.getLong()): >> "Fetches a value at some byte offset into a given Java object >> ... >> The specification of this method is the same as {@link >> #getLong(Object, long)} except that the offset does not need to >> have been obtained from {@link #objectFieldOffset} on the >> {@link java.lang.reflect.Field} of some Java field." >> >> Unsafe.getInt supports 3 addressing modes: >> (1) NULL + address >> (2) oop + offset >> (3) base + index * scale >> >> Since there are no methods in Unsafe to get byte offsets, there's no way >> to make (3) work with non-byte offsets for Unaligned versions. Both base >> and scale should be either byte offsets or offset cookies to make things >> work. You can get a sense of the problems looking into Unsafe & java.nio >> hacks I did to make things somewhat function after switching offset >> encoding strategy. >> >> Also, Unsafe.copyMemory() doesn't work well with offset cookies (see >> java.nio.Bits changes I did). Though source and destination addressing >> shares the same mode with Unsage.getInt() et al., the size of the copied >> region is defined in bytes. 
So, in order to perform bulk copies of >> consecutive memory blocks, the user should be able to convert offset cookie >> to byte offset and vice versa. There's no way to solve that with current >> API right now. >> >> I don't want to touch compatibility concerns of switching from byte >> offsets to encoded offsets, but it looks like Unsafe API needs some >> overhaul in 9 to make offset encoding viable. >> >> More realistically, since there are external dependencies on Unsafe API, >> I'd prefer to leave sun.misc.Unsafe as is and switch to VarHandles (when >> they are available in 9) all over JDK. Or temporarily make a private copy >> (finally :-)) of field accessors from Unsafe, switch it to encoded offsets, >> and use it in Reflection & java.lang.invoke API. >> >> Regarding alternative approaches to track the finality, an offset bitmap >> on per-class basis can be used (containing locations of final fields). >> Possible downsides are: (1) memory footprint (1/8th of instance size per >> class); and (2) more complex checking logic (load a relevant piece of a >> bitmap from a klass, instead of checking locally available offset cookie). >> The advantage is that it is completely transparent to a user: it doesn't >> change offset translation scheme. >> >> >> II. Managing relations between final fields and nmethods >> >> Nmethods dependencies suits that purpose pretty well, but some >> enhancements are needed. >> >> I envision 2 types of dependencies: (1) per-class (field holder); and (2) >> per-instance (value holder). Field holder is used as a context. >> >> Unless a final field is changed, there's no need to track per-instance >> dependency. VM optimistically starts in per-class mode and switch to >> per-instance mode when it sees a field change. The policy is chosen on >> per-class basis. VM should be pretty conservative, since false positives >> are expensive - a change of unrelated field causes recompilation of all >> nmethods where the same field was inlined (even if the value was taken from >> a different instance). Aliasing also causes false positives (same instance, >> but different final field), so fields in the same class should be >> differentiated as well. >> >> Unilke methods, fields don't have any dedicated metadata associated with >> them. All data is confined in holder klass. To be able to identify a field >> in a dependency, byte offset can be used. Right now, dependency management >> machinery supports only oops and metadata. So, it should be extended to >> support primitive values in dependencies (haven't done yet). >> >> Byte offset + per-instance modes completely eliminates false positives. >> >> Another aspect is how expensive dependency checking becomes. >> >> I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle >> inlining heavily relies on constant folding of instance final fields. >> >> Before After >> checks (#) 420 12,5K >> nmethods checked(#) 3K 1,5M >> total time: 60ms 2s >> deps total 19K 26K >> >> Though total number of dependencies in VM didn't change much (+37% = >> 19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) and time >> spent on dependency checking (30x: 60ms -> 2s) dramatically increased. 
>> >> The reason is that constant field value dependencies created heavily >> populated contextes which are regularly checked: >> >> #1 #2 #3/#4 >> Before >> KlassDep 254 47/2,632 >> CallSiteDep 167 46/ 358 >> >> After >> ConstantFieldDep 11,790 0/1,494,112 >> KlassDep 286 41/ 2,769 >> CallSiteDep 249 58/ 393 >> >> (#1 - dependency kind; #2 - total number of unique dependencies; >> #3/#4 - invalidated nmethods/checked dependencies) >> >> I have 3 ideas how to improve performance of dependency checking: >> >> (1) split dependency context list (nmethodBucket) into 3 independent >> lists (Klass, CallSite & ConstantValue); (IMPLEMENTED) >> >> It trades size for speed - duplicate nmethods are possible, but the lists >> should be shorter on average. I already implemented it, but it didn't >> improve the benchmark I'm playing with, since the fraction of >> CallSite/Klass deps is very small compared to ConstantField. >> >> (2) group nmethodBucket entries into chunks of k-nmethods; (TODO) >> >> It should improve nmethod iteration speed in heavily populated contexts. >> >> (3) iterate only dependencies of appropriate kind; (TODO) >> >> There are 3 kinds of changes which require dependency checking: changes >> in CHA (KlassDepChange), call site target change (CallSiteDepChange), and >> constant field value change (ConstantFieldDepChange). Different types of >> changes affect disjoint sets of dependencies. So, instead of enumerating >> all dependencies in a nmethod, a more focused approach can be used (e.g. >> check only call_site_target_value deps for CallSiteDepChange). >> >> Since dependencies are sorted by type when serialized in a nmethod, it's >> possible to compute offsets for 3 disjoint sets and use them in DepStream >> to iterate only relevant dependencies. >> >> I hope it'll significantly reduce dependency checking costs I'm seeing. >> >> That's all for now. Thanks! >> >> Best regards, >> Vladimir Ivanov >> >> [1] "Do not expect to perform any sort of arithmetic on this offset; >> it is just a cookie which is passed to the unsafe heap memory >> accessors." >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Tue Jul 21 18:00:55 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 21 Jul 2015 20:00:55 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AE4BC2.1090104@oracle.com> References: <55AE4BC2.1090104@oracle.com> Message-ID: <55AE88D7.4080005@oracle.com> [CC'ing aarch64-port-dev because aarch64 files are affected] On 21.07.2015 15:40, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8130309 > http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ > > Problem: > While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. > > More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): > - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes > - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes > However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. 
Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. > > Solution: > Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. > > Testing: > - Failing test > - JPRT > > Thanks, > Tobias > From vladimir.x.ivanov at oracle.com Tue Jul 21 18:20:10 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 21 Jul 2015 21:20:10 +0300 Subject: [9] RFR (XS): 8131675: EA fails with assert(false) failed: not unsafe or G1 barrier raw StoreP Message-ID: <55AE8D5A.1020906@oracle.com> http://cr.openjdk.java.net/~vlivanov/8131675/webrev.00 https://bugs.openjdk.java.net/browse/JDK-8131675 Newly introduced UnsafeGetConstantField test revealed an uncovered case in EA for unsafe accesses. The following code: U.putAddress(nativeAddr, 0x12345678L); where static final long nativeAddr = U.allocateMemory(16); static final Unsafe U = Unsafe.getUnsafe(); is parsed into the following IR: StoreP (ctrl) (mem) ConP ConP But EA doesn't expect to see a constant address and falls through the unsafe access detection logic hitting the assert right away. The fix is to treat all stores to raw pointers (except G1 barriers) as unsafe accesses and mark stored values as escaped. Testing: failed test, jprt Best regards, Vladimir Ivanov [1] test/compiler/unsafe/UnsafeGetConstantField.java From dean.long at oracle.com Tue Jul 21 18:21:52 2015 From: dean.long at oracle.com (Dean Long) Date: Tue, 21 Jul 2015 11:21:52 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AE4BC2.1090104@oracle.com> References: <55AE4BC2.1090104@oracle.com> Message-ID: <55AE8DC0.5050000@oracle.com> Looks good to me. dl On 7/21/2015 6:40 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8130309 > http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ > > Problem: > While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. > > More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): > - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes > - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes > However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. > > Solution: > Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. 
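For context on the 8131675 RFR above, the snippets from that mail assemble into roughly the following shape; this is a sketch, not the actual test (test/compiler/unsafe/UnsafeGetConstantField.java), and note that Unsafe.getUnsafe() only succeeds for code on the boot class path:

```
import sun.misc.Unsafe;

class RawStoreSketch {
    static final Unsafe U = Unsafe.getUnsafe();            // boot class path only
    static final long nativeAddr = U.allocateMemory(16);   // folds to a ConP constant

    static void store() {
        // With both the address and the value constant, this parses into a raw
        // StoreP whose operands are two ConP nodes, the case EA did not expect.
        U.putAddress(nativeAddr, 0x12345678L);
    }
}
```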
> > Testing:
> - Failing test
> - JPRT
>
> Thanks,
> Tobias

From vladimir.x.ivanov at oracle.com  Tue Jul 21 18:48:07 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 21 Jul 2015 21:48:07 +0300
Subject: On constant folding of final field loads
In-Reply-To:
References: <558DFBEC.6040700@oracle.com>
Message-ID: <55AE93E7.9090608@oracle.com>

Ian,

Thanks for sharing the reference! Though the jython problem was about static final fields, it is still applicable to the discussion. Multiple writes to instance final fields are allowed by the JVMS and should be correctly handled.

There's still some mismatch in field finality between the JVMS & the JLS, but it seems the JVMS was slightly tightened up since 2010. The JVM is required to throw a runtime exception when attempting to write to a final field from outside of <init>/<clinit> [1]. JVMS 6 [2] allowed writes to final fields anywhere in the current class.

Fixing the problem for static final fields doesn't look like a hard problem. The JIT can inspect the field holder class state and skip the optimization if the class hasn't been fully initialized yet.

Best regards,
Vladimir Ivanov

[1] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html#jvms-6.5.putfield:
"Otherwise, if the field is final, it must be declared in the current class, and the instruction must occur in an instance initialization method (<init>) of the current class. Otherwise, an IllegalAccessError is thrown."

[2] https://docs.oracle.com/javase/specs/jvms/se6/html/Instructions2.doc11.html
"Otherwise, if the field is final, it must be declared in the current class. Otherwise, an IllegalAccessError is thrown."

On 7/21/15 8:24 PM, Ian Rogers wrote:
> Fwiw, Jikes RVM will propagate final field values without any checking
> if reflection or JNI is being used to abuse the meaning of finality. As
> Jikes RVM is seldom used to run anything other than (elderly) benchmarks
> this has only turned up one problem, specifically in jython:
>
> http://bugs.jython.org/issue1611
>
> In the bug there is some (hopefully) interesting discussion on the
> meaning of finality, notably that the language and VM spec disagree on
> when final fields may be written. This was why the problem hadn't
> occurred with Java bytecode but did occur with Jython generated bytecode.
>
> A related topic maybe that perhaps Java finalizers should be allowed to
> write to final fields too, in particular for the case that they
> represent some kind of native resource.
>
> Thanks,
> Ian Rogers, Google.
>
>
> On Fri, Jun 26, 2015 at 6:27 PM, Vladimir Ivanov
> wrote:
>
>     Hi there,
>
>     Recently I started looking at constant folding of loads from
>     instance final fields:
>     https://bugs.openjdk.java.net/browse/JDK-8058164
>
>     I made some progress and wanted to share my findings and initiate a
>     discussion about the problems I spotted.
>
>     Current prototype:
>     http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/hotspot
>     http://cr.openjdk.java.net/~vlivanov/8058164/webrev.00/jdk
>
>     The idea is simple: JIT tracks final field changes and throws away
>     nmethods which are affected.
>
>     There are 2 parts of the problem:
>     - how to track changes of final fields;
>     - how to manage relations between final fields and nmethods.
>
>     I. Tracking changes of final fields
>
>     There are 4 ways to circumvent runtime limitations and change a
>     final field value:
>     - Reflection API (Field.setAccessible())
>     - Unsafe
>     - JNI
>     - java.lang.invoke (MethodHandles)
>
>     (It's also possible to write to a final field in a constructor, but
>     I consider it as a corner case and haven't addressed yet.
VM can > ignore ) > > Since Reflection & java.lang.invoke APIs use Unsafe, it ends up with > only 2 cases: JNI & Unsafe. > > For JNI it's possible to encode field "finality" in jfieldID and > check corresponding bit in Set*Field JNI methods before updating a > field. There are already some data encoded in the field ID, so > extending it to record final bit as well went pretty smooth. > > For Unsafe it's much more complex. > I started with a similar approach (mostly implemented in the current > prototype) - record "finality" bit in offset cookie and check it > when performing a write. > > Though Unsafe.objectFieldOffset/staticFieldOffset javadoc explicitly > states that returned value is not guaranteed to be a byte offset > [1], after following that road I don't see how offset encoding > scheme can be changed. > > First of all, there are Unsafe.get*Unaligned methods (added in 9), > which require byte offset (Unsafe.getLong()): > "Fetches a value at some byte offset into a given Java object > ... > The specification of this method is the same as {@link > #getLong(Object, long)} except that the offset does not need to > have been obtained from {@link #objectFieldOffset} on the > {@link java.lang.reflect.Field} of some Java field." > > Unsafe.getInt supports 3 addressing modes: > (1) NULL + address > (2) oop + offset > (3) base + index * scale > > Since there are no methods in Unsafe to get byte offsets, there's no > way to make (3) work with non-byte offsets for Unaligned versions. > Both base and scale should be either byte offsets or offset cookies > to make things work. You can get a sense of the problems looking > into Unsafe & java.nio hacks I did to make things somewhat function > after switching offset encoding strategy. > > Also, Unsafe.copyMemory() doesn't work well with offset cookies (see > java.nio.Bits changes I did). Though source and destination > addressing shares the same mode with Unsage.getInt() et al., the > size of the copied region is defined in bytes. So, in order to > perform bulk copies of consecutive memory blocks, the user should be > able to convert offset cookie to byte offset and vice versa. There's > no way to solve that with current API right now. > > I don't want to touch compatibility concerns of switching from byte > offsets to encoded offsets, but it looks like Unsafe API needs some > overhaul in 9 to make offset encoding viable. > > More realistically, since there are external dependencies on Unsafe > API, I'd prefer to leave sun.misc.Unsafe as is and switch to > VarHandles (when they are available in 9) all over JDK. Or > temporarily make a private copy (finally :-)) of field accessors > from Unsafe, switch it to encoded offsets, and use it in Reflection > & java.lang.invoke API. > > Regarding alternative approaches to track the finality, an offset > bitmap on per-class basis can be used (containing locations of final > fields). Possible downsides are: (1) memory footprint (1/8th of > instance size per class); and (2) more complex checking logic (load > a relevant piece of a bitmap from a klass, instead of checking > locally available offset cookie). The advantage is that it is > completely transparent to a user: it doesn't change offset > translation scheme. > > > II. Managing relations between final fields and nmethods > > Nmethods dependencies suits that purpose pretty well, but some > enhancements are needed. > > I envision 2 types of dependencies: (1) per-class (field holder); > and (2) per-instance (value holder). Field holder is used as a context. 
> > Unless a final field is changed, there's no need to track > per-instance dependency. VM optimistically starts in per-class mode > and switch to per-instance mode when it sees a field change. The > policy is chosen on per-class basis. VM should be pretty > conservative, since false positives are expensive - a change of > unrelated field causes recompilation of all nmethods where the same > field was inlined (even if the value was taken from a different > instance). Aliasing also causes false positives (same instance, but > different final field), so fields in the same class should be > differentiated as well. > > Unilke methods, fields don't have any dedicated metadata associated > with them. All data is confined in holder klass. To be able to > identify a field in a dependency, byte offset can be used. Right > now, dependency management machinery supports only oops and > metadata. So, it should be extended to support primitive values in > dependencies (haven't done yet). > > Byte offset + per-instance modes completely eliminates false positives. > > Another aspect is how expensive dependency checking becomes. > > I took a benchmark from Nashorn/Octane (Box2D), since MethodHandle > inlining heavily relies on constant folding of instance final fields. > > Before After > checks (#) 420 12,5K > nmethods checked(#) 3K 1,5M > total time: 60ms 2s > deps total 19K 26K > > Though total number of dependencies in VM didn't change much (+37% = > 19K->26K), total number of checked dependencies (500x: 3K -> 1,5M) > and time spent on dependency checking (30x: 60ms -> 2s) dramatically > increased. > > The reason is that constant field value dependencies created heavily > populated contextes which are regularly checked: > > #1 #2 #3/#4 > Before > KlassDep 254 47/2,632 > CallSiteDep 167 46/ 358 > > After > ConstantFieldDep 11,790 0/1,494,112 > KlassDep 286 41/ 2,769 > CallSiteDep 249 58/ 393 > > (#1 - dependency kind; #2 - total number of unique dependencies; > #3/#4 - invalidated nmethods/checked dependencies) > > I have 3 ideas how to improve performance of dependency checking: > > (1) split dependency context list (nmethodBucket) into 3 > independent lists (Klass, CallSite & ConstantValue); (IMPLEMENTED) > > It trades size for speed - duplicate nmethods are possible, but the > lists should be shorter on average. I already implemented it, but it > didn't improve the benchmark I'm playing with, since the fraction of > CallSite/Klass deps is very small compared to ConstantField. > > (2) group nmethodBucket entries into chunks of k-nmethods; (TODO) > > It should improve nmethod iteration speed in heavily populated contexts. > > (3) iterate only dependencies of appropriate kind; (TODO) > > There are 3 kinds of changes which require dependency checking: > changes in CHA (KlassDepChange), call site target change > (CallSiteDepChange), and constant field value change > (ConstantFieldDepChange). Different types of changes affect disjoint > sets of dependencies. So, instead of enumerating all dependencies in > a nmethod, a more focused approach can be used (e.g. check only > call_site_target_value deps for CallSiteDepChange). > > Since dependencies are sorted by type when serialized in a nmethod, > it's possible to compute offsets for 3 disjoint sets and use them in > DepStream to iterate only relevant dependencies. > > I hope it'll significantly reduce dependency checking costs I'm seeing. > > That's all for now. Thanks! 
> > Best regards,
> Vladimir Ivanov
>
> [1] "Do not expect to perform any sort of arithmetic on this offset;
> it is just a cookie which is passed to the unsafe heap memory
> accessors."

From john.r.rose at oracle.com  Tue Jul 21 19:02:43 2015
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 21 Jul 2015 12:02:43 -0700
Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics
In-Reply-To: <55AE62FE.4070502@oracle.com>
References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com>
Message-ID:

Yes, that will work, and I think it is cleaner than what we had before, as well as providing the new required functionality.

Reviewed; please get a second reviewer.

-- John

P.S. If the unit tests want to test (via the whitebox API) whether an intrinsic was compiled successfully, we might want to expose Compile::gather_intrinsic_statistics, etc. But not in this change set.

P.P.S. As I think I said before, I wish we had a way to consolidate the switch statements further (into vmSymbols.hpp). But I don't see a clean way to do it.

On Jul 21, 2015, at 8:19 AM, Zoltán Majó wrote:
>
> Here is the newest webrev:
> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/
> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/

From john.r.rose at oracle.com  Tue Jul 21 19:04:42 2015
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 21 Jul 2015 12:04:42 -0700
Subject: On constant folding of final field loads
In-Reply-To: <55AE93E7.9090608@oracle.com>
References: <558DFBEC.6040700@oracle.com> <55AE93E7.9090608@oracle.com>
Message-ID: <2BB402A2-0A7B-4DF6-BE81-CC336663CEE7@oracle.com>

On Jul 21, 2015, at 11:48 AM, Vladimir Ivanov wrote:
>
> Fixing the problem for static final fields doesn't look like a hard problem. JIT can inspect field holder class state and skip the optimization if the class hasn't been fully initialized yet.

And a slushy bit (for use by reflection, etc.) can handle the corresponding non-static case.

-- John

From dean.long at oracle.com  Tue Jul 21 20:28:12 2015
From: dean.long at oracle.com (Dean Long)
Date: Tue, 21 Jul 2015 13:28:12 -0700
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55AD0AE0.3060803@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com>
Message-ID: <55AEAB5C.8050307@oracle.com>

This version looks good.

dl

On 7/20/2015 7:51 AM, Aleksey Shipilev wrote:
> Hi Dean,
>
> Thanks for taking a look!
>
> Silly me, I should have left the call patching cases intact, because
> you're right, we should be able to patch the nops partially while still
> producing the correct instruction stream. Therefore, I reverted the
> cases where we do nop-ing for *instruction* patching, and added the
> comment there.
>
> Other places seem to use the nop sequences to provide the alignment, not
> for the general patching. Especially interesting for us is the case of
> aligning the patcheable immediate in the existing call. C2 does the nops
> in these cases.
> > New webrev:
> http://cr.openjdk.java.net/~shade/8131682/webrev.01/
>
> Testing:
> * JPRT -testset hotspot on open platforms;
> * Targeted benchmarks, plus eyeballing the assembly;
>
> Thanks,
> -Aleksey
>
> On 18.07.2015 10:51, Dean Long wrote:
>> I think we should distinguish the different uses and treat them
>> accordingly:
>>
>> 1) padding nops for patching, executed
>>
>> We need to be careful about inserting a fat nop here, if later patching
>> overwrites only part of the fat nop, resulting in an illegal instruction.
>>
>> 2) padding nops for patching, never executed
>>
>> It should be safe to insert a fat nop here, but there's no point if the
>> nops are not reachable and never executed.
>>
>> 3) alignment nops, never patched, executed
>>
>> Fat nops are fine, but on some CPUs branching may be even better, so I
>> suggest using align() for this, and letting align() decide what to
>> generate. The change in check_icache() could use a version of align
>> that takes the target offset as an argument:
>>
>> 348 align(CodeEntryAlignment, __ offset() + ic_cmp_size);
>>
>> 4) alignment nops, never patched, never executed
>>
>> Doesn't matter what we emit here, but we might as well make it
>> understandable by humans using a debugger.
>>
>> I believe the patching nops in c1_CodeStubs_x86.cpp and
>> c1_LIRAssembler.cpp are patched concurrently while the code is running,
>> not at a safepoint, so it's not clear to me if it's safe to use fat nops
>> on x86. I would consider those changes unsafe on x86 without further
>> analysis of what happens during patching.
>>
>> dl
>>
>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote:
>>> Hi there,
>>>
>>> C1 is not very good at inlining and intrinsifying methods, and hence the
>>> call performance is important there. One nit that we can see in the
>>> generated code on x86 is that C1 uses single-byte nops, even for
>>> long nop strides.
>>>
>>> This improvement fixes that:
>>> https://bugs.openjdk.java.net/browse/JDK-8131682
>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/
>>>
>>> Testing:
>>> - JPRT -testset hotspot on open platforms
>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1
>>>
>>> (I understand the symmetric change is going to be needed in closed
>>> parts, but let's polish the open part first).
>>>
>>> Thanks,
>>> -Aleksey

From tobias.hartmann at oracle.com  Wed Jul 22 06:06:06 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 22 Jul 2015 08:06:06 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AE8DC0.5050000@oracle.com>
References: <55AE4BC2.1090104@oracle.com> <55AE8DC0.5050000@oracle.com>
Message-ID: <55AF32CE.2000008@oracle.com>

Thanks, Dean.

Best,
Tobias

On 21.07.2015 20:21, Dean Long wrote:
> Looks good to me.
>
> dl
>
> On 7/21/2015 6:40 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8130309
>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/
>>
>> Problem:
>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert.
>>
>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad):
>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes
>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes
>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it.
>> >> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >> >> Solution: >> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >> >> Testing: >> - Failing test >> - JPRT >> >> Thanks, >> Tobias > From adinn at redhat.com Wed Jul 22 07:47:06 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 22 Jul 2015 08:47:06 +0100 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AE88D7.4080005@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> Message-ID: <55AF4A7A.6080505@redhat.com> Hi Thomas, The patch looks good so it could be pushed as is as far as I am concerned (*not* a hotspot/openjdk reviewer). However, note that the ppc port also suffers from the same problem. It employs an identical routine emit_trampoline_stub defined in the Arch Description file (ppc.ad). You might want to include a tweak to the ppc code as part of this fix or maybe leave it to Volker/Goetz et al. n.b. the problem also affects both the ppc jdk8 code but is not present in AArch64 jdk8. Since the ppc fix probably needs a backport it might be better to leave that fix as a separate step? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From zoltan.majo at oracle.com Wed Jul 22 07:59:56 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Wed, 22 Jul 2015 09:59:56 +0200 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> Message-ID: <55AF4D7C.1030706@oracle.com> Hi John, On 07/21/2015 09:02 PM, John Rose wrote: > Yes, that will work, and I think it is cleaner than what we had > before, as well as providing the new required functionality. > > Reviewed; please get a second reviewer. thank you for the review! I'll ask Vladimir K., maybe he has time to look at the newest webrev. > > ? John > > P.S. If the unit tests want to test (via the whitebox API) whether an > intrinsic was compiled successfully, we might want to > expose Compile::gather_intrinsic_statistics, etc. But not in this > change set. That is an interesting idea. We'd also have to see if current tests require such functionality or if SQE plans to add tests requiring that functionality. > > P.P.S. 
As I think I said before, I wish we had a way to consolidate
> the switch statements further (into vmSymbols.hpp). But I don't see a
> clean way to do it.

Yes, that would be nice. I've not seen a good way to do that, partly because of inconsistencies in the way C1 and C2 depend on the values of command-line flags.

Thank you!

Best regards,


Zoltán

>
> On Jul 21, 2015, at 8:19 AM, Zoltán Majó wrote:
>>
>> Here is the newest webrev:
>> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/
>>
>> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/
>>
>

From tobias.hartmann at oracle.com Wed Jul 22 08:04:57 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 22 Jul 2015 10:04:57 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AF4A7A.6080505@redhat.com>
References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> <55AF4A7A.6080505@redhat.com>
Message-ID: <55AF4EA9.6070002@oracle.com>

Hi Andrew,

On 22.07.2015 09:47, Andrew Dinn wrote:
> Hi Thomas,
>
> The patch looks good so it could be pushed as is as far as I am
> concerned (*not* a hotspot/openjdk reviewer).

Thanks for taking a look!

> However, note that the ppc port also suffers from the same problem. It
> employs an identical routine emit_trampoline_stub defined in the Arch
> Description file (ppc.ad). You might want to include a tweak to the ppc
> code as part of this fix or maybe leave it to Volker/Goetz et al.

What problem are you referring to? The actual problem I fixed with this patch is platform independent. It's caused by C2 code not bailing out if the platform dependent code was unable to create a stub (see changes in 'output.cpp'). I only removed the call to 'start_a_stub' in the aarch64 code because it is useless. I don't see this call on ppc though.

Thanks,
Tobias

> n.b. the problem also affects the ppc jdk8 code but is not present
> in AArch64 jdk8. Since the ppc fix probably needs a backport it might be
> better to leave that fix as a separate step?
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>

From aleksey.shipilev at oracle.com Wed Jul 22 08:10:51 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 22 Jul 2015 11:10:51 +0300
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55AEAB5C.8050307@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com>
Message-ID: <55AF500B.9000505@oracle.com>

Thanks for review, Dean!

I'd like to hear the opinions of AArch64 and Power folks, since we contaminate their assemblers a bit to gain access to x86 fat nops.

-Aleksey

On 21.07.2015 23:28, Dean Long wrote:
> This version looks good.
>
> dl
>
> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote:
>> Hi Dean,
>>
>> Thanks for taking a look!
>>
>> Silly me, I should have left the call patching cases intact, because
>> you're right, we should be able to patch the nops partially while still
>> producing the correct instruction stream. Therefore, I reverted the
>> cases where we do nop-ing for *instruction* patching, and added the
>> comment there.
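(As a point of reference, the x86 "fat nops" in question are the multi-byte NOP forms recommended by the Intel manuals -- illustrative encodings only, not a quote of the webrev:)

    // What Assembler::nop(int) can emit instead of a run of single 0x90s:
    //   1 byte : 90                        nop
    //   2 bytes: 66 90                     nop (with operand-size prefix)
    //   3 bytes: 0F 1F 00                  nop DWORD ptr [eax]
    //   4 bytes: 0F 1F 40 00               nop DWORD ptr [eax+0]
    //   8 bytes: 0F 1F 84 00 00 00 00 00   nop DWORD ptr [eax+eax*1+0]
    // A 10-byte pad thus becomes one 8-byte NOP plus one 2-byte NOP:
    // two instructions for the decoder instead of ten.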
>> >> Other places seem to use the nop sequences to provide the alignment, not >> for the general patching. Especially interesting for us is the case of >> aligning the patcheable immediate in the existing call. C2 does the nops >> in these cases. >> >> New webrev: >> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >> >> Testing: >> * JPRT -testset hotspot on open platforms; >> * Targeted benchmarks, plus eyeballing the assembly; >> >> Thanks, >> -Aleksey >> >> On 18.07.2015 10:51, Dean Long wrote: >>> I think we should distinguish the different uses and treat them >>> accordingly: >>> >>> 1) padding nops for patching, executed >>> >>> We need to be careful about inserting a fat nop here, if later patching >>> overwrites only part of the fat nop, resulting in an illegal intruction. >>> >>> 2) padding nops for patching, never executed >>> >>> It should be safe insert a fat nop here, but there's no point if the >>> nops are not reachable and never executed. >>> >>> >>> 3) alignment nops, never patched, executed >>> >>> Fat nops are fine, but on some CPUs branching may be even better, so I >>> suggest using align() for this, and letting align() decide what to >>> generate. The change in check_icache() could use a version of align >>> that takes the target offset as an argument: >>> >>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>> >>> 4) alignment nops, never patched, never executed >>> >>> Doesn't matter what we emit here, but we might as well make it >>> understandable by humans using a debugger. >>> >>> >>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>> on x86. I would consider those changes unsafe on x86 without further >>> analysis of what happens during patching. >>> >>> dl >>> >>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>> Hi there, >>>> >>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>> call performance is important there. One nit that we can see in the >>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>> long nop strides. >>>> >>>> This improvement fixes that: >>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>> >>>> Testing: >>>> - JPRT -testset hotspot on open platforms >>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>> >>>> (I understand the symmetric change is going to be needed in closed >>>> parts, but let's polish the open part first). >>>> >>>> Thanks, >>>> -Aleksey >>>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Wed Jul 22 08:32:59 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 22 Jul 2015 09:32:59 +0100 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55AF4EA9.6070002@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> <55AF4A7A.6080505@redhat.com> <55AF4EA9.6070002@oracle.com> Message-ID: <55AF553B.4060906@redhat.com> On 22/07/15 09:04, Tobias Hartmann wrote: > On 22.07.2015 09:47, Andrew Dinn wrote: . . . >> However, note that the ppc port also suffers from the same problem. 
>> It employs an identical routine emit_trampoline_stub defined in the
>> Arch Description file (ppc.ad). You might want to include a tweak
>> to the ppc code as part of this fix or maybe leave it to
>> Volker/Goetz et al.
>
> What problem are you referring to? The actual problem I fixed with
> this patch is platform independent. It's caused by C2 code not
> bailing out if the platform dependent code was unable to create a
> stub (see changes in 'output.cpp'). I only removed the call to
> 'start_a_stub' in the aarch64 code because it is useless. I don't see
> this call on ppc though.

Oops, apologies, this is my mistake! The AArch64 code was cloned off the ppc code and when I looked at the ppc versions this morning I saw they included the same repeated call to start_a_stub that you removed from the AArch64 tree. Evidently I was still high on crack from last night's Dionysian debauch (or something like that :-) since, as you say, there is no such call.

I'll just go get another cup of coffee . . .

regards,

Andrew Dinn
-----------

From tobias.hartmann at oracle.com Wed Jul 22 09:09:01 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 22 Jul 2015 11:09:01 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AF553B.4060906@redhat.com>
References: <55AE4BC2.1090104@oracle.com> <55AE88D7.4080005@oracle.com> <55AF4A7A.6080505@redhat.com> <55AF4EA9.6070002@oracle.com> <55AF553B.4060906@redhat.com>
Message-ID: <55AF5DAD.1030608@oracle.com>

Okay, no worries :)

Best,
Tobias

On 22.07.2015 10:32, Andrew Dinn wrote:
> On 22/07/15 09:04, Tobias Hartmann wrote:
>> On 22.07.2015 09:47, Andrew Dinn wrote: . . .
>>> However, note that the ppc port also suffers from the same problem.
>>> It employs an identical routine emit_trampoline_stub defined in the
>>> Arch Description file (ppc.ad). You might want to include a tweak
>>> to the ppc code as part of this fix or maybe leave it to
>>> Volker/Goetz et al.
>>
>> What problem are you referring to? The actual problem I fixed with
>> this patch is platform independent. It's caused by C2 code not
>> bailing out if the platform dependent code was unable to create a
>> stub (see changes in 'output.cpp'). I only removed the call to
>> 'start_a_stub' in the aarch64 code because it is useless. I don't see
>> this call on ppc though.
>
> Oops, apologies, this is my mistake! The AArch64 code was cloned off the
> ppc code and when I looked at the ppc versions this morning I saw they
> included the same repeated call to start_a_stub that you removed from
> the AArch64 tree. Evidently I was still high on crack from last night's
> Dionysian debauch (or something like that :-) since, as you say, there
> is no such call.
>
> I'll just go get another cup of coffee . . .
> > regards, > > > Andrew Dinn > ----------- > From edward.nevill at gmail.com Wed Jul 22 09:50:25 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 22 Jul 2015 10:50:25 +0100 Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets In-Reply-To: <55A8CE3C.1050203@redhat.com> References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> <55A8CAF5.5060601@redhat.com> <55A8CE3C.1050203@redhat.com> Message-ID: <1437558625.14729.11.camel@mylittlepony.linaroharston> On Fri, 2015-07-17 at 10:43 +0100, Andrew Haley wrote: > On 17/07/15 10:29, Andrew Haley wrote: > > On 17/07/15 10:12, Edward Nevill wrote: > >> On Fri, 2015-07-17 at 10:01 +0100, Andrew Haley wrote: > >>> On 17/07/15 09:52, Edward Nevill wrote: > >>>>>> Should it be +8 instead of +4? Or these offsets are not in bytes?: > >>>>>> > >>>>>> + unspill(rscratch1, true, src_offset); > >>>>>> + spill(rscratch1, true, dst_offset); > >>>>>> + unspill(rscratch1, true, src_offset+4); > >>>>>> + spill(rscratch1, true, dst_offset+4); > >>>> Ouch! Good catch. > >>>> > >>>> New webrev. > >>>> > >>>> http://cr.openjdk.java.net/~enevill/8131362/webrev.03/ > >>> > >>> I'm a bit more concerned that this did not fail in testing. I guess > >>> there were no tests at all for stack-stack spills. > >> > >> Correct. And it would have to be a 128 bit vector stack-stack spill with > >> an offset >= 512. How would you even provoke such a thing. > > > > With a highly-vectorizable test case with a zillion temporaries, I guess. > > Thinking some more: I think I'd add some special code to test it all > once, then delete the special code. If that's what it takes, there > isn't much choice. So what I did was I reduced the number of vector registers from 32 to 2. Even then spill_copy128 was never called. So maybe it just never does stack-stack spills on vector registers. However it does do stack-stack spills on general purpose registers. So I reduced the number of general purpose registers to 2 and faked the size of the general purpose registers at 128 instead of 64. spill_copy128 was then called and I verified that the generated code was correct. I also verified that the code was actually being executed by setting a breakpoint on the spill copy code. I have also verified that both branches of the if in spill_copy128 are tested by changing the condition to force execution of each branch. So, although we still don't know if it will ever call spill_copy128 for a vector register at least we have confidence that the spill_copy code does the right thing if it is ever called. OK to push? Ed. From roland.westrelin at oracle.com Wed Jul 22 13:54:56 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Wed, 22 Jul 2015 15:54:56 +0200 Subject: [9] RFR (XS): 8131675: EA fails with assert(false) failed: not unsafe or G1 barrier raw StoreP In-Reply-To: <55AE8D5A.1020906@oracle.com> References: <55AE8D5A.1020906@oracle.com> Message-ID: > http://cr.openjdk.java.net/~vlivanov/8131675/webrev.00 Looks good to me. Roland. From cnewland at chrisnewland.com Wed Jul 22 14:04:03 2015 From: cnewland at chrisnewland.com (Chris Newland) Date: Wed, 22 Jul 2015 15:04:03 +0100 Subject: Lock coarsening LogCompilation output? 
Message-ID:

Hi,

I'm building support into JITWatch for highlighting eliminated heap allocations and locks (either via elision or coarsening) and I'm confused by the LogCompilation output in this case:

Given the source code here:
https://github.com/AdoptOpenJDK/jitwatch/blob/master/src/main/resources/examples/LockCoarsen.java

Which consists of two synchronized(this) regions separated by a single statement modifying a local primitive.

I get the following LogCompilation output:
https://gist.github.com/chriswhocodes/124984ce8078290485e7

Which has the following elimination info in the optimizer phase:

<eliminate_lock lock='1'>
<jvms bci='61' method='...'/>
</eliminate_lock>
<eliminate_lock lock='0'>
</eliminate_lock>

BCI 61 refers to the 2nd synchronized block's monitor enter which I believe will be eliminated due to coarsening, but I don't understand what the <eliminate_lock lock='0'> tag means, and why it doesn't contain a jvms tag?

Does it mean the first synchronized block has also been eliminated (via elision)? I don't see any "lock cmpxchg" related to the first synchronized block in the PrintAssembly?

Thanks,

Chris

From vladimir.x.ivanov at oracle.com Wed Jul 22 16:14:26 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 22 Jul 2015 19:14:26 +0300
Subject: On constant folding of final field loads
In-Reply-To:
References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com>
Message-ID: <55AFC162.4030807@oracle.com>

John,

>> In order to avoid surprises and inconsistencies (old value vs new value depending on execution path) which are *very* hard to track down, VM should either completely forbid final field changes or keep track of them and adapt accordingly.
>
> I like the "forbid" option, also known as "fail fast". I think (in general)
> we should (where we can) remove indeterminate behavior from the
> JVM specification, such as "what happens when I store a new value
> to a final at an unexpected time".
>
> We have enough bits in the object header to encode frozen-ness.
> This is an opposite property: slushiness-of-finals. We could require
> that the newInstance operation used by deserialization would create
> slushy objects. (The normal new/<init> sequence doesn't need this.)
> Ideally, we would want the deserializer to issue an explicit "publish"
> operation, which would clear the slushy flag. JITs would consult
> that flag to gate final-folding. Reflection (and other users of
> Unsafe) would consult the flag and throw a fail-fast error if it
> failed. There would have to be some way to limit the time
> an object is in the slushy state, ideally by enforcing an error
> on deserializers who neglect to publish (or discard) a slushy
> object. For example, we could require an annotation on
> deserialization methods, as we do today on caller-sensitive
> methods.
>
> That's the sort of thing I would prefer to see, to remove
> indeterminate behavior.

Yes, the fail-fast approach is very appealing and I like the slushy-bit idea, but it is more intrusive (from the user perspective), unfortunately.

Though Reflection & MethodHandles can be instrumented with slushy-bit checks and Unsafe left as-is, what can be done for JNI?

Do you think it is acceptable for the JVM to throw exceptions on "illegal" (slushy bit off) final field writes? The SetXXXField JNI functions aren't declared to throw any exceptions [1], so it seems like an intrusive change, even with JNI spec adjustments.
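(To make the fail-fast idea concrete, the kind of VM-side gate being debated might look like the following. This is purely hypothetical -- is_slushy() and the exception choice are invented here for illustration; no such HotSpot API exists:)

    // Hypothetical fail-fast gate for reflective/JNI stores to finals:
    void check_final_field_store(oop obj, fieldDescriptor* fd, TRAPS) {
      if (fd->is_final() && !obj->mark()->is_slushy()) { // is_slushy(): invented
        THROW_MSG(vmSymbols::java_lang_IllegalStateException(),
                  "store to final field of an already-published object");
      }
    }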
Thinking more about "slushiness", limiting the time an object is in that state doesn't look like an easy problem to solve, considering there are 3 interacting operations (instantiate, initialize, freeze). VM can do some analysis to ensure slushy objects don't escape, but it looks either too fragile or too complex to implement. I don't see a big problem with "runaway" objects with slushy bit on. It means JIT will be always conservative when working with them. Or are you mostly concerned about abuse of slushy objects? It can be mitigated by: (1) requiring a user to perform additional actions to set slushy bit (e.g. calling specialized newInstance() equivalent or marking caller method akin to caller-sensitive methods); in that case it is less likely a user won't freeze previously allocated object; (2) providing VM diagnostics to detect runaway slushy objects; In the end, it is expert level API. I don't think many people write their own deserialization frameworks :-) If you bungle it, you are on your own. >> Though offset value is explicitly described in API as an opaque offset cookie, I spotted 2 inconsistencies in the API itself: >> >> * Unsafe.get/set*Unaligned() require absolute offsets; >> These methods were added in 9, so haven't leaked into public yet. > > Yep. That seems to push for a high-tag (color bits in the MSB of > the offset), or (my preference) no tag or separate tag. > You could also copy the alignment bits into the LSB to co-exist > with a tag. > > (The "separate tag" option means something like having a > query for the "tag" as well as the base and offset of a variable. > The operations getInt, etc., would take an optional third argument, > which would be the tag associated with the base and offset. > This would allow address arithmetic to remain trivial, at > the expense of retooling uses of Unsafe that need to be > sensitive to tagging concerns.) Separate tag has the same shortcoming as high-tag: it is not translated into a single machine instruction. VM needs to inspect the tag before performing a field update, though original setXXX() methods aren't affected. Considering fail-fast approach and slushy bits, I'm inclined to leave Unsafe as is (no tag approach). Probably, adding diagnostic mode to VM signalling when a final field is updated using Unsafe. Best regards, Vladimir Ivanov [1] https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#Set_type_Field_routines From vladimir.x.ivanov at oracle.com Wed Jul 22 17:10:03 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 22 Jul 2015 20:10:03 +0300 Subject: Lock coarsening LogCompilation output? In-Reply-To: References: Message-ID: <55AFCE6B.1080108@oracle.com> Chris, Second eliminate_lock corresponds to unlock operation (lock='0'), which should be also eliminated (corresponding diagnostic logic in 9 [1]). VM doesn't attach JVM state to the unlock node in the product binaries (see as_Unlock()->dbg_jvms()), but it is present in fastdebug/slowdebug binaries [2]. Not sure why the state is pruned in product. Probably, there are some performance after-effects from keeping the state for unlock node. 
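(For readers following the LogCompilation output: the diagnostic has roughly this shape -- a simplified sketch of the logic referenced in [1], not the exact source; the real code records more attributes:)

    // Emitted per eliminated monitorenter/monitorexit pair:
    // lock='1' for the lock node, lock='0' for the unlock node.
    void log_eliminated_lock(CompileLog* log, bool is_lock, JVMState* jvms) {
      log->head("eliminate_lock lock='%d'", is_lock ? 1 : 0);
      if (jvms != NULL) { // debug info is pruned for unlock nodes in product
        log->elem("jvms bci='%d' method='%d'",
                  jvms->bci(), log->identify(jvms->method()));
      }
      log->tail("eliminate_lock");
    }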
Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0d3c20ac648e/src/share/vm/opto/callnode.cpp#l1886
[2] latest hs-comp 9 fastdebug build:

On 7/22/15 5:04 PM, Chris Newland wrote:
> Hi,
>
> I'm building support into JITWatch for highlighting eliminated heap
> allocations and locks (either via elision or coarsening) and I'm confused
> by the LogCompilation output in this case:
>
> Given the source code here:
> https://github.com/AdoptOpenJDK/jitwatch/blob/master/src/main/resources/examples/LockCoarsen.java
>
> Which consists of two synchronized(this) regions separated by a single
> statement modifying a local primitive.
>
> I get the following LogCompilation output:
> https://gist.github.com/chriswhocodes/124984ce8078290485e7
>
> Which has the following elimination info in the optimizer phase:
>
> <eliminate_lock lock='1'>
> <jvms bci='61' method='...'/>
> </eliminate_lock>
> <eliminate_lock lock='0'>
> </eliminate_lock>
>
> BCI 61 refers to the 2nd synchronized block's monitor enter which I
> believe will be eliminated due to coarsening, but I don't understand what
> the <eliminate_lock lock='0'> tag means, and why it doesn't contain a
> jvms tag?
>
> Does it mean the first synchronized block has also been eliminated (via
> elision)? I don't see any "lock cmpxchg" related to the first
> synchronized block in the PrintAssembly?
>
> Thanks,
>
> Chris

From vladimir.x.ivanov at oracle.com Wed Jul 22 17:21:45 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 22 Jul 2015 20:21:45 +0300
Subject: [9] RFR (XS): 8131675: EA fails with assert(false) failed: not unsafe or G1 barrier raw StoreP
In-Reply-To:
References: <55AE8D5A.1020906@oracle.com>
Message-ID: <55AFD129.40109@oracle.com>

Thanks, Roland!

Best regards,
Vladimir Ivanov

On 7/22/15 4:54 PM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~vlivanov/8131675/webrev.00
>
> Looks good to me.
>
> Roland.
>

From varming at gmail.com Wed Jul 22 17:37:36 2015
From: varming at gmail.com (Carsten Varming)
Date: Wed, 22 Jul 2015 13:37:36 -0400
Subject: Fwd: Tiered compilation and virtual call heuristics
In-Reply-To:
References:
Message-ID:

Dear Hotspot compiler group,

I have had a few issues with tiered compilation in JDK8 lately and was wondering if you have some comments or ideas for the given problem.

Here is my problem as I currently understand it. Feel free to correct any misunderstandings I may have. With tiered compilation the heuristics for inlining virtual calls seem to degrade quite a bit. I think this is due to MethodData objects being created much earlier with tiered than without. This causes the tracking of the hottest target methods at a virtual call site to go awry, due to the limit (2) on the number of MethodData objects that can be associated with a bci in a method. It seems like the only virtual call targets tracked are the targets that are warm when C1 is invoked.
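(Conceptually, the per-call-site receiver profile being described is this -- an illustrative layout, not the actual MethodData/ReceiverTypeData classes:)

    // Two receiver rows per virtual call site. Once both rows are claimed
    // by receivers that were warm during early (tier 3) profiling, a
    // later-hot receiver never gets a row; its calls only grow the
    // row-less counter, so C2 sees a seemingly megamorphic site.
    struct ReceiverRow     { Klass* receiver; uint count; };
    struct CallSiteProfile {
      ReceiverRow rows[2];  // cf. the limit (2) mentioned above
      uint        count;    // calls whose receiver matched no row
    };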
Without tiered Instance$1.count is the only hot method. I wonder if you guys have seen this problem in the wild or if I just happen to be unlucky. Increasing BciProfileWidth should help in my case, but it is not a product flag. Do you have any experience regarding cost of increasing BciProfileWidth? Do you have any thoughts on throwing out MethodData objects for virtual call sites that turns out to be pretty cold? Thank you in advance for your thoughts, Carsten -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Main.java Type: text/x-java Size: 906 bytes Desc: not available URL: From vladimir.x.ivanov at oracle.com Wed Jul 22 18:14:24 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 22 Jul 2015 21:14:24 +0300 Subject: [9] RFR (XXS): 8132168: PrintIdealGraphLevel range should be [0..4] Message-ID: <55AFDD80.2050009@oracle.com> http://cr.openjdk.java.net/~vlivanov/8132168/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8132168 PrintIdealGraphLevel=4 is used to dump IR during parsing (see parse2.cpp:2387). Best regards, Vladimir Ivanov From john.r.rose at oracle.com Wed Jul 22 20:14:44 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 22 Jul 2015 13:14:44 -0700 Subject: On constant folding of final field loads In-Reply-To: <55AFC162.4030807@oracle.com> References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com> <55AFC162.4030807@oracle.com> Message-ID: <40E34041-BE42-4F8F-8C81-7DD4ACABD380@oracle.com> On Jul 22, 2015, at 9:14 AM, Vladimir Ivanov wrote: > >> That's the sort of thing I would prefer to see, to remove >> indeterminate behavior. > Yes, fail-fast approach is very appealing and I like slushy bit idea, but it is more intrusive (from user perspective), unfortunately. > > Though Reflection & MethodHandles can be instrument with slushy bit checks and Unsafe left as-is, what can be done for JNI? I would prefer to treat JNI (for these use cases) the same as Unsafe, to keep our story simpler. But it is true that we could add a few more checks to JNI than Unsafe. > Do you think it is acceptable for JVM to throw exceptions on "illegal" (slushy bit off) final field writes? Yes, that's the key feature of "fail fast". The user finds out immediately. > SetXXXField JNI functions aren't declared to throw any exceptions [1], so it seems like an intrusive change, even with JNI spec adjustments. In some cases (in the JMM legal framework) we would be within our rights to silently nullify (ignore and discard) illegal stores to final fields (in any API). Nullification is indistinguishable from the store occurring but never being observed in a future read. This is possible if either the store is delayed indefinitely, or if all threads (and compiled methods) have previously performed a caching read of the original final value. Of course, the threads would not have physically done so, but perhaps the JMM could be tortured to allow some sort of OOTA caching read of the original final value. But silent nullification is a desperation move. Users deserve an exception or other signal so they can fix their code quickly. > Thinking more about "slushiness", limiting the time an object is in that state doesn't look like an easy problem to solve, considering there are 3 interacting operations (instantiate, initialize, freeze). 
The important thing is to get library writers to "put a lid" on the slushy state by adding an explicit freeze operation. A mixture of performance and robustness concerns would motivate most of them to get with the program. But the non-compliant libraries would, as you say, leak slushy objects. Normally allocated objects should not have the slushy bit set, except (perhaps, but probably not) when they escape during construction. > VM can do some analysis to ensure slushy objects don't escape, but it looks either too fragile or too complex to implement. Agree. > I don't see a big problem with "runaway" objects with slushy bit on. It means JIT will be always conservative when working with them. Or are you mostly concerned about abuse of slushy objects? A slushy object won't optimize fully (in all cases). And its surprising mutability provides a little window for bugs to sneak in. It would be on library writers to fix this. I suppose we could also add an operation to assert frozen-ness, either inside the JIT or explicitly as a tool for users. In some cases, a JIT could generate good code optimistically for properly frozen objects, and de-opt when it runs into a stray slushy object. If the assertion were at the user level, it could fail with an exception; this isn't exactly fail-fast since the bad operation that leaked the slushy object would be long gone. > It can be mitigated by: > (1) requiring a user to perform additional actions to set slushy bit (e.g. calling specialized newInstance() equivalent or marking caller method akin to caller-sensitive methods); in that case it is less likely a user won't freeze previously allocated object; (Even better than annotations would be split types, a la the verifier. But the JVM does not have a highly refined type system.) Any hack that creates a blank instance (Unsafe.newInstance, JNIEnv.AllocObject) without also running a constructor can and should set the slushy bit. This is something we can control, ultimately. > (2) providing VM diagnostics to detect runaway slushy objects; (Yes, such as a user-visible assertion.) > In the end, it is expert level API. I don't think many people write their own deserialization frameworks :-) If you bungle it, you are on your own. And if you own a deserialization framework, you need to read the news and update your code. > ? > Separate tag has the same shortcoming as high-tag: it is not translated into a single machine instruction. VM needs to inspect the tag before performing a field update, though original setXXX() methods aren't affected. > > Considering fail-fast approach and slushy bits, I'm inclined to leave Unsafe as is (no tag approach). Probably, adding diagnostic mode to VM signalling when a final field is updated using Unsafe. That sounds pretty good. One caveat: Unsafe and automagic behavior (even in debug mode) don't usually go together. An alternative to automagic mode would be a *separate* Unsafe API to enable library writers to make the check (in their own debug mode). Something that would decode the (Object,long) pair into a java.lang.reflect.Field (or similar name), which could then be checked for finality. Plus the aforementioned "assertSlushy" operation. ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From michael.c.berg at intel.com Thu Jul 23 05:55:12 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 23 Jul 2015 05:55:12 +0000
Subject: RFR: 8132160 - support for AVX 512 call frames and stack management
Message-ID:

Hi Folks,

I would like to contribute AVX 512 call frame and stack management changes. I need two reviewers to examine this patch and comment as needed:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132160
webrev: http://cr.openjdk.java.net/~mcberg/8132160/webrev.01/

These changes simplify frame management on 32-bit and 64-bit systems which support EVEX and extend more complete frame save and restore functionality as well as stack management for calls, traps and explicit exception paths. These changes also move CPUID queries into the assembler object state and add more state rules to a large class of instructions while simplifying their use. Also added is support for vectorizing double precision sqrt which is available through the math library.

Many generated stubs and internal functions also now have predicated mask management for EVEX added.

Thanks,
Michael

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From roland.westrelin at oracle.com Thu Jul 23 07:20:08 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 23 Jul 2015 09:20:08 +0200
Subject: [9] RFR (XXS): 8132168: PrintIdealGraphLevel range should be [0..4]
In-Reply-To: <55AFDD80.2050009@oracle.com>
References: <55AFDD80.2050009@oracle.com>
Message-ID: <5FF697E2-3D94-4650-A213-B3D3F4F6354B@oracle.com>

> http://cr.openjdk.java.net/~vlivanov/8132168/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8132168
>
> PrintIdealGraphLevel=4 is used to dump IR during parsing (see parse2.cpp:2387).

Ok. Do we want to update the description string:

369           "Level of detail of the ideal graph printout. "                  \
370           "System-wide value, 0=nothing is printed, 3=all details printed. "\
371           "Level of detail of printouts can be set on a per-method level "  \
372           "as well by using CompileCommand=option.")

to cover the "4" case?

Roland.

From aph at redhat.com Thu Jul 23 09:23:17 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 23 Jul 2015 10:23:17 +0100
Subject: [aarch64-port-dev ] RFR: 8131362: aarch64: C2 does not handle large stack offsets
In-Reply-To: <1437558625.14729.11.camel@mylittlepony.linaroharston>
References: <1437036374.31596.18.camel@mylittlepony.linaroharston> <55A7FCA1.6010008@oracle.com> <1437123159.29276.16.camel@mint> <55A8C45D.2070009@redhat.com> <1437124359.29276.18.camel@mint> <55A8CAF5.5060601@redhat.com> <55A8CE3C.1050203@redhat.com> <1437558625.14729.11.camel@mylittlepony.linaroharston>
Message-ID: <55B0B285.7010904@redhat.com>

On 22/07/15 10:50, Edward Nevill wrote:
> So, although we still don't know if it will ever call spill_copy128 for a vector register at least we have confidence that the spill_copy code does the right thing if it is ever called.
>
> OK to push?

Thanks. Fine by me,

Andrew.

From aph at redhat.com Thu Jul 23 10:02:19 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 23 Jul 2015 11:02:19 +0100
Subject: User-defined deserialization [Was: On constant folding of final field loads]
In-Reply-To:
References: <558DFBEC.6040700@oracle.com> <55911F68.9050809@oracle.com> <559143D2.2020201@oracle.com> <5592662C.40400@oracle.com> <5592E75A.1000500@oracle.com>
Message-ID: <55B0BBAB.6020604@redhat.com>

On 21/07/15 00:05, John Rose wrote:
> We have enough bits in the object header to encode frozen-ness.
> This is an opposite property: slushiness-of-finals. We could
> require that the newInstance operation used by deserialization would
> create slushy objects. (The normal new/<init> sequence doesn't need
> this.) Ideally, we would want the deserializer to issue an explicit
> "publish" operation, which would clear the slushy flag. JITs would
> consult that flag to gate final-folding. Reflection (and other
> users of Unsafe) would consult the flag and throw a fail-fast error
> if it failed. There would have to be some way to limit the time an
> object is in the slushy state, ideally by enforcing an error on
> deserializers who neglect to publish (or discard) a slushy object.
> For example, we could require an annotation on deserialization
> methods, as we do today on caller-sensitive methods.
>
> That's the sort of thing I would prefer to see, to remove
> indeterminate behavior.

Which reminds me: an issue has arisen regarding Unsafe and user-defined deserialization. Some middleware uses non-Java-standard serialization protocols which require some sort of backdoor mechanism for creating objects which have no zero-arg constructor. Unsafe is used to do this, as is ReflectionFactory, but both are internal APIs.

I don't think there's anything inherently unreasonable about people wanting to write their own serialization protocols, but having to call internal APIs is fragile. I suppose we would like to tell the runtime that an object is no longer slushy, but there is no "official" interface for serializers to create objects in the first place.

Andrew.

From vladimir.x.ivanov at oracle.com Thu Jul 23 10:37:23 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 23 Jul 2015 13:37:23 +0300
Subject: [9] RFR (XXS): 8132168: PrintIdealGraphLevel range should be [0..4]
In-Reply-To: <5FF697E2-3D94-4650-A213-B3D3F4F6354B@oracle.com>
References: <55AFDD80.2050009@oracle.com> <5FF697E2-3D94-4650-A213-B3D3F4F6354B@oracle.com>
Message-ID: <55B0C3E3.3020202@oracle.com>

Thanks, Roland. I'll adjust the description before pushing.

Best regards,
Vladimir Ivanov

On 7/23/15 10:20 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~vlivanov/8132168/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8132168
>>
>> PrintIdealGraphLevel=4 is used to dump IR during parsing (see parse2.cpp:2387).
>
> Ok. Do we want to update the description string:
>
> 369           "Level of detail of the ideal graph printout. "                  \
> 370           "System-wide value, 0=nothing is printed, 3=all details printed. "\
> 371           "Level of detail of printouts can be set on a per-method level "  \
> 372           "as well by using CompileCommand=option.")
>
> to cover the "4" case?
>
> Roland.
>

From tobias.hartmann at oracle.com Thu Jul 23 11:23:42 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 23 Jul 2015 13:23:42 +0200
Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55AE4BC2.1090104@oracle.com>
References: <55AE4BC2.1090104@oracle.com>
Message-ID: <55B0CEBE.4010100@oracle.com>

Hi,

after running some more tests, I noticed that my fix is incomplete. The problem is that checking for a failed expansion in 'Compile::fill_buffer' may be too late in case we emit additional instructions immediately after we failed to create the stub.
For example, 'CallStaticJavaHandle' emits code to restore the stack pointer after 'Java_Static_Call' on Sparc (see sparc.ad): 9991 ins_encode(preserve_SP, Java_Static_Call(meth), restore_SP, call_epilog); If we don't bail out immediately after allocation of the stub failed, we crash in 'restore_SP' while trying to emit code into the now freed code blob. This problem exists on all platforms. Although we do not always emit code (for example on aarch64, -XX:VerifyStackAtCalls is currently unimplemented), I added checks for robustness / completeness. I also added checks to the corresponding C1 code. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.01/ Thanks, Tobias On 21.07.2015 15:40, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8130309 > http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ > > Problem: > While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. > > More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): > - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes > - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes > However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. > > Solution: > Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. > > Testing: > - Failing test > - JPRT > > Thanks, > Tobias > From roland.westrelin at oracle.com Thu Jul 23 13:51:57 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Thu, 23 Jul 2015 15:51:57 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B0CEBE.4010100@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> Message-ID: <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> > http://cr.openjdk.java.net/~thartmann/8130309/webrev.01/ assembler.cpp 68 Compile::current()->env()->record_failure("CodeCache is full?); That assumes we are calling this from c2 but it can be called from c1 as well. Did you add code for c1 to be on the safe side or have you observed problems with c1? I don?t understand that part: "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? What addresses are set to badAddress? Roland. > > Thanks, > Tobias > > On 21.07.2015 15:40, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8130309 >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >> >> Problem: >> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. 
The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. >> >> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >> >> Solution: >> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >> >> Testing: >> - Failing test >> - JPRT >> >> Thanks, >> Tobias >> From vivek.r.deshpande at intel.com Thu Jul 23 20:26:45 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 23 Jul 2015 20:26:45 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. Please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 webrev: http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ Thanks, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Thu Jul 23 23:45:29 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 23 Jul 2015 16:45:29 -0700 Subject: Tiered compilation and virtual call heuristics In-Reply-To: References: Message-ID: <6F5B24D5-F8D9-4C72-B978-93EDD7F353DE@oracle.com> I guess you got a little bit unlucky :). Increasing BciProfileWidth will only work in the case when one of the types appears in the profile later and then dominates other usages (see TypeProfileMajorReceiverPercent, it?s 90% by default). So, by the time C2 see the callsite the receiver type you want should be recorded and have more than 90% of counts. The overhead of BciProfileWidth may be substantial for the startup (slower profiling), but it won?t affect the final code (unless inlining happens differently, in a bad way). Another approach would be to delay the moment when the profiling starts. By, basically, bumping up the values of these guys: product(intx, Tier3InvocationThreshold, 200, \ "Compile if number of method invocations crosses this " \ "threshold") \ \ product(intx, Tier3MinInvocationThreshold, 100, \ "Minimum invocation to compile at tier 3") \ \ product(intx, Tier3CompileThreshold, 2000, \ "Threshold at which tier 3 compilation is invoked (invocation " \ "minimum must be satisfied)") \ Compilation will happen, when the following predicate gets true: return (i >= Tier3InvocationThreshold * scale) || (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale); Here, i is invocations, b is backedges, scale is something dynamic but is >= 1. 
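(Spelled out as code, the trigger predicate quoted above amounts to this -- simplified; in the real policy the scale factor is derived from compile queue lengths:)

    // A method is queued for a tier 3 (profiled C1) compile once enough
    // invocations (i) and backedges (b) accumulate. Raising the Tier3*
    // thresholds delays this point, and with it the creation of
    // MethodData objects, so profiling sees steady-state receivers.
    bool should_compile_tier3(int i, int b, double scale) {
      return (i >= Tier3InvocationThreshold * scale) ||
             (i >= Tier3MinInvocationThreshold * scale &&
              i + b >= Tier3CompileThreshold * scale);
    }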
igor > On Jul 22, 2015, at 10:37 AM, Carsten Varming wrote: > > Dear Hotspot compiler group, > > I have had a few issues with tiered compilation in JDK8 lately and was wondering if you have some comments or ideas for the given problem. > > Here is my problem as I currently understand it. Feel free to correct any misunderstandings I may have. With tiered compilation the heuristics for inlining virtual calls seems to degrade quite a bit. I think this is due to MethodData objects being created much earlier with tiered than without. This causes the tracking of the hottest target methods at a virtual call site to go awry, due to the limit (2) on the number of MethodData objects that can be associated with a bci in a method. It seems like the only virtual call targets tracked are the targets that are warm when when C1 is invoked. > > The program ends up with all call-sites in scala.collection.IndexedSeqOptimized.slice using virtual dispatch with tiered and bimorphic call sites without tiered. The end result with tiered is a tripling of the cpu required to run the program, and instruction pointers from the compiled slice method end up in 90% of all cpu samples (collected with perf at 4kHz). > > The problem is with a small application built in Scala on top of Netty. I have written a small sample program (see attached Main.java) to spare you the details (and to be able to give you code). > > When I run the sample program with tiered then the call to count end up being a virtual call, due to Instance$3.count and Instance4.count being warm when C1 kicks in. Without tiered Instance$1.count is the only hot method. > > I wonder if you guys have seen this problem in the wild or if I just happen to be unlucky. Increasing BciProfileWidth should help in my case, but it is not a product flag. Do you have any experience regarding cost of increasing BciProfileWidth? Do you have any thoughts on throwing out MethodData objects for virtual call sites that turns out to be pretty cold? > > Thank you in advance for your thoughts, > Carsten > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aleksey.shipilev at oracle.com Fri Jul 24 09:49:10 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 24 Jul 2015 12:49:10 +0300 Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55AF500B.9000505@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> Message-ID: <55B20A16.7020300@oracle.com> (explicitly cc'ing AArch64 and PPC folks) Thanks, -Aleksey On 22.07.2015 11:10, Aleksey Shipilev wrote: > Thanks for review, Dean! > > I'd like to hear the opinions of AArch64 and Power folks, since we > contaminate their assemblers a bit to gain access to x86 fat nops. > > -Aleksey > > On 21.07.2015 23:28, Dean Long wrote: >> This version looks good. >> >> dl >> >> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote: >>> Hi Dean, >>> >>> Thanks for taking a look! >>> >>> Silly me, I should have left the call patching cases intact, because >>> you're right, we should be able to patch the nops partially while still >>> producing the correct instruction stream. Therefore, I reverted the >>> cases where we do nop-ing for *instruction* patching, and added the >>> comment there. >>> >>> Other places seem to use the nop sequences to provide the alignment, not >>> for the general patching. 
Especially interesting for us is the case of >>> aligning the patcheable immediate in the existing call. C2 does the nops >>> in these cases. >>> >>> New webrev: >>> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >>> >>> Testing: >>> * JPRT -testset hotspot on open platforms; >>> * Targeted benchmarks, plus eyeballing the assembly; >>> >>> Thanks, >>> -Aleksey >>> >>> On 18.07.2015 10:51, Dean Long wrote: >>>> I think we should distinguish the different uses and treat them >>>> accordingly: >>>> >>>> 1) padding nops for patching, executed >>>> >>>> We need to be careful about inserting a fat nop here, if later patching >>>> overwrites only part of the fat nop, resulting in an illegal intruction. >>>> >>>> 2) padding nops for patching, never executed >>>> >>>> It should be safe insert a fat nop here, but there's no point if the >>>> nops are not reachable and never executed. >>>> >>>> >>>> 3) alignment nops, never patched, executed >>>> >>>> Fat nops are fine, but on some CPUs branching may be even better, so I >>>> suggest using align() for this, and letting align() decide what to >>>> generate. The change in check_icache() could use a version of align >>>> that takes the target offset as an argument: >>>> >>>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>>> >>>> 4) alignment nops, never patched, never executed >>>> >>>> Doesn't matter what we emit here, but we might as well make it >>>> understandable by humans using a debugger. >>>> >>>> >>>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>>> on x86. I would consider those changes unsafe on x86 without further >>>> analysis of what happens during patching. >>>> >>>> dl >>>> >>>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>>> Hi there, >>>>> >>>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>>> call performance is important there. One nit that we can see in the >>>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>>> long nop strides. >>>>> >>>>> This improvement fixes that: >>>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>>> >>>>> Testing: >>>>> - JPRT -testset hotspot on open platforms >>>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>>> >>>>> (I understand the symmetric change is going to be needed in closed >>>>> parts, but let's polish the open part first). >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL:

From aleksey.shipilev at oracle.com Fri Jul 24 09:50:26 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 24 Jul 2015 12:50:26 +0300
Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final
In-Reply-To: <55AE1972.4050106@oracle.com>
References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com>
Message-ID: <55B20A62.2090307@oracle.com>

On 21.07.2015 13:05, Aleksey Shipilev wrote:
> On 21.07.2015 00:14, John Rose wrote:
>> On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote:
>>> On the road from Unsafe to VarHandles lies a small deficiency in C1
>>> Class.cast/isInstance optimization: the canonicalizer folds constant
>>> class perfectly when it is coming from "inlined" constant, but not from
>>> static final, because the constant "shapes" are different:
>>> https://bugs.openjdk.java.net/browse/JDK-8131782
>
>> I suggest a deeper fix, to the factory that produces the oddly formatted constant.
>> That may help with other, similar constant folding problems.
>
> All right, let's do that!
> http://cr.openjdk.java.net/~shade/8131782/webrev.02/
>
> I respinned it through JPRT and my targeted benchmarks, and it performs
> the same as the previous patch.

Any other reviews pending? If not, please sponsor!

Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL:

From goetz.lindenmaier at sap.com Fri Jul 24 10:38:10 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 24 Jul 2015 10:38:10 +0000
Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B20A16.7020300@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap>

Hi Aleksey,

thanks for pointing us to that change!
Looks good, but does not compile. The default arg should only be in the header. See below.
Ppc part reviewed and I don't need a new webrev.

Best regards,
Goetz.

--- a/src/cpu/ppc/vm/assembler_ppc.inline.hpp
+++ b/src/cpu/ppc/vm/assembler_ppc.inline.hpp @@ -210,7 +210,7 @@ inline void Assembler::extsw( Register a, Register s) { emit_int32(EXTSW_OPCODE | rta(a) | rs(s) | rc(0)); } // extended mnemonics -inline void Assembler::nop() { Assembler::ori(R0, R0, 0); } +inline void Assembler::nop(int count) { for (int i = 0; i < count; i++) { Assembler::ori(R0, R0, 0); } } // NOP for FP and BR units (different versions to allow them to be in one group) inline void Assembler::fpnop0() { Assembler::fmr(F30, F30); } inline void Assembler::fpnop1() { Assembler::fmr(F31, F31); } g++ 4.8.3: In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.inline.hpp:43:0, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/macroAssembler_ppc.inline.hpp:29, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/macroAssembler.inline.hpp:43, from ../generated/adfiles/ad_ppc_64.cpp:56: /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.inline.hpp:213:41: error: default argument given for parameter 1 of void Assembler::nop(int) [-fpermissive] inline void Assembler::nop(int count = 1) { for(int i = 0; i < count; i++) ^ In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.hpp:434:0, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/nativeInst_ppc.hpp:29, from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/code/nativeInst.hpp:41, from ../generated/adfiles/ad_ppc_64.hpp:57, from ../generated/adfiles/ad_ppc_64.cpp:54: /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.hpp:1383:15: error: after previous specification in void Assembler::nop(int) [-fpermissive] inline void nop(int count = 1); ^ -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev Sent: Friday, July 24, 2015 11:49 AM To: Dean Long; hotspot compiler Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: RFR (S) 8131682: C1 should use multibyte nops everywhere * PGP Signed by an unknown key (explicitly cc'ing AArch64 and PPC folks) Thanks, -Aleksey On 22.07.2015 11:10, Aleksey Shipilev wrote: > Thanks for review, Dean! > > I'd like to hear the opinions of AArch64 and Power folks, since we > contaminate their assemblers a bit to gain access to x86 fat nops. > > -Aleksey > > On 21.07.2015 23:28, Dean Long wrote: >> This version looks good. >> >> dl >> >> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote: >>> Hi Dean, >>> >>> Thanks for taking a look! >>> >>> Silly me, I should have left the call patching cases intact, because >>> you're right, we should be able to patch the nops partially while still >>> producing the correct instruction stream. Therefore, I reverted the >>> cases where we do nop-ing for *instruction* patching, and added the >>> comment there. >>> >>> Other places seem to use the nop sequences to provide the alignment, not >>> for the general patching. Especially interesting for us is the case of >>> aligning the patcheable immediate in the existing call. C2 does the nops >>> in these cases. 
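(In other words, the fix being suggested is the usual C++ split -- a sketch:)

    // assembler_ppc.hpp -- the declaration carries the default argument:
    inline void nop(int count = 1);

    // assembler_ppc.inline.hpp -- the definition must not repeat it:
    inline void Assembler::nop(int count) {
      for (int i = 0; i < count; i++) { Assembler::ori(R0, R0, 0); }
    }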
>>> >>> New webrev: >>> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >>> >>> Testing: >>> * JPRT -testset hotspot on open platforms; >>> * Targeted benchmarks, plus eyeballing the assembly; >>> >>> Thanks, >>> -Aleksey >>> >>> On 18.07.2015 10:51, Dean Long wrote: >>>> I think we should distinguish the different uses and treat them >>>> accordingly: >>>> >>>> 1) padding nops for patching, executed >>>> >>>> We need to be careful about inserting a fat nop here, if later patching >>>> overwrites only part of the fat nop, resulting in an illegal intruction. >>>> >>>> 2) padding nops for patching, never executed >>>> >>>> It should be safe insert a fat nop here, but there's no point if the >>>> nops are not reachable and never executed. >>>> >>>> >>>> 3) alignment nops, never patched, executed >>>> >>>> Fat nops are fine, but on some CPUs branching may be even better, so I >>>> suggest using align() for this, and letting align() decide what to >>>> generate. The change in check_icache() could use a version of align >>>> that takes the target offset as an argument: >>>> >>>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>>> >>>> 4) alignment nops, never patched, never executed >>>> >>>> Doesn't matter what we emit here, but we might as well make it >>>> understandable by humans using a debugger. >>>> >>>> >>>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>>> on x86. I would consider those changes unsafe on x86 without further >>>> analysis of what happens during patching. >>>> >>>> dl >>>> >>>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>>> Hi there, >>>>> >>>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>>> call performance is important there. One nit that we can see in the >>>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>>> long nop strides. >>>>> >>>>> This improvement fixes that: >>>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>>> >>>>> Testing: >>>>> - JPRT -testset hotspot on open platforms >>>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>>> >>>>> (I understand the symmetric change is going to be needed in closed >>>>> parts, but let's polish the open part first). >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >>> >> > > * Unknown Key * 0x62A119A7 From tobias.hartmann at oracle.com Fri Jul 24 11:29:21 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 24 Jul 2015 13:29:21 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> Message-ID: <55B22191.9070904@oracle.com> Hi Roland, thanks for the review! On 23.07.2015 15:51, Roland Westrelin wrote: > assembler.cpp > > 68 Compile::current()->env()->record_failure("CodeCache is full?); > > That assumes we are calling this from c2 but it can be called from c1 as well. You are right. I moved this code to the C2 methods calling 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods already contain a call to 'bailout()'. 
I also noticed that we multiply the 'to_interp_stub_size()' by 2 when creating the stub (see compiledIC_sparc.cpp): 68 __ start_a_stub(to_interp_stub_size()*2); This seems to be unnecessary and causes problems in "Compile::scratch_emit_size" because we fix the stub section of the CodeBuffer to MAX_stubs_size and therefore fail to emit the to-interpreter stub since to_interp_stub_size()*2 > MAX_stubs_size. I removed the multiplication and added an assert to 'Compile::scratch_emit_size()'. > Did you add code for c1 to be on the safe side or have you observed problems with c1? I did not encounter the problem with C1 but added the checks to be on the safe side. Actually, we do call bailout() in C1 'LIR_Assembler::emit_static_call_stub()' if the stub cannot be created but without checking for bailed_out() in the calling method we will continue to emit code and crash (see explanation below). > "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? > > What addresses are set to badAddress? If CodeBuffer::expand() fails, the corresponding buffer blob in the code cache is freed and code section addresses are set to 'badAddress' (see CodeBuffer::set_blob(NULL)). If we continue to emit code into this buffer or use its section addresses, we fail. The assert "wrong size of mach node" is hit because we use CodeBuffer::insts_size() which is now -1647855886. Another instance of this bug fails because we continue to emit code, call CodeSection::emit_int32() and fail because end() is set to 'badAddress'. I did some more testing with "java -XX:+StressCodeBuffers -Xcomp -version" and found more places where we have to check for a failed CodeBuffer expansion to not continue to emit code and crash. I added the corresponding checks. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ Thanks, Tobias > > Roland. > >> >> Thanks, >> Tobias >> >> On 21.07.2015 15:40, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8130309 >>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >>> >>> Problem: >>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. >>> >>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >>> >>> Solution: >>> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. 
>>> >>> Testing: >>> - Failing test >>> - JPRT >>> >>> Thanks, >>> Tobias >>> > From Andrey.Saenkov at oracle.com Fri Jul 24 15:37:12 2015 From: Andrey.Saenkov at oracle.com (Andrey Saenkov) Date: Fri, 24 Jul 2015 18:37:12 +0300 Subject: RFR: 8079667: port vm/compiler/AESIntrinsics/CheckIntrinsics into jtreg In-Reply-To: <55B259AF.2070700@oracle.com> References: <55B259AF.2070700@oracle.com> Message-ID: <55B25BA8.4040201@oracle.com> Hi, could you please review the patch which ports tests for AES Intrinsics from closed jdk to the open. Unfortunately bug is marked as confidential, so you can't view its content unless you have required permissions. Description: Tests for AESIntrinsics ported from closed jdk to open jdk Link to webrev: http://cr.openjdk.java.net/~iignatyev/asaenkov/8079667/webrev.00/ Link to bug: https://jbs.oracle.com/bugs/browse/JDK-8079667 Testing done: Tested locally on Ubuntu x64 with AES support and tested on all other platforms using inner tool. Tests fail on sparc due to: JDK-8131778 -- Andrey Saenkov -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.thalinger at oracle.com Fri Jul 24 16:21:26 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 24 Jul 2015 09:21:26 -0700 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> Message-ID: > On Jul 23, 2015, at 1:26 PM, Deshpande, Vivek R wrote: > > Hi all > > I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. > Please review and sponsor this patch. > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 > > webrev: > http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I have waited for this a long time and I applaud the change! But I?m baffled by the complexity of the implementation ;-) Are there any comments in the original Intel libm implementation which we could add here as well? > > Thanks, > Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From vivek.r.deshpande at intel.com Thu Jul 23 18:01:11 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Thu, 23 Jul 2015 18:01:11 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A5665199D@ORSMSX106.amr.corp.intel.com> Hi all I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. Please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 webrev: http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ Thanks, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandhya.viswanathan at intel.com Fri Jul 24 17:04:00 2015 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 24 Jul 2015 17:04:00 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: References: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B633A7A4D@FMSMSX112.amr.corp.intel.com> Hi Christian, This is generated code by the ICC, that?s why you don?t see any line by line comments here. The algorithm used by Intel libm is given in comments as a preamble. 
Best Regards, Sandhya From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Christian Thalinger Sent: Friday, July 24, 2015 9:21 AM To: Deshpande, Vivek R Cc: Vladimir.Kozlov at oracle.com; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M): 8132207: Update for x86 exp in the math lib On Jul 23, 2015, at 1:26 PM, Deshpande, Vivek R > wrote: Hi all I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. Please review and sponsor this patch. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 webrev: http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I have waited for this a long time and I applaud the change! But I?m baffled by the complexity of the implementation ;-) Are there any comments in the original Intel libm implementation which we could add here as well? Thanks, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.thalinger at oracle.com Fri Jul 24 17:18:10 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 24 Jul 2015 10:18:10 -0700 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B633A7A4D@FMSMSX112.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A56651D97@ORSMSX106.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B633A7A4D@FMSMSX112.amr.corp.intel.com> Message-ID: <4B888824-AC50-4E46-8693-D1A52E86A72B@oracle.com> Got it. Thanks. > On Jul 24, 2015, at 10:04 AM, Viswanathan, Sandhya wrote: > > Hi Christian, > This is generated code by the ICC, that?s why you don?t see any line by line comments here. The algorithm used by Intel libm is given in comments as a preamble. > Best Regards, > Sandhya > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Christian Thalinger > Sent: Friday, July 24, 2015 9:21 AM > To: Deshpande, Vivek R > Cc: Vladimir.Kozlov at oracle.com; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M): 8132207: Update for x86 exp in the math lib > > > On Jul 23, 2015, at 1:26 PM, Deshpande, Vivek R > wrote: > > Hi all > > I would like to contribute a patch which optimizes Math.exp() for 64 and 32 bit X86 architecture using Intel LIBM implementation. > Please review and sponsor this patch. > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 > > webrev: > http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ > > I have waited for this a long time and I applaud the change! But I?m baffled by the complexity of the implementation ;-) Are there any comments in the original Intel libm implementation which we could add here as well? > > > > Thanks, > Vivek > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christos at zoulas.com Fri Jul 24 18:37:26 2015 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 24 Jul 2015 14:37:26 -0400 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A5665199D@ORSMSX106.amr.corp.intel.com> from "Deshpande, Vivek R" (Jul 23, 6:01pm) Message-ID: <20150724183726.3B9C717FDA8@rebar.astron.com> On Jul 23, 6:01pm, vivek.r.deshpande at intel.com ("Deshpande, Vivek R") wrote: -- Subject: RFR (M): 8132207: Update for x86 exp in the math lib | Hi all | | I would like to contribute a patch which optimizes Math.exp() for 64 and 32= | bit X86 architecture using Intel LIBM implementation. 
| Please review and sponsor this patch. | | Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 | | webrev: | http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I would be very careful with changes like this. Are you sure that this produces identical results with the current implementation? In the past we've had problems with the values of transcendental functions because JIT replaced the java implementations with native copies and sometimes this were off by 1ULP. This made the following code fail: public class LogTest { public static void main(String[] args) { double n = Integer.parseInt("17197"); double d = Math.log(n); System.out.println("n=" + n + ",log(n)=" + d); for (int i = 0; i < 100000; i++) { double e = Math.log(n); if (e != d) { System.err.println("ERROR after " + i + " iterations:\n" + "previous value: " + d + " (" + Long.toHexString(Double.doubleToLongBits(d)) + ")\n" + " current value: " + e + " (" + Long.toHexString(Double.doubleToLongBits(e)) + ")"); System.exit(1); } } System.err.println("SUCCESS!"); System.exit(0); } } christos From vivek.r.deshpande at intel.com Fri Jul 24 18:56:51 2015 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Fri, 24 Jul 2015 18:56:51 +0000 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <20150724183726.3B9C717FDA8@rebar.astron.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A5665199D@ORSMSX106.amr.corp.intel.com> from "Deshpande, Vivek R" (Jul 23, 6:01pm) <20150724183726.3B9C717FDA8@rebar.astron.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A56655B27@ORSMSX106.amr.corp.intel.com> Hi Christos You have a very good point. We have made sure that, interpreter, c1 and c2 give same result by using same stub. These results also meet the 1 ulp requirement. Regards, Vivek -----Original Message----- From: Christos Zoulas [mailto:christos at zoulas.com] Sent: Friday, July 24, 2015 11:37 AM To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net Cc: Vladimir.Kozlov at oracle.com Subject: Re: RFR (M): 8132207: Update for x86 exp in the math lib On Jul 23, 6:01pm, vivek.r.deshpande at intel.com ("Deshpande, Vivek R") wrote: -- Subject: RFR (M): 8132207: Update for x86 exp in the math lib | Hi all | | I would like to contribute a patch which optimizes Math.exp() for 64 | and 32= bit X86 architecture using Intel LIBM implementation. | Please review and sponsor this patch. | | Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132207 | | webrev: | http://cr.openjdk.java.net/~mcberg/8132207/webrev.01/ I would be very careful with changes like this. Are you sure that this produces identical results with the current implementation? In the past we've had problems with the values of transcendental functions because JIT replaced the java implementations with native copies and sometimes this were off by 1ULP. 
This made the following code fail: public class LogTest { public static void main(String[] args) { double n = Integer.parseInt("17197"); double d = Math.log(n); System.out.println("n=" + n + ",log(n)=" + d); for (int i = 0; i < 100000; i++) { double e = Math.log(n); if (e != d) { System.err.println("ERROR after " + i + " iterations:\n" + "previous value: " + d + " (" + Long.toHexString(Double.doubleToLongBits(d)) + ")\n" + " current value: " + e + " (" + Long.toHexString(Double.doubleToLongBits(e)) + ")"); System.exit(1); } } System.err.println("SUCCESS!"); System.exit(0); } } christos From christos at zoulas.com Fri Jul 24 19:17:20 2015 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 24 Jul 2015 15:17:20 -0400 Subject: RFR (M): 8132207: Update for x86 exp in the math lib In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A56655B27@ORSMSX106.amr.corp.intel.com> from "Deshpande, Vivek R" (Jul 24, 6:56pm) Message-ID: <20150724191720.E77B117FDA8@rebar.astron.com> On Jul 24, 6:56pm, vivek.r.deshpande at intel.com ("Deshpande, Vivek R") wrote: -- Subject: RE: RFR (M): 8132207: Update for x86 exp in the math lib | Hi Christos | | You have a very good point. We have made sure that, interpreter, c1 and c2 = | give same result by using same stub. | These results also meet the 1 ulp requirement. Excellent, many thanks! I was just making sure :-) I hate debugging these kinds of things... christos From dean.long at oracle.com Fri Jul 24 20:19:36 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 13:19:36 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B20A62.2090307@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> Message-ID: <55B29DD8.40008@oracle.com> I can push it for you. Do you need another review? dl On 7/24/2015 2:50 AM, Aleksey Shipilev wrote: > On 21.07.2015 13:05, Aleksey Shipilev wrote: >> On 21.07.2015 00:14, John Rose wrote: >>> On Jul 20, 2015, at 6:52 AM, Aleksey Shipilev wrote: >>>> On the road from Unsafe to VarHandles lies a small deficiency in C1 >>>> Class.cast/isInstance optimization: the canonicalizer folds constant >>>> class perfectly when it is coming from "inlined" constant, but not from >>>> static final, because the constant "shapes" are different: >>>> https://bugs.openjdk.java.net/browse/JDK-8131782 >>> I suggest a deeper fix, to the factory that produces the oddly formatted constant. >>> That may help with other, similar constant folding problems. >> All right, let's do that! >> http://cr.openjdk.java.net/~shade/8131782/webrev.02/ >> >> I respinned it through JRPT and my targeted benchmarks, and it performs >> the same as previous patch. > Any other reviews pending? If not, please sponsor! > > Thanks, > -Aleksey > > From dean.long at oracle.com Fri Jul 24 20:34:02 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 13:34:02 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B22191.9070904@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> Message-ID: <55B2A13A.4000604@oracle.com> On 7/24/2015 4:29 AM, Tobias Hartmann wrote: > Hi Roland, > > thanks for the review! 
> > On 23.07.2015 15:51, Roland Westrelin wrote: >> assembler.cpp >> >> 68 Compile::current()->env()->record_failure("CodeCache is full?); >> >> That assumes we are calling this from c2 but it can be called from c1 as well. > You are right. I moved this code to the C2 methods calling 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods already contain a call to 'bailout()'. In the new webrev, aarch64 emit_trampoline_stub still calls Compile::current()->env()->record_failure, and it appears that emit_trampoline_stub can be called from C1. Shouldn't we fix it so that ciEnv::record_failure works correctly from C1? Why does C1 need a different bailout message? dl > I also noticed that we multiply the 'to_interp_stub_size()' by 2 when creating the stub (see compiledIC_sparc.cpp): > > 68 __ start_a_stub(to_interp_stub_size()*2); > > This seems to be unnecessary and causes problems in "Compile::scratch_emit_size" because we fix the stub section of the CodeBuffer to MAX_stubs_size and therefore fail to emit the to-interpreter stub since to_interp_stub_size()*2 > MAX_stubs_size. I removed the multiplication and added an assert to 'Compile::scratch_emit_size()'. > >> Did you add code for c1 to be on the safe side or have you observed problems with c1? > I did not encounter the problem with C1 but added the checks to be on the safe side. Actually, we do call bailout() in C1 'LIR_Assembler::emit_static_call_stub()' if the stub cannot be created but without checking for bailed_out() in the calling method we will continue to emit code and crash (see explanation below). > >> "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? >> >> What addresses are set to badAddress? > If CodeBuffer::expand() fails, the corresponding buffer blob in the code cache is freed and code section addresses are set to 'badAddress' (see CodeBuffer::set_blob(NULL)). If we continue to emit code into this buffer or use its section addresses, we fail. The assert "wrong size of mach node" is hit because we use CodeBuffer::insts_size() which is now -1647855886. Another instance of this bug fails because we continue to emit code, call CodeSection::emit_int32() and fail because end() is set to 'badAddress'. > > I did some more testing with "java -XX:+StressCodeBuffers -Xcomp -version" and found more places where we have to check for a failed CodeBuffer expansion to not continue to emit code and crash. I added the corresponding checks. > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ > > Thanks, > Tobias > >> Roland. >> >>> Thanks, >>> Tobias >>> >>> On 21.07.2015 15:40, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8130309 >>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >>>> >>>> Problem: >>>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. 
>>>> >>>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >>>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >>>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >>>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >>>> >>>> Solution: >>>> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >>>> >>>> Testing: >>>> - Failing test >>>> - JPRT >>>> >>>> Thanks, >>>> Tobias >>>> From dean.long at oracle.com Fri Jul 24 21:21:44 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 14:21:44 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2A13A.4000604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> Message-ID: <55B2AC68.7030900@oracle.com> If TraceJumps causes problems on Sparc, then I think the Sparc version of to_interp_stub_size() needs to be adjusted to take TraceJumps into account. dl From john.r.rose at oracle.com Sat Jul 25 00:12:35 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 24 Jul 2015 17:12:35 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B29DD8.40008@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> Message-ID: On Jul 24, 2015, at 1:19 PM, Dean Long wrote: > > I can push it for you. Do you need another review? I don't think he does; it's a simple change. ? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.long at oracle.com Sat Jul 25 01:22:41 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 18:22:41 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2A13A.4000604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> Message-ID: <55B2E4E1.6060503@oracle.com> On 7/24/2015 1:34 PM, Dean Long wrote: > On 7/24/2015 4:29 AM, Tobias Hartmann wrote: >> Hi Roland, >> >> thanks for the review! >> >> On 23.07.2015 15:51, Roland Westrelin wrote: >>> assembler.cpp >>> >>> 68 Compile::current()->env()->record_failure("CodeCache is full?); >>> >>> That assumes we are calling this from c2 but it can be called from >>> c1 as well. >> You are right. I moved this code to the C2 methods calling >> 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods >> already contain a call to 'bailout()'. 
> > In the new webrev, aarch64 emit_trampoline_stub still calls > Compile::current()->env()->record_failure, > and it appears that emit_trampoline_stub can be called from C1. > Shouldn't we fix it so that > ciEnv::record_failure works correctly from C1? Why does C1 need a > different bailout message? > > dl I went ahead and file a separate RFE for the bailout issue: https://bugs.openjdk.java.net/browse/JDK-8132354 dl From dean.long at oracle.com Sat Jul 25 01:35:17 2015 From: dean.long at oracle.com (Dean Long) Date: Fri, 24 Jul 2015 18:35:17 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> Message-ID: <55B2E7D5.7060806@oracle.com> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets credit for it, and jcheck isn't complaining. Does anyone know if JPRT does more checks for Committer status? If so I'll have to redo it and add a Contributed-by line. dl changeset: 8724:df802f98b828 tag: tip user: shade date: Fri Jul 24 21:29:11 2015 -0400 files: src/share/vm/c1/c1_ValueType.cpp description: 8131782: C1 Class.cast optimization breaks when Class is loaded from static final Summary: change as_ValueType() to return InstanceConstant when appropriate Reviewed-by: jrose On 7/24/2015 5:12 PM, John Rose wrote: > On Jul 24, 2015, at 1:19 PM, Dean Long > wrote: >> >> I can push it for you. Do you need another review? > > I don't think he does; it's a simple change. ? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Sat Jul 25 02:32:02 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 24 Jul 2015 19:32:02 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B2E7D5.7060806@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> Message-ID: You are fine. Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade Committers who sponsor changes are expected to use the correct Author (not themselves). It's a syntax error to push a non-Author changeset to an OpenJDK repo. BTW, jcheck does not consult the OJN census AFAIK. ? John On Jul 24, 2015, at 6:35 PM, Dean Long wrote: > > OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets credit for it, and jcheck isn't complaining. Does anyone know if JPRT does more checks for Committer status? If so I'll have to redo it and add a Contributed-by line. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Mon Jul 27 05:44:27 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2015 07:44:27 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2A13A.4000604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> Message-ID: <55B5C53B.5080606@oracle.com> Hi Dean, On 24.07.2015 22:34, Dean Long wrote: > On 7/24/2015 4:29 AM, Tobias Hartmann wrote: >> Hi Roland, >> >> thanks for the review! 
>> >> On 23.07.2015 15:51, Roland Westrelin wrote: >>> assembler.cpp >>> >>> 68 Compile::current()->env()->record_failure("CodeCache is full?); >>> >>> That assumes we are calling this from c2 but it can be called from c1 as well. >> You are right. I moved this code to the C2 methods calling 'AbstractAssembler::start_a_stub()'. The corresponding C1 methods already contain a call to 'bailout()'. > > In the new webrev, aarch64 emit_trampoline_stub still calls Compile::current()->env()->record_failure, > and it appears that emit_trampoline_stub can be called from C1. Shouldn't we fix it so that > ciEnv::record_failure works correctly from C1? Why does C1 need a different bailout message? Roland was referring to AbstractAssembler::start_a_stub() which is called from C1 and C2. However, CompiledStaticCall::emit_to_interp_stub() is only used by C2. C1 emits stubs via LIR_Assembler, for example LIR_Assembler::emit_static_call_stub(). I agree that it would be best to have a shared bailout mechanism, +1 to JDK-8132354. Best, Tobias > dl > >> I also noticed that we multiply the 'to_interp_stub_size()' by 2 when creating the stub (see compiledIC_sparc.cpp): >> >> 68 __ start_a_stub(to_interp_stub_size()*2); >> >> This seems to be unnecessary and causes problems in "Compile::scratch_emit_size" because we fix the stub section of the CodeBuffer to MAX_stubs_size and therefore fail to emit the to-interpreter stub since to_interp_stub_size()*2 > MAX_stubs_size. I removed the multiplication and added an assert to 'Compile::scratch_emit_size()'. >> >>> Did you add code for c1 to be on the safe side or have you observed problems with c1? >> I did not encounter the problem with C1 but added the checks to be on the safe side. Actually, we do call bailout() in C1 'LIR_Assembler::emit_static_call_stub()' if the stub cannot be created but without checking for bailed_out() in the calling method we will continue to emit code and crash (see explanation below). >> >>> "the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert.? >>> >>> What addresses are set to badAddress? >> If CodeBuffer::expand() fails, the corresponding buffer blob in the code cache is freed and code section addresses are set to 'badAddress' (see CodeBuffer::set_blob(NULL)). If we continue to emit code into this buffer or use its section addresses, we fail. The assert "wrong size of mach node" is hit because we use CodeBuffer::insts_size() which is now -1647855886. Another instance of this bug fails because we continue to emit code, call CodeSection::emit_int32() and fail because end() is set to 'badAddress'. >> >> I did some more testing with "java -XX:+StressCodeBuffers -Xcomp -version" and found more places where we have to check for a failed CodeBuffer expansion to not continue to emit code and crash. I added the corresponding checks. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >> >> Thanks, >> Tobias >> >>> Roland. >>> >>>> Thanks, >>>> Tobias >>>> >>>> On 21.07.2015 15:40, Tobias Hartmann wrote: >>>>> Hi, >>>>> >>>>> please review the following patch. >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8130309 >>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.00/ >>>>> >>>>> Problem: >>>>> While C2 is emitting code, an assert is hit because the emitted code size does not correspond to the size of the instruction. 
The problem is that the code cache is full and therefore the creation of a to-interpreter stub failed. Instead of bailing out we continue and hit the assert. >>>>> >>>>> More precisely, we emit code for a CallStaticJavaDirectNode on aarch64 and emit two stubs (see 'aarch64_enc_java_static_call' in aarch64.ad): >>>>> - MacroAssembler::emit_trampoline_stub() -> requires 64 bytes >>>>> - CompiledStaticCall::emit_to_interp_stub() -> requires 56 bytes >>>>> However, we only have 112 bytes of free space in the stub section of the code buffer and need to expand it. Since the code cache is full, the expansion fails and the corresponding code blob is freed. In CodeBuffer::free_blob() -> CodeBuffer::set_blob() we set addresses to 'badAddress' and therefore hit the assert. >>>>> >>>>> Solution: >>>>> Even if we have enough space in the instruction section, we should check for a failed expansion of the stub section and bail out immediately. I added the corresponding check and also removed an unnecessary call to 'start_a_stub' after we already failed. >>>>> >>>>> Testing: >>>>> - Failing test >>>>> - JPRT >>>>> >>>>> Thanks, >>>>> Tobias >>>>> > From tobias.hartmann at oracle.com Mon Jul 27 05:58:14 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2015 07:58:14 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B2AC68.7030900@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <55B2A13A.4000604@oracle.com> <55B2AC68.7030900@oracle.com> Message-ID: <55B5C876.6020701@oracle.com> On 24.07.2015 23:21, Dean Long wrote: > If TraceJumps causes problems on Sparc, then I think the Sparc version of to_interp_stub_size() needs to be adjusted to take TraceJumps into account. No, the Sparc version of to_interp_stub_size() already takes TraceJumps into account: 90 int CompiledStaticCall::to_interp_stub_size() { 91 // This doesn't need to be accurate but it must be larger or equal to 92 // the real size of the stub. 93 return (NativeMovConstReg::instruction_size + // sethi/setlo; 94 NativeJump::instruction_size + // sethi; jmp; nop 95 (TraceJumps ? 20 * BytesPerInstWord : 0) ); 96 } The problem is that the additional code needed for TraceJumps does not fit into the scratch buffer because we only allocate 'MAX_stubs_size' for stubs and cannot expand the buffer (see Compile::scratch_emit_size()). Other solutions would be to increase 'MAX_stubs_size' (which does not make sense because this is only a debug case) or to not emit the stub if we are 'in_scratch_emit_size()'. I saw you filed JDK-8132344 which should take care of this issue in general. Best, Tobias > > dl From aleksey.shipilev at oracle.com Mon Jul 27 08:51:59 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 27 Jul 2015 11:51:59 +0300 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> Message-ID: <55B5F12F.5050904@oracle.com> Thanks for pushing the change, Dean! -Aleksey On 07/25/2015 05:32 AM, John Rose wrote: > You are fine. 
> Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade > Committers who sponsor changes are expected to use the correct Author > (not themselves). > It's a syntax error to push a non-Author changeset to an OpenJDK repo. > BTW, jcheck does not consult the OJN census AFAIK. > > ? John > > On Jul 24, 2015, at 6:35 PM, Dean Long > wrote: >> >> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets >> credit for it, and jcheck isn't complaining. Does anyone know if JPRT >> does more checks for Committer status? If so I'll have to redo it and >> add a Contributed-by line. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From aleksey.shipilev at oracle.com Mon Jul 27 09:13:53 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 27 Jul 2015 12:13:53 +0300 Subject: RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> Message-ID: <55B5F651.5090509@oracle.com> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. Andrew/Edward, are you OK with AArch64 part? http://cr.openjdk.java.net/~shade/8131682/webrev.02/ Thanks, -Aleksey On 07/24/2015 01:38 PM, Lindenmaier, Goetz wrote: > Hi Aleksey, > > thanks for pointing us to that change! > Looks good, but does not compile. Default arg should only be in the header. > See below. > > Ppc part reviewed and I don?t need a new webrev. > > Best regards, > Goetz. > > --- a/src/cpu/ppc/vm/assembler_ppc.inline. 
> +++ b/src/cpu/ppc/vm/assembler_ppc.inline.hpp > @@ -210,7 +210,7 @@ > inline void Assembler::extsw( Register a, Register s) { emit_int32(EXTSW_OPCODE | rta(a) | rs(s) | rc(0)); } > > // extended mnemonics > -inline void Assembler::nop() { Assembler::ori(R0, R0, 0); } > +inline void Assembler::nop(int count) { for (int i = 0; i < count; i++) { Assembler::ori(R0, R0, 0); } } > // NOP for FP and BR units (different versions to allow them to be in one group) > inline void Assembler::fpnop0() { Assembler::fmr(F30, F30); } > inline void Assembler::fpnop1() { Assembler::fmr(F31, F31); } > > > g++ 4.8.3: > In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.inline.hpp:43:0, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/macroAssembler_ppc.inline.hpp:29, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/macroAssembler.inline.hpp:43, > from ../generated/adfiles/ad_ppc_64.cpp:56: > /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.inline.hpp:213:41: error: default argument given for parameter 1 of void Assembler::nop(int) [-fpermissive] > inline void Assembler::nop(int count = 1) { for(int i = 0; i < count; i++) > ^ > In file included from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/asm/assembler.hpp:434:0, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/nativeInst_ppc.hpp:29, > from /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/share/vm/code/nativeInst.hpp:41, > from ../generated/adfiles/ad_ppc_64.hpp:57, > from ../generated/adfiles/ad_ppc_64.cpp:54: > /sapmnt/home1/d045726/oJ/8131682-hs-comp/src/cpu/ppc/vm/assembler_ppc.hpp:1383:15: error: after previous specification in void Assembler::nop(int) [-fpermissive] > inline void nop(int count = 1); > ^ > > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Aleksey Shipilev > Sent: Friday, July 24, 2015 11:49 AM > To: Dean Long; hotspot compiler > Cc: ppc-aix-port-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: RFR (S) 8131682: C1 should use multibyte nops everywhere > > * PGP Signed by an unknown key > > (explicitly cc'ing AArch64 and PPC folks) > > Thanks, > -Aleksey > > On 22.07.2015 11:10, Aleksey Shipilev wrote: >> Thanks for review, Dean! >> >> I'd like to hear the opinions of AArch64 and Power folks, since we >> contaminate their assemblers a bit to gain access to x86 fat nops. >> >> -Aleksey >> >> On 21.07.2015 23:28, Dean Long wrote: >>> This version looks good. >>> >>> dl >>> >>> On 7/20/2015 7:51 AM, Aleksey Shipilev wrote: >>>> Hi Dean, >>>> >>>> Thanks for taking a look! >>>> >>>> Silly me, I should have left the call patching cases intact, because >>>> you're right, we should be able to patch the nops partially while still >>>> producing the correct instruction stream. Therefore, I reverted the >>>> cases where we do nop-ing for *instruction* patching, and added the >>>> comment there. >>>> >>>> Other places seem to use the nop sequences to provide the alignment, not >>>> for the general patching. Especially interesting for us is the case of >>>> aligning the patcheable immediate in the existing call. C2 does the nops >>>> in these cases. 
>>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~shade/8131682/webrev.01/ >>>> >>>> Testing: >>>> * JPRT -testset hotspot on open platforms; >>>> * Targeted benchmarks, plus eyeballing the assembly; >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>> On 18.07.2015 10:51, Dean Long wrote: >>>>> I think we should distinguish the different uses and treat them >>>>> accordingly: >>>>> >>>>> 1) padding nops for patching, executed >>>>> >>>>> We need to be careful about inserting a fat nop here, if later patching >>>>> overwrites only part of the fat nop, resulting in an illegal intruction. >>>>> >>>>> 2) padding nops for patching, never executed >>>>> >>>>> It should be safe insert a fat nop here, but there's no point if the >>>>> nops are not reachable and never executed. >>>>> >>>>> >>>>> 3) alignment nops, never patched, executed >>>>> >>>>> Fat nops are fine, but on some CPUs branching may be even better, so I >>>>> suggest using align() for this, and letting align() decide what to >>>>> generate. The change in check_icache() could use a version of align >>>>> that takes the target offset as an argument: >>>>> >>>>> 348 align(CodeEntryAlignment,__ offset() + ic_cmp_size); >>>>> >>>>> 4) alignment nops, never patched, never executed >>>>> >>>>> Doesn't matter what we emit here, but we might as well make it >>>>> understandable by humans using a debugger. >>>>> >>>>> >>>>> I believe the patching nops in c1_CodeStubs_x86.cpp and >>>>> c1_LIRAssembler.cpp are patched concurrently while the code is running, >>>>> not at a safepoint, so it's not clear to me if it's safe to use fat nops >>>>> on x86. I would consider those changes unsafe on x86 without further >>>>> analysis of what happens during patching. >>>>> >>>>> dl >>>>> >>>>> On 7/17/2015 6:29 AM, Aleksey Shipilev wrote: >>>>>> Hi there, >>>>>> >>>>>> C1 is not very good at inlining and intrisifying methods, and hence the >>>>>> call performance is important there. One nit that we can see in the >>>>>> generated code on x86 is that C1 uses the single-byte nops, even for >>>>>> long nop strides. >>>>>> >>>>>> This improvement fixes that: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8131682 >>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.00/ >>>>>> >>>>>> Testing: >>>>>> - JPRT -testset hotspot on open platforms >>>>>> - eyeballing the generated assembly with -XX:TieredStopAtLevel=1 >>>>>> >>>>>> (I understand the symmetric change is going to be needed in closed >>>>>> parts, but let's polish the open part first). >>>>>> >>>>>> Thanks, >>>>>> -Aleksey >>>>>> >>>> >>> >> >> > > > > * Unknown Key > * 0x62A119A7 > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From adinn at redhat.com Mon Jul 27 09:35:09 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 27 Jul 2015 10:35:09 +0100 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B5F651.5090509@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> Message-ID: <55B5FB4D.4070603@redhat.com> On 27/07/15 10:13, Aleksey Shipilev wrote: > Thanks Goetz! Fixed the assembler_ppc.inline.hpp. > > Andrew/Edward, are you OK with AArch64 part? 
> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ Yes, it's fine. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From aph at redhat.com Mon Jul 27 10:21:47 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2015 11:21:47 +0100 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B5F651.5090509@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> Message-ID: <55B6063B.3070604@redhat.com> On 27/07/15 10:13, Aleksey Shipilev wrote: > Thanks Goetz! Fixed the assembler_ppc.inline.hpp. > > Andrew/Edward, are you OK with AArch64 part? > http://cr.openjdk.java.net/~shade/8131682/webrev.02/ I agree that it looks good. Please have a look to see how many NOPs take the same time as a branch. Thanks, Andrew. From roland.westrelin at oracle.com Mon Jul 27 10:29:29 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 27 Jul 2015 12:29:29 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B22191.9070904@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> Message-ID: > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) Roland. From aleksey.shipilev at oracle.com Mon Jul 27 10:53:20 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 27 Jul 2015 13:53:20 +0300 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B6063B.3070604@redhat.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> Message-ID: <55B60DA0.30501@oracle.com> On 07/27/2015 01:21 PM, Andrew Haley wrote: > On 27/07/15 10:13, Aleksey Shipilev wrote: >> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >> >> Andrew/Edward, are you OK with AArch64 part? >> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ > > I agree that it looks good. Please have a look to see how many NOPs take the > same time as a branch. Thanks! I don't quite believe we should spend time trying branches for nops, at least for x86. The change we are discussing follows the Intel Optimization Reference Manual 3.5.1.10 "Using NOPs", which Assembler::align for x86 seems to implement with some bells and whistles. Agner agrees on using multi-byte nops (0F 1F ...) on modern x86 chips as well; up to the point he claims 4 insn/clock throughput for them. Is there a vendor-recommended strategy for using something else? 
Even if it's so, this calls for experimenting with Assembler::align itself (that also touches C2 usages), and not the C1-specific usages this trivial change addresses. Thanks again, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From roland.westrelin at oracle.com Mon Jul 27 11:31:21 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 27 Jul 2015 13:31:21 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> Message-ID: <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ > > CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? Roland. From aph at redhat.com Mon Jul 27 12:07:12 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2015 13:07:12 +0100 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B60DA0.30501@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B60DA0.30501@oracle.com> Message-ID: <55B61EF0.40803@redhat.com> On 07/27/2015 11:53 AM, Aleksey Shipilev wrote: > On 07/27/2015 01:21 PM, Andrew Haley wrote: >> On 27/07/15 10:13, Aleksey Shipilev wrote: >>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>> >>> Andrew/Edward, are you OK with AArch64 part? >>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >> >> I agree that it looks good. Please have a look to see how many NOPs take the >> same time as a branch. > > Thanks! > > I don't quite believe we should spend time trying branches for nops, at > least for x86. The change we are discussing follows the Intel > Optimization Reference Manual 3.5.1.10 "Using NOPs", which > Assembler::align for x86 seems to implement with some bells and > whistles. Agner agrees on using multi-byte nops (0F 1F ...) on modern > x86 chips as well; up to the point he claims 4 insn/clock throughput for > them. Sure. My apologies: I responded to the wrong person. My interest is about AArch64. Andrew. 
From tobias.hartmann at oracle.com Mon Jul 27 14:36:11 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2015 16:36:11 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> Message-ID: <55B641DB.1010608@oracle.com> On 27.07.2015 13:31, Roland Westrelin wrote: >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >> >> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. > Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ Thanks, Tobias > > Roland. > From roland.westrelin at oracle.com Mon Jul 27 14:57:17 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 27 Jul 2015 16:57:17 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B641DB.1010608@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> Message-ID: <5B5E5666-647E-4B96-9254-07CB477DE6AD@oracle.com> > http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ That looks good to me. Roland. From vladimir.kozlov at oracle.com Mon Jul 27 16:38:04 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 27 Jul 2015 09:38:04 -0700 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55AF4D7C.1030706@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> Message-ID: <55B65E6C.6050404@oracle.com> WB method isIntrinsicAvailableForMethod0 has parameter compilationContext. So I don't think you need is_intrinsic_available(methodHandle method) variant - it is not used from WB. You left is_intrinsic_available*_for* in comments in abstractCompiler.hpp Following the same naming logic I would suggest to remove "ForMethod" in WB new methods names. 
Missing 'virtual' in c2compiler.hpp Leftover line in c2compiler.cpp: + //return Compile::is_intrinsic_available(method, compilation_context, false); Indention is off - keep following lines at the same level as !compilation_context (one space after "("): + (!compilation_context.is_null() && + CompilerOracle::has_option_value(compilation_context, "DisableIntrinsic", disable_intr) && + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) Remove 'return false' because following InlineUnsafeOps check disable some intrinsics: + case vmIntrinsics::_Reference_get: + return false; + break; Also I would prefer vmIntrinsics::is_disabled_by_flags() was called from compiler's is_intrinsic_disabled_by_flag() flag. Additionally I think we should not have difference in behavior of is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should follow the same rules. If we disable intrinsic on command line - C1 should not intrinsify it. The only difference should be in supported intrinsics. If you think it is big change - file an other bug (it is bug, not rfe) to fix it separately. Thanks, Vladimir On 7/22/15 12:59 AM, Zolt?n Maj? wrote: > Hi John, > > > On 07/21/2015 09:02 PM, John Rose wrote: >> Yes, that will work, and I think it is cleaner than what we had >> before, as well as providing the new required functionality. >> >> Reviewed; please get a second reviewer. > > thank you for the review! I'll ask Vladimir K., maybe he has time to > look at the newest webrev. > >> >> ? John >> >> P.S. If the unit tests want to test (via the whitebox API) whether an >> intrinsic was compiled successfully, we might want to expose >> Compile::gather_intrinsic_statistics, etc. But not in this change set. > > That is an interesting idea. We'd also have to see if current tests > require such functionality or if SQE plans to add tests requiring that > functionality. > >> >> P.P.S. As I think I said before, I wish we had a way to consolidate >> the switch statements further (into vmSymbols.hpp). But I don't see a >> clean way to do it. > > Yes, that would be nice. I've not seen a good way to do that, partly > because inconsistencies between the way C1 and C2 depends on the value > of command-line flags. > > Thank you! > > Best regards, > > > Zoltan > >> >> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? > > wrote: >>> >>> Here is the newest webrev: >>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>> >>> - >>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>> >> > From vladimir.kozlov at oracle.com Mon Jul 27 17:29:21 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 27 Jul 2015 10:29:21 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B641DB.1010608@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> Message-ID: <55B66A71.8050704@oracle.com> Use ciEnv()::current() instead of Compile::current()->env(). compile.cpp why special case for StressCodeBuffers? Otherwise looks good. Thanks, Vladimir On 7/27/15 7:36 AM, Tobias Hartmann wrote: > > On 27.07.2015 13:31, Roland Westrelin wrote: >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>> >>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. 
Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) > > Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. > >> Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? > > I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. > > I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. > > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ > > Thanks, > Tobias > >> >> Roland. >> From dean.long at oracle.com Mon Jul 27 17:44:09 2015 From: dean.long at oracle.com (Dean Long) Date: Mon, 27 Jul 2015 10:44:09 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B641DB.1010608@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> Message-ID: <55B66DE9.2070000@oracle.com> Looks good. I wish the bailout/record_failure could be done in start_a_stub and not in the callers, but we can clean that up as part of 8132354. dl On 7/27/2015 7:36 AM, Tobias Hartmann wrote: > Here is the new webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ From tobias.hartmann at oracle.com Tue Jul 28 07:19:54 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 09:19:54 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B66A71.8050704@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66A71.8050704@oracle.com> Message-ID: <55B72D1A.1030706@oracle.com> Thanks, Vladimir. On 27.07.2015 19:29, Vladimir Kozlov wrote: > Use ciEnv()::current() instead of Compile::current()->env(). Done. > compile.cpp why special case for StressCodeBuffers? Right, the special case is not necessary since StressCodeBuffers should not affect the scratch buffer. I removed it. New webrev: http://cr.openjdk.java.net/~thartmann/8130309/webrev.04/ Thanks, Tobias > Otherwise looks good. > > Thanks, > Vladimir > > On 7/27/15 7:36 AM, Tobias Hartmann wrote: >> >> On 27.07.2015 13:31, Roland Westrelin wrote: >>>>> Here is the new webrev: >>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>>> >>>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) >> >> Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. >> >>> Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? >> >> I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. 
>> >> I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. >> >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ >> >> Thanks, >> Tobias >> >>> >>> Roland. >>> From tobias.hartmann at oracle.com Tue Jul 28 07:20:11 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 09:20:11 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <5B5E5666-647E-4B96-9254-07CB477DE6AD@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <5B5E5666-647E-4B96-9254-07CB477DE6AD@oracle.com> Message-ID: <55B72D2B.6080608@oracle.com> Thanks, Roland. Best, Tobias On 27.07.2015 16:57, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ > > That looks good to me. > > Roland. > From tobias.hartmann at oracle.com Tue Jul 28 07:20:24 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 09:20:24 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B66DE9.2070000@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66DE9.2070000@oracle.com> Message-ID: <55B72D38.2040705@oracle.com> Thanks, Dean. Best, Tobias On 27.07.2015 19:44, Dean Long wrote: > Looks good. I wish the bailout/record_failure could be done in start_a_stub and not in the callers, but we can clean that up as part of 8132354. > > dl > > On 7/27/2015 7:36 AM, Tobias Hartmann wrote: >> Here is the new webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ > From michael.haupt at oracle.com Tue Jul 28 08:56:05 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Tue, 28 Jul 2015 10:56:05 +0200 Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method Message-ID: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> Dear all, please review and sponsor this change. RFE: https://bugs.openjdk.java.net/browse/JDK-8004073 Webrev: http://cr.openjdk.java.net/~mhaupt/8004073/webrev.00 This change extends the dumping facilities of the C2 IR Node hierarchy. Node::dump() is used in debugging sessions to print information about an IR node. The API is extended by these new entry points: * void Node::dump_comp() -> Dump the node in compact form. * void Node::dump_rel(), void Node::dump_rel_comp() -> Dump the node (in compact form) and all nodes related to it. Mark the current node in the output. The notion of "related" nodes is of course a property of the node itself, or rather, of its class. This is configured in this virtual method: * virtual void Node::rel(GrowableArray<Node*> *in_rel, GrowableArray<Node*> *out_rel, bool compact) -> Collect all related nodes. Store the incoming related nodes in the in_rel array, and the outgoing related nodes in the out_rel array. In case compact representation is desired, possibly collect fewer nodes.
This method must be overridden by all subclasses of Node that, in their notion of what related nodes are, deviate from the default behaviour as specified in the implementation of Node::rel() in the Node class itself. The default is to collect all inputs and outputs till depth 1, including both data and control nodes, ignoring compactness. There are several auxiliary methods. Node collection is chiefly facilitated by this method: * void Node::collect_nodes(GrowableArray<Node*> *ns, int d, bool ctrl, bool data) -> Collect nodes till depth d (positive: inputs, negative: outputs), including *only* control or data nodes (this is controlled by the two bool arguments, and setting both to true is nonsensical). Furthermore, there exist pre-defined collectors for common cases: * void Node::collect_nodes_in_all_data(GrowableArray<Node*> *ns, bool ctrl) -> Collect the entire data input graph. Include control nodes only if requested. * void Node::collect_nodes_in_all_ctrl(GrowableArray<Node*> *ns, bool data) -> Collect the entire control input graph. Include data nodes only if requested. * void Node::collect_nodes_out_all_ctrl_boundary(GrowableArray<Node*> *ns) -> Collect all output nodes, stopping at control nodes, including these. * void Node::collect_nodes_in_data_out_1(GrowableArray<Node*> *is, GrowableArray<Node*> *os, bool compact) -> Collect the entire data input graph, and outputs till depth 1. Regarding compact dumping, subclasses of Node should override this virtual method: * virtual void dump_comp_spec(outputStream *st) -> Dump the specifics of a node in compact form. This method is supposed to operate in the fashion of Node::dump_spec(). The default behaviour for compact dumping is to dump a node's name and index. Specific notions of "related" have been added to the following node classes: * AbsNode and subclasses * AddNode and subclasses * AddPNode * AtanDNode * BinaryNode * BoolNode * CosDNode * CountBitsNode and subclasses * Div{D,F,I,L}Node * ExpDNode * GotoNode * HaltNode * Log{10D,D}Node * LShift{I,L}Node * Mod{D,F,I,L}Node * MulHiLNode * Mul{D,F,I,L}Node and subclasses * DivModNode and subclasses * IfNode * JumpNode * SafePointNode and subclasses (may require more detail) * StartNode and subclass * NegNode and subclasses * PowDNode * IfProjNode and subclasses * JProjNode and subclass * ParmNode * ReductionNode and subclasses * Round{Double,Float}Node * RShift{I,L}Node * SqrtDNode * SubNode and subclasses * TanDNode * AddV{B,D,F,I,L,S,_}Node * DivV{D,F}Node * LShiftV{B,I,L,S}Node * MulV{D,F,I,S}Node * OrVNode * RShiftV{B,I,L,S}Node * SubV{B,D,F,I,L,S}Node * URShiftV{B,I,L,S}Node * XorVNode * URShift{I,L}Node Here is a sample session in LLDB, showing the different dumps for an IfNode: * thread #28: tid = 0x10d1ce3, 0x000000010353be17 libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, phase=0x000000011cf62368, can_reshape=<unavailable>) + 77 at ifnode.cpp:1297, name = 'Java: C2 CompilerThread0', stop reason = breakpoint 1.1 frame #0: 0x000000010353be17 libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, phase=0x000000011cf62368, can_reshape=<unavailable>) + 77 at ifnode.cpp:1297 1294 if (remove_dead_region(phase, can_reshape)) return this; 1295 // No Def-Use info?
1296 if (!can_reshape) return NULL; -> 1297 PhaseIterGVN *igvn = phase->is_IterGVN(); 1298 1299 // Don't bother trying to transform a dead if 1300 if (in(0)->is_top()) return NULL; (lldb) expr -- this->dump() 82 If === 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: String::charAt @ bci:27 (lldb) expr -- this->dump_comp() If(82)P=0.999999, C=-1.000000 (lldb) expr -- this->dump_rel() 10 Parm === 3 [[ 38 38 65 32 ]] Parm0: java/lang/String:NotNull:exact * Oop:java/lang/String:NotNull:exact * !jvms: String::charAt @ bci:-1 38 AddP === _ 10 10 37 [[ 39 ]] Oop:java/lang/String:NotNull:exact+12 * [narrow] !jvms: String::charAt @ bci:6 39 LoadN === _ 7 38 [[ 40 ]] @java/lang/String:exact+12 * [narrow], name=value, idx=4; #narrowoop: char[int:>=0]:exact * !jvms: String::charAt @ bci:6 40 DecodeN === _ 39 [[ 55 42 ]] #char[int:>=0]:exact * !jvms: String::charAt @ bci:6 37 ConL === 0 [[ 38 56 ]] #long:12 55 CastPP === 47 40 [[ 56 56 97 97 96 86 ]] #char[int:>=0]:NotNull:exact * !jvms: String::charAt @ bci:9 56 AddP === _ 55 55 37 [[ 57 ]] !jvms: String::charAt @ bci:9 7 Parm === 3 [[ 99 98 86 65 57 32 50 39 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: String::charAt @ bci:-1 57 LoadRange === _ 7 56 [[ 65 58 78 ]] @bottom[int:>=0]+12 * [narrow], idx=5; #int:>=0 !jvms: String::charAt @ bci:9 11 Parm === 3 [[ 78 93 65 32 24 32 65 58 86 ]] Parm1: int !jvms: String::charAt @ bci:-1 78 CmpU === _ 11 57 [[ 79 ]] !jvms: String::charAt @ bci:27 79 Bool === _ 78 [[ 82 ]] [lt] !jvms: String::charAt @ bci:27 82 > If === 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: String::charAt @ bci:27 83 IfTrue === 82 [[ 99 98 ]] #1 !jvms: String::charAt @ bci:27 84 IfFalse === 82 [[ 86 ]] #0 !jvms: String::charAt @ bci:27 99 Return === 83 6 7 8 9 returns 98 [[ 0 ]] 98 LoadUS === 83 7 96 [[ 99 ]] @char[int:>=0]:exact+any *, idx=6; #char !jvms: String::charAt @ bci:27 86 CallStaticJava === 84 6 7 8 9 ( 85 1 1 55 11 ) [[ 87 ]] # Static uncommon_trap(reason='range_check' action='make_not_entrant') void ( int ) C=0.000100 String::charAt @ bci:27 !jvms: String::charAt @ bci:27 (lldb) expr -- this->dump_rel_comp() If(82)P=0.999999, C=-1.000000 Bool(79)[lt] CmpU(78) Parm(11)1:int LoadRange(57) @bottom[int:>=0]+12 * [narrow], idx=5; #int:>=0 IfTrue(83)[99][98]#1 IfFalse(84)[86]#0 Return(99) LoadUS(98) @char[int:>=0]:exact+any *, idx=6; #char CallStaticJava(86)uncommon_trap Best, Michael -- Dr. Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | LangTools Team | Nashorn Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment From aph at redhat.com Tue Jul 28 09:35:14 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 28 Jul 2015 10:35:14 +0100 Subject: [aarch64-port-dev ] RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <55AE5819.3070506@oracle.com> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> <55AE5819.3070506@oracle.com> Message-ID: <55B74CD2.8080104@redhat.com> On 21/07/15 15:32, Zoltán Majó wrote: > the fix looks good to me (I'm not a *R*eviewer). And to me, too. Official reviewer, please. Thanks, Andrew.
From aph at redhat.com Tue Jul 28 09:36:36 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 28 Jul 2015 10:36:36 +0100 Subject: [aarch64-port-dev ] RFR: 8131062: aarch64: add support for GHASH acceleration In-Reply-To: <1437491895.6739.17.camel@mylittlepony.linaroharston> References: <1437491895.6739.17.camel@mylittlepony.linaroharston> Message-ID: <55B74D24.7060508@redhat.com> On 21/07/15 16:18, Edward Nevill wrote: > http://cr.openjdk.java.net/~enevill/8131062/webrev.0/ > > adds support for GHASH acceleration on aarch64 using the 128 bit pmull and pmull2 instructions. Looks good to me, thanks. Official reviewer, please. Andrew. From zoltan.majo at oracle.com Tue Jul 28 10:17:00 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 28 Jul 2015 12:17:00 +0200 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55B65E6C.6050404@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> <55B65E6C.6050404@oracle.com> Message-ID: <55B7569C.60504@oracle.com> Hi Vladimir, thank you for the feedback! On 07/27/2015 06:38 PM, Vladimir Kozlov wrote: > WB method isIntrinsicAvailableForMethod0 has parameter > compilationContext. So I don't think you need > is_intrinsic_available(methodHandle method) variant - it is not used > from WB. You're right, I removed that variant. > You left is_intrinsic_available*_for* in comments in abstractCompiler.hpp Updated. I also updated the text in the comments at other places so that they are more accurate than before. > > Following the same naming logic I would suggest to remove "ForMethod" > in WB new methods names. Updated the method names as well. > > Missing 'virtual' in c2compiler.hpp Updated as well. > > Leftover line in c2compiler.cpp: > > + //return Compile::is_intrinsic_available(method, > compilation_context, false); Thank you for spotting that, removed it. > > Indention is off - keep following lines at the same level as > !compilation_context (one space after "("): > > + (!compilation_context.is_null() && > + CompilerOracle::has_option_value(compilation_context, > "DisableIntrinsic", disable_intr) && > + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) Updated the indentation. > > Remove 'return false' because following InlineUnsafeOps check disable > some intrinsics: > > + case vmIntrinsics::_Reference_get: > + return false; > + break; Oh, I missed that! Updated. > > Also I would prefer vmIntrinsics::is_disabled_by_flags() was called > from compiler's is_intrinsic_disabled_by_flag() flag. I moved the call to the compiler-specific is_intrinsic_disabled_by_flag methods. > > Additionally I think we should not have difference in behavior of > is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should > follow the same rules. If we disable intrinsic on command line - C1 > should not intrinsify it. The only difference should be in supported > intrinsics. If you think it is big change - file an other bug (it is > bug, not rfe) to fix it separately. Yes, I also think it makes sense that flags behave the same way for all compilers. 
But I would like to keep that as a separate issue, partly because I haven't figured out all details yet for implementing a fix and partly because the current issue is related to some "critical" nightly failures we have. So I filed JDK-8132457: "Unify command-line flags controlling the usage of compiler intrinsics" for addressing inconsistencies in processing intrinsic-related command-line flags. I would like to push the newest webrev (if you are fine with it) and then continue with JDK-8132457. I hope that is OK. Here is the newest webrev: - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.04/ - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.04/ Testing: - JPRT (testset "hotspot" + tests in compiler/intrinsics/mathexact); all tests pass. Thank you and best regards, Zoltan > > Thanks, > Vladimir > > On 7/22/15 12:59 AM, Zolt?n Maj? wrote: >> Hi John, >> >> >> On 07/21/2015 09:02 PM, John Rose wrote: >>> Yes, that will work, and I think it is cleaner than what we had >>> before, as well as providing the new required functionality. >>> >>> Reviewed; please get a second reviewer. >> >> thank you for the review! I'll ask Vladimir K., maybe he has time to >> look at the newest webrev. >> >>> >>> ? John >>> >>> P.S. If the unit tests want to test (via the whitebox API) whether an >>> intrinsic was compiled successfully, we might want to expose >>> Compile::gather_intrinsic_statistics, etc. But not in this change set. >> >> That is an interesting idea. We'd also have to see if current tests >> require such functionality or if SQE plans to add tests requiring that >> functionality. >> >>> >>> P.P.S. As I think I said before, I wish we had a way to consolidate >>> the switch statements further (into vmSymbols.hpp). But I don't see a >>> clean way to do it. >> >> Yes, that would be nice. I've not seen a good way to do that, partly >> because inconsistencies between the way C1 and C2 depends on the value >> of command-line flags. >> >> Thank you! >> >> Best regards, >> >> >> Zoltan >> >>> >>> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? >> > wrote: >>>> >>>> Here is the newest webrev: >>>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>>> >>>> - >>>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>>> >>> >> From goetz.lindenmaier at sap.com Tue Jul 28 13:03:35 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 28 Jul 2015 13:03:35 +0000 Subject: [aarch64-port-dev ] RFR: 8131062: aarch64: add support for GHASH acceleration In-Reply-To: <55B74D24.7060508@redhat.com> References: <1437491895.6739.17.camel@mylittlepony.linaroharston> <55B74D24.7060508@redhat.com> Message-ID: <4295855A5C1DE049A61835A1887419CC2D014F39@DEWDFEMB12A.global.corp.sap> Hi Edward, the change looks good! Reviewed. Best regards, Goetz. -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley Sent: Dienstag, 28. Juli 2015 11:37 To: edward.nevill at gmail.com; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR: 8131062: aarch64: add support for GHASH acceleration On 21/07/15 16:18, Edward Nevill wrote: > http://cr.openjdk.java.net/~enevill/8131062/webrev.0/ > > adds support for GHASH acceleration on aarch64 using the 128 bit pmull and pmull2 instructions. Looks good to me, thanks. Official reviewer, please. Andrew. 
From goetz.lindenmaier at sap.com Tue Jul 28 13:05:29 2015 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 28 Jul 2015 13:05:29 +0000 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437488872.6057.3.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> Message-ID: <4295855A5C1DE049A61835A1887419CC2D014F4F@DEWDFEMB12A.global.corp.sap> Hi Edward, webrev.01 looks good. Reviewed. Best regards, Goetz. -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Edward Nevill Sent: Dienstag, 21. Juli 2015 16:28 To: Alexeev, Alexander Cc: hotspot compiler; aarch64-port-dev at openjdk.java.net Subject: Re: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: > On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: > > > Please review provided patch and sponsor if approved. > > Problem: SHA flags verification code checks condition for > > UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. > > The patch: > > http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ > > Hi Alexander, > > Thanks for fixing this. I will sponsor this patch. > > Here is the changeset. > > http://cr.openjdk.java.net/~enevill/8132010/webrev Please disregard the above webrev. I had outstanding outgoing changes. Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp http://cr.openjdk.java.net/~enevill/8132010/webrev.01 Sorry for the confusion, working on too many changesets at once. Ed. > > I have tested this before and after with hotspot jtreg > > Before: Test results: passed: 876; failed: 3; error: 7 > After: Test results: passed: 877; failed: 2; error: 7 > > The 1 test fixed is the test > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > > This regression was introduced in the following changeset > > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 > > Could I have an official reviewer for this please. As this is a trivial > 1 liner I think one reviewer should be sufficient. > > All the best, > Ed. > > From vladimir.kozlov at oracle.com Tue Jul 28 13:22:17 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 06:22:17 -0700 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B72D1A.1030706@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66A71.8050704@oracle.com> <55B72D1A.1030706@oracle.com> Message-ID: <55B78209.6050604@oracle.com> Looks good. Thanks, Vladimir On 7/28/15 12:19 AM, Tobias Hartmann wrote: > Thanks, Vladimir. > > On 27.07.2015 19:29, Vladimir Kozlov wrote: >> Use ciEnv()::current() instead of Compile::current()->env(). > > Done. > >> compile.cpp why special case for StressCodeBuffers? > > Right, the special case is not necessary since StressCodeBuffers should not affect the scratch buffer. I removed it. > > New webrev: > http://cr.openjdk.java.net/~thartmann/8130309/webrev.04/ > > Thanks, > Tobias > >> Otherwise looks good. 
>> >> Thanks, >> Vladimir >> >> On 7/27/15 7:36 AM, Tobias Hartmann wrote: >>> >>> On 27.07.2015 13:31, Roland Westrelin wrote: >>>>>> Here is the new webrev: >>>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>>>> >>>>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn't the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) >>> >>> Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. >>> >>>> Actually, why not have emit_to_interp_stub() return an error and bail out from compilation in the caller? >>> >>> I changed 'emit_to_interp_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. >>> >>> I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. >>> >>> Here is the new webrev: >>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ >>> >>> Thanks, >>> Tobias >>> >>>> >>>> Roland. >>>> From roland.westrelin at oracle.com Tue Jul 28 13:26:16 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 15:26:16 +0200 Subject: RFR(S): 8130858: CICompilerCount=1 when tiered is off is not allowed any more In-Reply-To: References: <2E1EC0B4-DC75-4208-9F2D-49F7D5C2929D@oracle.com> <559E9CA1.2050505@oracle.com> Message-ID: <75027CE4-5A5F-46AA-9B3A-513D7F6EA597@oracle.com> For the record, I asked Gerard to take a look at that change and he recommended I move the code to the commandLineFlagConstraintsCompiler.* files. Here is what I'm pushing: http://cr.openjdk.java.net/~roland/8130858/webrev.01/ Roland. From vladimir.kozlov at oracle.com Tue Jul 28 13:27:26 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 06:27:26 -0700 Subject: RFR: 8132010: aarch64: Typo in SHA intrinsics flags handling code for aarch64 In-Reply-To: <1437488872.6057.3.camel@mylittlepony.linaroharston> References: <1437470315.1575.9.camel@mylittlepony.linaroharston> <1437488872.6057.3.camel@mylittlepony.linaroharston> Message-ID: <55B7833E.8010209@oracle.com> Looks good. Thanks, Vladimir On 7/21/15 7:27 AM, Edward Nevill wrote: > On Tue, 2015-07-21 at 10:18 +0100, Edward Nevill wrote: >> On Mon, 2015-07-20 at 14:38 +0000, Alexeev, Alexander wrote: >> >>> Please review the provided patch and sponsor if approved. >>> Problem: SHA flags verification code checks condition for >>> UseSHA256Intrinsics, but corrects UseSHA1Intrinsics. >>> The patch: >>> http://cr.openjdk.java.net/~aalexeev/1/webrev.00/ >> >> Hi Alexander, >> >> Thanks for fixing this. I will sponsor this patch. >> >> Here is the changeset. >> >> http://cr.openjdk.java.net/~enevill/8132010/webrev > > Please disregard the above webrev. I had outstanding outgoing changes. > > Here is the corrected changeset. It is just a single line change in vm_version_aarch64.cpp > > http://cr.openjdk.java.net/~enevill/8132010/webrev.01 > > Sorry for the confusion, working on too many changesets at once. > > Ed.
> >> >> I have tested this before and after with hotspot jtreg >> >> Before: Test results: passed: 876; failed: 3; error: 7 > After: Test results: passed: 877; failed: 2; error: 7 > > The 1 test fixed is the test > > compiler/intrinsics/sha/cli/TestUseSHA256IntrinsicsOptionOnSupportedCPU.java > > This regression was introduced in the following changeset > > http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/cd16fcb838d2 > > Could I have an official reviewer for this please. As this is a trivial > 1 liner I think one reviewer should be sufficient. > > All the best, > Ed. > > From vladimir.kozlov at oracle.com Tue Jul 28 13:30:56 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 06:30:56 -0700 Subject: RFR: 8131062: aarch64: add support for GHASH acceleration In-Reply-To: <1437491895.6739.17.camel@mylittlepony.linaroharston> References: <1437491895.6739.17.camel@mylittlepony.linaroharston> Message-ID: <55B78410.1050108@oracle.com> Looks good to me. Thanks, Vladimir On 7/21/15 8:18 AM, Edward Nevill wrote: > Hi, > > http://cr.openjdk.java.net/~enevill/8131062/webrev.0/ > > adds support for GHASH acceleration on aarch64 using the 128 bit pmull and pmull2 instructions. > > This patch was contributed by alexander.alexeev at caviumnetworks.com > > Note that the 128 bit pmull instructions are not supported on all aarch64. The patch uses the HWCAP_PMULL bit from getauxval() to determine whether the 128 bit pmull is supported. > > I have tested this with jtreg / hotspot. > > Without patch: Test results: passed: 876; failed: 3; error: 9 > With patch: Test results: passed: 876; failed: 3; error: 9 > > In both cases the set of failing/error tests is identical. > > I have done some performance testing using TestAESMain from the jtreg/hotspot test suite. Here are the results I get:- > > java -XX:-UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain > > encode time = 66945.63635, decode time = 34085.08754 > > java -XX:+UseGHASHIntrinsics -DcheckOutput=true -Dmode=GCM TestAESMain > > encode time = 43469.38244, decode time = 17783.6603 > > This is an improvement of 54% and 92% respectively. > > Alexander has done some benchmarking to measure the raw performance improvement of GHASH on its own using the following benchmark. > > http://cr.openjdk.java.net/~enevill/8131062/GHash.java > > Here are the results he gets:- > > -XX:-UseGHASHIntrinsics. > > Benchmark Mode Cnt Score Error Units > GHash.calculateGHash avgt 5 118.688 ± 0.009 us/op > > -XX:+UseGHASHIntrinsics > Benchmark Mode Cnt Score Error Units > GHash.calculateGHash avgt 5 21.164 ± 1.763 us/op > > This represents a 5.6X speed increase on the raw GHASH performance. > > Thanks for your review, > > Ed. > > From roland.westrelin at oracle.com Tue Jul 28 14:05:40 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 16:05:40 +0200 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis Message-ID: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> http://cr.openjdk.java.net/~roland/8130847/webrev.00/ When an allocation which is the destination of an ArrayCopyNode is eliminated, the fields' values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
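To sketch the idea (hypothetical helper name and approximate signatures - the actual code is in the webrev): when a safepoint needs the value of a field/element of the eliminated destination, we take it from the source of the copy, emitting a load if no existing node holds the value:

// Sketch only: load_from_arraycopy_src is a hypothetical name and the
// signatures are approximate; see the webrev for the real code.
static Node* load_from_arraycopy_src(PhaseIterGVN* igvn, ArrayCopyNode* ac,
                                     Node* ctl, Node* mem,
                                     const TypePtr* adr_type,
                                     intptr_t offset, BasicType bt) {
  // The source of the copy holds the value the destination would have had.
  Node* src = ac->in(ArrayCopyNode::Src);
  // Address the element in the source at the matching offset.
  Node* adr = igvn->transform(new AddPNode(src, src, igvn->MakeConX(offset)));
  const Type* val_type = Type::get_const_basic_type(bt);
  // The load stands in for the value recorded at the safepoint and is used
  // to reconstruct the object when we deoptimize.
  Node* ld = LoadNode::make(*igvn, ctl, mem, adr, adr_type, val_type, bt,
                            MemNode::unordered);
  return igvn->transform(ld);
}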
I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. Roland. From tobias.hartmann at oracle.com Tue Jul 28 14:08:38 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 28 Jul 2015 16:08:38 +0200 Subject: [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space In-Reply-To: <55B78209.6050604@oracle.com> References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66A71.8050704@oracle.com> <55B72D1A.1030706@oracle.com> <55B78209.6050604@oracle.com> Message-ID: <55B78CE6.1050302@oracle.com> Thanks, Vladimir. Best, Tobias On 28.07.2015 15:22, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/28/15 12:19 AM, Tobias Hartmann wrote: >> Thanks, Vladimir. >> >> On 27.07.2015 19:29, Vladimir Kozlov wrote: >>> Use ciEnv()::current() instead of Compile::current()->env(). >> >> Done. >> >>> compile.cpp why special case for StressCodeBuffers? >> >> Right, the special case is not necessary since StressCodeBuffers should not affect the scratch buffer. I removed it. >> >> New webrev: >> http://cr.openjdk.java.net/~thartmann/8130309/webrev.04/ >> >> Thanks, >> Tobias >> >>> Otherwise looks good. >>> >>> Thanks, >>> Vladimir >>> >>> On 7/27/15 7:36 AM, Tobias Hartmann wrote: >>>> >>>> On 27.07.2015 13:31, Roland Westrelin wrote: >>>>>>> Here is the new webrev: >>>>>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.02/ >>>>>> >>>>>> CompiledStaticCall::emit_to_interp_stub() is compiler independent code. Shouldn?t the call be ciEnv::current()->record_failure() even if the method is only called from c2 (for now?)? (which is what Dean suggested as well I think) >>>> >>>> Right, I missed that and got confused because the method is guarded by "#ifdef COMPILER2" on Sparc. >>>> >>>>> Actually, why not have emit_to_interp_stub() returns an error and bail out from compilation in the caller? >>>> >>>> I changed 'emit_to_interpr_stub()' accordingly and now bail out from the caller if it fails. I also had to adapt the ppc code and the 'emit_trampoline_stub()' method on aarch64. >>>> >>>> I left the special handling of -XX:TraceJumps in the Sparc code since JDK-8132344 will fix it. >>>> >>>> Here is the new webrev: >>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/ >>>> >>>> Thanks, >>>> Tobias >>>> >>>>> >>>>> Roland. >>>>> From vladimir.kozlov at oracle.com Tue Jul 28 14:12:42 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 07:12:42 -0700 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55B7569C.60504@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> <55B65E6C.6050404@oracle.com> <55B7569C.60504@oracle.com> Message-ID: <55B78DDA.1010705@oracle.com> This looks good. Thanks, Vladimir On 7/28/15 3:17 AM, Zolt?n Maj? wrote: > Hi Vladimir, > > > thank you for the feedback! 
> > On 07/27/2015 06:38 PM, Vladimir Kozlov wrote: >> WB method isIntrinsicAvailableForMethod0 has parameter >> compilationContext. So I don't think you need >> is_intrinsic_available(methodHandle method) variant - it is not used >> from WB. > > You're right, I removed that variant. > >> You left is_intrinsic_available*_for* in comments in abstractCompiler.hpp > > Updated. I also updated the text in the comments at other places so that > they are more accurate than before. > >> >> Following the same naming logic I would suggest to remove "ForMethod" >> in WB new methods names. > > Updated the method names as well. > >> >> Missing 'virtual' in c2compiler.hpp > > Updated as well. > >> >> Leftover line in c2compiler.cpp: >> >> + //return Compile::is_intrinsic_available(method, >> compilation_context, false); > > Thank you for spotting that, removed it. > >> >> Indention is off - keep following lines at the same level as >> !compilation_context (one space after "("): >> >> + (!compilation_context.is_null() && >> + CompilerOracle::has_option_value(compilation_context, >> "DisableIntrinsic", disable_intr) && >> + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) > > Updated the indentation. > >> >> Remove 'return false' because following InlineUnsafeOps check disable >> some intrinsics: >> >> + case vmIntrinsics::_Reference_get: >> + return false; >> + break; > > Oh, I missed that! Updated. > >> >> Also I would prefer vmIntrinsics::is_disabled_by_flags() was called >> from compiler's is_intrinsic_disabled_by_flag() flag. > > I moved the call to the compiler-specific is_intrinsic_disabled_by_flag > methods. > >> >> Additionally I think we should not have difference in behavior of >> is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should >> follow the same rules. If we disable intrinsic on command line - C1 >> should not intrinsify it. The only difference should be in supported >> intrinsics. If you think it is big change - file an other bug (it is >> bug, not rfe) to fix it separately. > > Yes, I also think it makes sense that flags behave the same way for all > compilers. But I would like to keep that as a separate issue, partly > because I haven't figured out all details yet for implementing a fix and > partly because the current issue is related to some "critical" nightly > failures we have. > > So I filed JDK-8132457: "Unify command-line flags controlling the usage > of compiler intrinsics" for addressing inconsistencies in processing > intrinsic-related command-line flags. I would like to push the newest > webrev (if you are fine with it) and then continue with JDK-8132457. I > hope that is OK. > > Here is the newest webrev: > - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.04/ > - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.04/ > > Testing: > - JPRT (testset "hotspot" + tests in compiler/intrinsics/mathexact); all > tests pass. > > Thank you and best regards, > > > Zoltan > > >> >> Thanks, >> Vladimir >> >> On 7/22/15 12:59 AM, Zolt?n Maj? wrote: >>> Hi John, >>> >>> >>> On 07/21/2015 09:02 PM, John Rose wrote: >>>> Yes, that will work, and I think it is cleaner than what we had >>>> before, as well as providing the new required functionality. >>>> >>>> Reviewed; please get a second reviewer. >>> >>> thank you for the review! I'll ask Vladimir K., maybe he has time to >>> look at the newest webrev. >>> >>>> >>>> ? John >>>> >>>> P.S. 
If the unit tests want to test (via the whitebox API) whether an >>>> intrinsic was compiled successfully, we might want to expose >>>> Compile::gather_intrinsic_statistics, etc. But not in this change set. >>> >>> That is an interesting idea. We'd also have to see if current tests >>> require such functionality or if SQE plans to add tests requiring that >>> functionality. >>> >>>> >>>> P.P.S. As I think I said before, I wish we had a way to consolidate >>>> the switch statements further (into vmSymbols.hpp). But I don't see a >>>> clean way to do it. >>> >>> Yes, that would be nice. I've not seen a good way to do that, partly >>> because inconsistencies between the way C1 and C2 depends on the value >>> of command-line flags. >>> >>> Thank you! >>> >>> Best regards, >>> >>> >>> Zoltan >>> >>>> >>>> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? >>> > wrote: >>>>> >>>>> Here is the newest webrev: >>>>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>>>> >>>>> - >>>>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>>>> >>>> >>> > From zoltan.majo at oracle.com Tue Jul 28 15:05:30 2015 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Tue, 28 Jul 2015 17:05:30 +0200 Subject: [9] RFR(M): 8130832: Extend the WhiteBox API to provide information about the availability of compiler intrinsics In-Reply-To: <55B78DDA.1010705@oracle.com> References: <55A3EBF8.2020208@oracle.com> <55A484F7.4010705@oracle.com> <55A53862.7040308@oracle.com> <55A542D0.50100@oracle.com> <55A6A340.3050204@oracle.com> <55A6AA2B.5030005@oracle.com> <7CC60DCE-330D-4C1B-81FB-C2C3B4DFDEA7@oracle.com> <55AE62FE.4070502@oracle.com> <55AF4D7C.1030706@oracle.com> <55B65E6C.6050404@oracle.com> <55B7569C.60504@oracle.com> <55B78DDA.1010705@oracle.com> Message-ID: <55B79A3A.4050303@oracle.com> Thank you, Vladimir, for the review! Best regards, Zoltan On 07/28/2015 04:12 PM, Vladimir Kozlov wrote: > This looks good. > > Thanks, > Vladimir > > On 7/28/15 3:17 AM, Zolt?n Maj? wrote: >> Hi Vladimir, >> >> >> thank you for the feedback! >> >> On 07/27/2015 06:38 PM, Vladimir Kozlov wrote: >>> WB method isIntrinsicAvailableForMethod0 has parameter >>> compilationContext. So I don't think you need >>> is_intrinsic_available(methodHandle method) variant - it is not used >>> from WB. >> >> You're right, I removed that variant. >> >>> You left is_intrinsic_available*_for* in comments in >>> abstractCompiler.hpp >> >> Updated. I also updated the text in the comments at other places so that >> they are more accurate than before. >> >>> >>> Following the same naming logic I would suggest to remove "ForMethod" >>> in WB new methods names. >> >> Updated the method names as well. >> >>> >>> Missing 'virtual' in c2compiler.hpp >> >> Updated as well. >> >>> >>> Leftover line in c2compiler.cpp: >>> >>> + //return Compile::is_intrinsic_available(method, >>> compilation_context, false); >> >> Thank you for spotting that, removed it. >> >>> >>> Indention is off - keep following lines at the same level as >>> !compilation_context (one space after "("): >>> >>> + (!compilation_context.is_null() && >>> + CompilerOracle::has_option_value(compilation_context, >>> "DisableIntrinsic", disable_intr) && >>> + strstr(disable_intr, vmIntrinsics::name_at(id)) != NULL) >> >> Updated the indentation. >> >>> >>> Remove 'return false' because following InlineUnsafeOps check disable >>> some intrinsics: >>> >>> + case vmIntrinsics::_Reference_get: >>> + return false; >>> + break; >> >> Oh, I missed that! Updated. 
>> >>> >>> Also I would prefer vmIntrinsics::is_disabled_by_flags() was called >>> from compiler's is_intrinsic_disabled_by_flag() flag. >> >> I moved the call to the compiler-specific is_intrinsic_disabled_by_flag >> methods. >> >>> >>> Additionally I think we should not have difference in behavior of >>> is_intrinsic_disabled_by_flag() in C1 and C2. Both compilers should >>> follow the same rules. If we disable intrinsic on command line - C1 >>> should not intrinsify it. The only difference should be in supported >>> intrinsics. If you think it is big change - file an other bug (it is >>> bug, not rfe) to fix it separately. >> >> Yes, I also think it makes sense that flags behave the same way for all >> compilers. But I would like to keep that as a separate issue, partly >> because I haven't figured out all details yet for implementing a fix and >> partly because the current issue is related to some "critical" nightly >> failures we have. >> >> So I filed JDK-8132457: "Unify command-line flags controlling the usage >> of compiler intrinsics" for addressing inconsistencies in processing >> intrinsic-related command-line flags. I would like to push the newest >> webrev (if you are fine with it) and then continue with JDK-8132457. I >> hope that is OK. >> >> Here is the newest webrev: >> - top: http://cr.openjdk.java.net/~zmajo/8130832/top/webrev.04/ >> - hotspot: http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.04/ >> >> Testing: >> - JPRT (testset "hotspot" + tests in compiler/intrinsics/mathexact); all >> tests pass. >> >> Thank you and best regards, >> >> >> Zoltan >> >> >>> >>> Thanks, >>> Vladimir >>> >>> On 7/22/15 12:59 AM, Zolt?n Maj? wrote: >>>> Hi John, >>>> >>>> >>>> On 07/21/2015 09:02 PM, John Rose wrote: >>>>> Yes, that will work, and I think it is cleaner than what we had >>>>> before, as well as providing the new required functionality. >>>>> >>>>> Reviewed; please get a second reviewer. >>>> >>>> thank you for the review! I'll ask Vladimir K., maybe he has time to >>>> look at the newest webrev. >>>> >>>>> >>>>> ? John >>>>> >>>>> P.S. If the unit tests want to test (via the whitebox API) whether an >>>>> intrinsic was compiled successfully, we might want to expose >>>>> Compile::gather_intrinsic_statistics, etc. But not in this change >>>>> set. >>>> >>>> That is an interesting idea. We'd also have to see if current tests >>>> require such functionality or if SQE plans to add tests requiring that >>>> functionality. >>>> >>>>> >>>>> P.P.S. As I think I said before, I wish we had a way to consolidate >>>>> the switch statements further (into vmSymbols.hpp). But I don't >>>>> see a >>>>> clean way to do it. >>>> >>>> Yes, that would be nice. I've not seen a good way to do that, partly >>>> because inconsistencies between the way C1 and C2 depends on the value >>>> of command-line flags. >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> >>>> >>>> Zoltan >>>> >>>>> >>>>> On Jul 21, 2015, at 8:19 AM, Zolt?n Maj? 
>>>> > wrote: >>>>> >>>>> Here is the newest webrev: >>>>> - top:http://cr.openjdk.java.net/~zmajo/8130832/top/ >>>>> >>>>> - >>>>> hotspot:http://cr.openjdk.java.net/~zmajo/8130832/hotspot/webrev.03/ >>>>> >>>> >>> >> From vladimir.kozlov at oracle.com Tue Jul 28 16:29:50 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 09:29:50 -0700 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> Message-ID: <55B7ADFE.3060109@oracle.com> The next change puzzles me: - if (!call->may_modify(tinst, phase)) { + if (call->may_modify(tinst, phase)) { - mem = call->in(TypeFunc::Memory); + assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape"); Why only ArrayCopy? I think it is true for most calls. What set of tests did you run? Method naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load(). Add explicit check: && strcmp(_name, "unsafe_arraycopy") != 0) Thanks, Vladimir On 7/28/15 7:05 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8130847/webrev.00/ > > When an allocation which is the destination of an ArrayCopyNode is eliminated, the fields' values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary. > > I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. > > Roland. > From vladimir.kozlov at oracle.com Tue Jul 28 17:24:30 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 10:24:30 -0700 Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method In-Reply-To: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> Message-ID: <55B7BACE.9080100@oracle.com> Some first observations. Before looking at the webrev, can you use the whole words Node::related(), dump_related(), dump_related_compact(), dump_compact()? "comp" could be confused with "compiled". It is more typing in the debugger but it is clearer. Also from this->dump_rel() in your example I see that you dump a lot more input nodes than I expect (only up to inputs of CmpU node). But this->dump_rel_comp() produces correct set of nodes. It would be nice if you can avoid using macro: +#ifndef PRODUCT + REL_IN_DATA_OUT_1; +#endif "Arithmetic nodes" are most common data nodes (vs control nodes this->is_CFG() == true). Maybe instead of specialized rel() methods you can use some flag checks in the Node::rel() method. Thanks, Vladimir On 7/28/15 1:56 AM, Michael Haupt wrote: > Dear all, > > please review and sponsor this change. > RFE: https://bugs.openjdk.java.net/browse/JDK-8004073 > Webrev: http://cr.openjdk.java.net/~mhaupt/8004073/webrev.00 > > This change extends the dumping facilities of the C2 IR Node hierarchy. > Node::dump() is used in debugging sessions to print information about an > IR node. The API is extended by these new entry points: > > * void Node::dump_comp() > -> Dump the node in compact form. > > * void Node::dump_rel(), void Node::dump_rel_comp() > -> Dump the node (in compact form) and all nodes related to it. Mark the > current node in the output.
> > * void Node::dump_rel(), void Node::dump_rel_comp() > -> Dump the node (in compact form) and all nodes related to it. Mark the > current node in the output. > > The notion of "related" nodes is of course a property of the node > itself, or rather, of its class. This is configured in this virtual method: > > * virtual void Node::rel(GrowableArray *in_rel, > GrowableArray *out_rel, bool compact) > -> Collect all related nodes. Store the incoming related nodes in the > in_rel > array, and the outgoing related nodes in the out_rel array. In > case compact > representation is desired, possibly collect less nodes. > > This method must be overridden by all subclasses of Node that, in their > notion of what related nodes are, divert from the default behaviour as > specified in the implementation of Node::rel() in the Node class itself. > The default is to collect all inputs and outputs till depth 1, including > both data and control nodes, ignoring compactness. > > There are several auxiliary methods. Node collection is chiefly > facilitated by this method: > > * void Node::collect_nodes(GrowableArray *ns, int d, bool ctrl, > bool data) > -> Collect nodes till depth d (positive: inputs, negative: outputs), > including > *only* control or data nodes (this is controlled by the two bool > arguments, > and setting both to true is nonsensical). > > Furthermore, there exist pre-defined collectors for common cases: > > * void Node::collect_nodes_in_all_data(GrowableArray *ns, bool ctrl) > -> Collect the entire data input graph. Include control nodes only if > requested. > > * void Node::collect_nodes_in_all_ctrl(GrowableArray *ns, bool data) > -> Collect the entire control input graph. Include data nodes only if > requested. > > * void Node::collect_nodes_out_all_ctrl_boundary(GrowableArray *ns) > -> Collect all output nodes, stopping at control nodes, including these. > > * void Node::collect_nodes_in_data_out_1(GrowableArray *is, > GrowableArray *os, bool compact) > -> Collect the entire data input graph, and outputs till depth 1. > > Regarding compact dumping, subclasses of Node should override this > virtual method: > > * virtual void dump_comp_spec(outputStream *st) > -> Dump the specifics of a node in compact form. This method is > supposed to > operate in the fashion of Node::dump_spec(). > > The default behaviour for compact dumping is to dump a node's name and > index. 
> > Specific notions of "related" have been added to the following node classes: > * AbsNode and subclasses > * AddNode and subclasses > * AddPNode > * AtanDNode > * BinaryNode > * BoolNode > * CosDNode > * CountBitsNode and subclasses > * Div{D,F,I,L}Node > * ExpDNode > * GotoNode > * HaltNode > * Log{10D,D}Node > * LShift{I,L}Node > * Mod{D,F,I,L}Node > * MulHiLNode > * Mul{D,F,I,L}Node and subclasses > * DivModNode and subclasses > * IfNode > * JumpNode > * SafePointNode and subclasses (may require more detail) > * StartNode and subclass > * NegNode and subclasses > * PowDNode > * IfProjNode and subclasses > * JProjNode and subclass > * ParmNode > * ReductionNode and subclasses > * Round{Double,Float}Node > * RShift{I,L}Node > * SqrtDNode > * SubNode and subclasses > * TanDNode > * AddV{B,D,F,I,L,S,_}Node > * DivV{D,F}Node > * LShiftV{B,I,L,S}Node > * MulV{D,F,I,S}Node > * OrVNode > * RShiftV{B,I,L,S}Node > * SubV{B,D,F,I,L,S}Node > * URShiftV{B,I,L,S}Node > * XorVNode > * URShift{I,L}Node > > Here is a sample session in LLDB, showing the different dumps for an IfNode: > > * thread #28: tid = 0x10d1ce3, 0x000000010353be17 > libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, > phase=0x000000011cf62368, can_reshape=) + 77 at > ifnode.cpp:1297, name = 'Java: C2 CompilerThread0', stop reason = > breakpoint 1.1 > frame #0: 0x000000010353be17 > libjvm.dylib`IfNode::Ideal(this=0x0000000104888760, > phase=0x000000011cf62368, can_reshape=) + 77 at ifnode.cpp:1297 > 1294 if (remove_dead_region(phase, can_reshape)) return this; > 1295 // No Def-Use info? > 1296 if (!can_reshape) return NULL; > -> 1297 PhaseIterGVN *igvn = phase->is_IterGVN(); > 1298 > 1299 // Don't bother trying to transform a dead if > 1300 if (in(0)->is_top()) return NULL; > (lldb) expr -- this->dump() > 82If=== 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: > String::charAt @ bci:27 > (lldb) expr -- this->dump_comp() > If(82)P=0.999999, C=-1.000000 > (lldb) expr -- this->dump_rel() > 10Parm=== 3 [[ 38 38 65 32 ]] Parm0: > java/lang/String:NotNull:exact * Oop:java/lang/String:NotNull:exact * > !jvms: String::charAt @ bci:-1 > 38AddP=== _ 10 10 37 [[ 39 > ]] Oop:java/lang/String:NotNull:exact+12 * [narrow] !jvms: > String::charAt @ bci:6 > 39LoadN=== _ 7 38 [[ 40 ]] @java/lang/String:exact+12 * [narrow], > name=value, idx=4; #narrowoop: char[int:>=0]:exact * > !jvms: String::charAt @ bci:6 > 40DecodeN=== _ 39 [[ 55 42 ]] #char[int:>=0]:exact * !jvms: > String::charAt @ bci:6 > 37ConL=== 0 [[ 38 56 ]] #long:12 > 55CastPP=== 47 40 [[ 56 56 97 97 96 86 > ]] #char[int:>=0]:NotNull:exact * !jvms: String::charAt @ bci:9 > 56AddP=== _ 55 55 37 [[ 57 ]] !jvms: String::charAt @ bci:9 > 7Parm=== 3 [[ 99 98 86 65 57 32 50 39 ]] Memory Memory: > @BotPTR *+bot, idx=Bot; !jvms: String::charAt @ bci:-1 > 57LoadRange=== _ 7 56 [[ 65 58 78 ]] @bottom[int:>=0]+12 * > [narrow], idx=5; #int:>=0 !jvms: String::charAt @ bci:9 > 11Parm=== 3 [[ 78 93 65 32 24 32 65 58 86 ]] Parm1: int > !jvms: String::charAt @ bci:-1 > 78CmpU=== _ 11 57 [[ 79 ]] !jvms: String::charAt @ bci:27 > 79Bool=== _ 78 [[ 82 ]] [lt] !jvms: String::charAt @ bci:27 > 82 >If=== 61 79 [[ 83 84 ]] P=0.999999, C=-1.000000 !jvms: > String::charAt @ bci:27 > 83IfTrue=== 82 [[ 99 98 ]] #1 !jvms: String::charAt @ bci:27 > 84IfFalse=== 82 [[ 86 ]] #0 !jvms: String::charAt @ bci:27 > 99Return=== 83 6 7 8 9 returns 98 [[ 0 ]] > 98LoadUS=== 83 7 96 [[ 99 ]] @char[int:>=0]:exact+any *, idx=6; > #char !jvms: String::charAt @ bci:27 > 86CallStaticJava=== 84 6 7 8 9 ( 85 1 1 55 11 ) [[ 87 ]] # 
> Static > uncommon_trap(reason='range_check' action='make_not_entrant') void ( > int ) C=0.000100 String::charAt @ bci:27 !jvms: String::charAt @ bci:27 > (lldb) expr -- this->dump_rel_comp() > If(82)P=0.999999, > C=-1.000000 Bool(79)[lt] CmpU(78) Parm(11)1:int LoadRange(57) > @bottom[int:>=0]+12 * [narrow], idx=5; #int:>=0 > IfTrue(83)[99][98]#1 IfFalse(84)[86]#0 Return(99) LoadUS(98) > @char[int:>=0]:exact+any *, idx=6; #char CallStaticJava(86)uncommon_trap > > Best, > > Michael > > -- > > Dr. Michael Haupt | Principal Member of Technical Staff > Phone: +49 331 200 7277 | Fax: +49 331 200 7561 > Oracle Java Platform Group | LangTools Team | Nashorn > Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, > Germany > Oracle is committed to > developing practices and products that help protect the environment > > From roland.westrelin at oracle.com Tue Jul 28 18:20:01 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 20:20:01 +0200 Subject: RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer In-Reply-To: References: <55789088.5050405@oracle.com> <9DE849F6-9E4A-4649-A174-793726D67FD5@oracle.com> <5580738D.9070900@oracle.com> <5581C465.7070803@oracle.com> <05B8872C-194E-4596-9291-C62262AAA930@oracle.com> <55891C4A.6020002@oracle.com> <7EBACFD5-8D27-4EB3-A6F6-851BFAAF6737@oracle.com> <5589232B.7080705@oracle.com> Message-ID: <3883E157-55BE-4E00-87AD-C5572F92C0A4@oracle.com> >>> What about >>> >>> volatile int y; >>> volatile int x; >>> >>> y=1 >>> x=1 >>> y=2 >>> >>> transformed to: >>> >>> x=1 >>> y=2 >>> >>> ? >> >> I think this is not allowed, since operations over "x" get tied up in >> the synchronization order. > > Thanks. Then for support_IRIW_for_not_multiple_copy_atomic_cpu true, I don't see how incorrect reordering is prevented. I took another look and I was wrong about that. void Parse::do_put_xxx(Node* obj, ciField* field, bool is_field) { bool is_vol = field->is_volatile(); // If reference is volatile, prevent following memory ops from // floating down past the volatile write. Also prevents commoning // another volatile read. if (is_vol) insert_mem_bar(Op_MemBarRelease); The barrier prevents y=1 from being optimized out. Roland. From roland.westrelin at oracle.com Tue Jul 28 18:26:33 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Tue, 28 Jul 2015 20:26:33 +0200 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <55B7ADFE.3060109@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> Message-ID: <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com> Thanks for looking at this, Vladimir. > The next change puzzles me: > > - if (!call->may_modify(tinst, phase)) { > + if (call->may_modify(tinst, phase)) { > - mem = call->in(TypeFunc::Memory); > + assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape"); > > Why only ArrayCopy? I think it is true for most calls. What set of tests did you run? I ran: java/lang, java/util, compiler, closed, runtime, gc jtreg tests nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring from ute jprt I'm not sure if I did CTW or not but I can if you think it makes sense. Aren't arguments of calls marked as ArgEscape so an object that is an argument to a call cannot be scalar replaced? > Method naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify.
handle_arraycopy() could be make_arraycopy_load(). > > Add explicit check: > && strcmp(_name, "unsafe_arraycopy") != 0) Thanks for the suggestions. I'll make the suggested changes. Roland. > > Thanks, > Vladimir > > On 7/28/15 7:05 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8130847/webrev.00/ >> >> When an allocation which is the destination of an ArrayCopyNode is eliminated, the fields' values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary. >> >> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates. >> >> Roland. >> From rednaxelafx at gmail.com Tue Jul 28 20:43:19 2015 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 28 Jul 2015 13:43:19 -0700 Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final In-Reply-To: <55B5F12F.5050904@oracle.com> References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> <55B5F12F.5050904@oracle.com> Message-ID: Hi Aleksey, Thanks for fixing this in OpenJDK! I actually noticed the same issue a few weeks ago [1], but somehow I had missed the reply that Roland sent me, so I didn't send out a request for review for my version of the change. But now that it's fixed, everything's all right ;-) Thanks, Kris [1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018179.html On Mon, Jul 27, 2015 at 1:51 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote: > Thanks for pushing the change, Dean! > > -Aleksey > > On 07/25/2015 05:32 AM, John Rose wrote: > > You are fine. > > Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade > > Committers who sponsor changes are expected to use the correct Author > > (not themselves). > > It's a syntax error to push a non-Author changeset to an OpenJDK repo. > > BTW, jcheck does not consult the OJN census AFAIK. > > > > -- John > > > > On Jul 24, 2015, at 6:35 PM, Dean Long > > >> wrote: > >> > >> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets > >> credit for it, and jcheck isn't complaining. Does anyone know if JPRT > >> does more checks for Committer status? If so I'll have to redo it and > >> add a Contributed-by line. > > > From vladimir.kozlov at oracle.com Tue Jul 28 22:42:08 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2015 15:42:08 -0700 Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis In-Reply-To: <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com> References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com> Message-ID: <55B80540.40904@oracle.com> I misread assert's message - I thought "only arraycopy which modifies an allocate makes it escape". Have to read more carefully. Wait, original code looks buggy - 'mem' is set to input memory regardless of the may_modify() result: ! if (!call->may_modify(tinst, phase)) { ! mem = call->in(TypeFunc::Memory); } mem = in->in(TypeFunc::Memory); That is what got my attention.
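To restate the fixed logic in context (my paraphrase of the patched loop, not the exact webrev code):

} else if (in->is_Call()) {
  CallNode* call = in->as_Call();
  if (call->may_modify(tinst, phase)) {
    // Only an arraycopy can modify the field without making the
    // allocation escape; its value comes from the copy's source.
    assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
  }
  // In all cases step past the call to its input memory.
  mem = in->in(TypeFunc::Memory);
}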
From vladimir.kozlov at oracle.com Tue Jul 28 22:42:08 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Jul 2015 15:42:08 -0700
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <49EC4EF4-2ADD-4F53-80DB-1B498E91E8A2@oracle.com>
Message-ID: <55B80540.40904@oracle.com>

I misread assert's message - I thought "only arraycopy which modifies an allocate makes it escape". Have to read more carefully.

Wait, original code looks buggy - 'mem' is set to input memory regardless of the may_modify() result:

!         if (!call->may_modify(tinst, phase)) {
!           mem = call->in(TypeFunc::Memory);
          }
          mem = in->in(TypeFunc::Memory);

That is what got my attention.

Your change looks fine in this code - we skip all call nodes except arraycopy which modifies this field/element. And we can skip all calls because we are processing a non-escaping allocation.

Thanks,
Vladimir

On 7/28/15 11:26 AM, Roland Westrelin wrote:
> Thanks for looking at this, Vladimir.
>
>> The next change puzzles me:
>>
>> -    if (!call->may_modify(tinst, phase)) {
>> +    if (call->may_modify(tinst, phase)) {
>> -      mem = call->in(TypeFunc::Memory);
>> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>>
>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>
> I ran:
>
> java/lang, java/util, compiler, closed, runtime, gc jtreg tests
> nsk.stress, vm.compiler, vm.regression, nsk.regression, nsk.monitoring from ute
> jprt
>
> I'm not sure if I did CTW or not, but I can if you think it makes sense.
>
> Aren't arguments of calls marked as ArgEscape, so an object that is an argument to a call cannot be scalar replaced?
>
>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().
>>
>> Add explicit check:
>>   && strcmp(_name, "unsafe_arraycopy") != 0)
>
> Thanks for the suggestions. I'll make the suggested changes.
>
> Roland.
>
>> On 7/28/15 7:05 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/
>>>
>>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all, and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>>
>>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates.
>>>
>>> Roland.
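A rough Java shape of the failure mode described in the RFR (my reconstruction, not the test from the webrev): the copy made through an ArrayCopyNode does not escape, so it can be scalar-replaced, and a later deoptimization has to rematerialize it with the source's field values:

    public class CloneFields implements Cloneable {
        int f;

        CloneFields(int f) { this.f = f; }

        static int test(CloneFields src) throws CloneNotSupportedException {
            // The clone is backed by an ArrayCopyNode; if the copy does not
            // escape, escape analysis may eliminate the allocation. On
            // deoptimization the object is reallocated from values recorded
            // at the safepoint, which before the fix ignored the
            // ArrayCopyNode's effect.
            CloneFields copy = (CloneFields) src.clone();
            return copy.f; // must be src.f, never the default 0
        }

        public static void main(String[] args) throws Exception {
            CloneFields src = new CloneFields(42);
            for (int i = 0; i < 20_000; i++) {
                if (test(src) != 42) throw new AssertionError("lost field value");
            }
        }
    }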
From michael.haupt at oracle.com Wed Jul 29 08:54:24 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Wed, 29 Jul 2015 10:54:24 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <55B7BACE.9080100@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com>
Message-ID: <1AB40291-4B32-4B33-8621-183585854169@oracle.com>

Hi Vladimir,

thank you for your comments. I have uploaded a revised webrev to http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are inlined below.

> On 28.07.2015 19:24, Vladimir Kozlov wrote:
> Before looking to webrev, can you use whole word Node::related(), dump_related(), dump_related_compact(), dump_compact()? "comp" could be confused for "compiled". It is more typing in debugger but it is more clear.

Done.

> Also from this->dump_rel() in your example I see that you dump a lot more input nodes than I expect (only up to inputs of CmpU node).
> But this->dump_rel_comp() produces correct set of nodes.

The depth of output can be controlled with the method Node::dump_related(int d_in, int d_out); in my initial post I had not mentioned this method. The default output is also formatted in a way that makes clear where the current node (>) is, and where all the inputs (before) and outputs (after) are. Regarding the notion of "related nodes", YMMV.

For additional illustration, I've added an implementation of related() for PhiNode.

> It would be nice if you can avoid using macro:
>
> +#ifndef PRODUCT
> +  REL_IN_DATA_OUT_1;
> +#endif
>
> "Arithmetic nodes" are most common data nodes (vs control nodes this->is_CFG() == true). May be instead of a specialized rel() method you can use some flags checks in Node::rel() method.

Done.

Best,

Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From aleksey.shipilev at oracle.com Wed Jul 29 08:58:51 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 11:58:51 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
Message-ID: <55B895CB.4060105@oracle.com>

Hi,

I would like to suggest a fix for:
https://bugs.openjdk.java.net/browse/JDK-8019968

In short, current reference CAS intrinsic blindly emits post_barrier, ignoring the CAS result. In some cases, notably contended CAS spin-loops, we fail the CAS a lot, and thus go for a post_barrier excessively. Instead, we can conditionalize on the result of the store itself, and put the post_barrier only on the success path:
http://cr.openjdk.java.net/~shade/8019968/webrev.01/

More performance results here:
http://cr.openjdk.java.net/~shade/8019968/notes.txt

Thanks,
-Aleksey

P.S. Thanks to Vladimir Ivanov, who manhandled me through dealing with GraphKit/IdealKit interop.
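For context, the kind of contended spin-loop this fix targets (an illustrative sketch, not the benchmark from the notes): under contention most compareAndSet attempts fail, and before the fix each failed attempt still went through the GC post-barrier:

    import java.util.concurrent.atomic.AtomicReference;

    public final class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            final Node<T> next;
            Node(T value, Node<T> next) { this.value = value; this.next = next; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void push(T v) {
            Node<T> h;
            do {
                h = head.get();
                // On a failed compareAndSet no reference is stored, so no
                // card needs dirtying; the fix emits the post-barrier only
                // when the CAS succeeds.
            } while (!head.compareAndSet(h, new Node<>(v, h)));
        }

        public static void main(String[] args) throws InterruptedException {
            TreiberStack<Integer> s = new TreiberStack<>();
            Runnable r = () -> { for (int i = 0; i < 100_000; i++) s.push(i); };
            Thread t1 = new Thread(r), t2 = new Thread(r);
            t1.start(); t2.start(); t1.join(); t2.join();
        }
    }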
From aleksey.shipilev at oracle.com Wed Jul 29 09:00:25 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 12:00:25 +0300
Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final
In-Reply-To:
References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> <55B5F12F.5050904@oracle.com>
Message-ID: <55B89629.8060102@oracle.com>

Ah. That would save me half a day digging in HotSpot. Thanks should go to John Rose who suggested the factory fix, not the Class.cast intrinsic one.

-Aleksey

On 07/28/2015 11:43 PM, Krystal Mok wrote:
> Hi Aleksey,
>
> Thanks for fixing this in OpenJDK!
> I actually noticed the same issue a few weeks ago [1], but somehow I had
> missed the reply that Roland sent me, so I didn't send out a request for
> review for my version of the change.
> But now that it's fixed, everything's all right ;-)
>
> Thanks,
> Kris
>
> [1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018179.html
>
> On Mon, Jul 27, 2015 at 1:51 AM, Aleksey Shipilev wrote:
>
>     Thanks for pushing the change, Dean!
>
>     -Aleksey
>
>     On 07/25/2015 05:32 AM, John Rose wrote:
>     > You are fine.
>     > Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade
>     > Committers who sponsor changes are expected to use the correct Author
>     > (not themselves).
>     > It's a syntax error to push a non-Author changeset to an OpenJDK repo.
>     > BTW, jcheck does not consult the OJN census AFAIK.
>     >
>     > -- John
>     >
>     > On Jul 24, 2015, at 6:35 PM, Dean Long wrote:
>     >>
>     >> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets
>     >> credit for it, and jcheck isn't complaining. Does anyone know if JPRT
>     >> does more checks for Committer status? If so I'll have to redo it and
>     >> add a Contributed-by line.

From adinn at redhat.com Wed Jul 29 09:24:38 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 29 Jul 2015 10:24:38 +0100
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55B895CB.4060105@oracle.com>
References: <55B895CB.4060105@oracle.com>
Message-ID: <55B89BD6.7050209@redhat.com>

On 29/07/15 09:58, Aleksey Shipilev wrote:
> I would like to suggest a fix for:
> https://bugs.openjdk.java.net/browse/JDK-8019968
>
> In short, current reference CAS intrinsic blindly emits
> post_barrier, ignoring the CAS result. In some cases, notably
> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
> post_barrier excessively. Instead, we can conditionalize on the
> result of the store itself, and put the post_barrier only on
> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>
> More performance results here:
> http://cr.openjdk.java.net/~shade/8019968/notes.txt

Nice! The code looks fine and your test results are very convincing. I'll be interested to see how this looks on AArch64.

That said, I am afraid you still need a Reviewer!

regards,


Andrew Dinn
-----------

From aleksey.shipilev at oracle.com Wed Jul 29 09:57:36 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 12:57:36 +0300
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55B89BD6.7050209@redhat.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com>
Message-ID: <55B8A390.4000203@oracle.com>

On 07/29/2015 12:24 PM, Andrew Dinn wrote:
> On 29/07/15 09:58, Aleksey Shipilev wrote:
>> I would like to suggest a fix for:
>> https://bugs.openjdk.java.net/browse/JDK-8019968
>>
>> In short, current reference CAS intrinsic blindly emits
>> post_barrier, ignoring the CAS result. In some cases, notably
>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>> post_barrier excessively. Instead, we can conditionalize on the
>> result of the store itself, and put the post_barrier only on
>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>
>> More performance results here:
>> http://cr.openjdk.java.net/~shade/8019968/notes.txt
>
> Nice! The code looks fine and your test results are very convincing.
> I'll be interested to see how this looks on AArch64.

Thanks Andrew!

The change passes JPRT, so an AArch64 build is available. The benchmark JAR mentioned in the issue comments would run without intervention, taking around 40 minutes. You are very welcome to try, while Reviewers are taking a look. I can do that only next week.
> That said, I am afraid you still need a Reviewer!

That reminds me I haven't spelled out what testing was done:

 * JPRT on all open platforms
 * Targeted benchmarks
 * Eyeballing the generated x86 assembly

Thanks,
-Aleksey

From roland.westrelin at oracle.com Wed Jul 29 10:51:36 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 12:51:36 +0200
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
Message-ID: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>

http://cr.openjdk.java.net/~roland/8132525/webrev.00/

My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.

Roland.

From adinn at redhat.com Wed Jul 29 10:55:48 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 29 Jul 2015 11:55:48 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
Message-ID: <55B8B134.2050803@redhat.com>

The following webrev is a follow-on to the fix posted for JDK-8078263. The earlier fix optimized *non-object* volatile puts/gets to use stlr/ldar and elide the associated leading/trailing dmb instructions. This one extends the previous optimization to cover volatile object puts.

http://cr.openjdk.java.net/~adinn/8078743/webrev.03/

The fix involves identifying certain Ideal Graph configurations in which generation of leading and trailing dmb can be avoided in favour of generating an stlr instruction. As a consequence, the fix is sensitive to the current GC configs and also to whether the value being written is null, potentially null, or known notnull. I have tested it using 5 GC configs:

  G1GC
  CMS+UseCondCardMark
  CMS-UseCondCardMark
  Parallel+UseCondCardMark
  Parallel-UseCondCardMark

The last two configs are much of a muchness as regards what the patch code does, since with Parallel (or Serial) GC the patch code does not need to look at the code which does the card mark -- but I tested against both just to be sure.

Testing involved

i) eyeballing the code generated for normal and unsafe volatile puts with null, possibly null and notnull values

ii) exercising a large program (build and run a sample project in NetBeans) and eyeballing the code generated for methods of ConcurrentHashMap

iii) running the full jcstress suite

Comments and reviews very welcome

n.b. I have a 3rd patch queued which performs a similar optimization for CAS operations (drop the dmbs in favour of an ldaxr/stlxr pair).

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)
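The store shape this patch targets, as a trivial Java sketch (names are mine, not from the webrev): a volatile *object* field write, which with the patch can be emitted on AArch64 as a single stlr plus the GC card mark, instead of a plain str bracketed by leading and trailing dmb instructions:

    class Publisher {
        volatile Object ref;      // volatile object field

        void publish(Object o) {
            ref = o;              // volatile object put: the case the
                                  // patch lowers to stlr + card mark
        }
    }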
From roland.westrelin at oracle.com Wed Jul 29 13:57:34 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 15:57:34 +0200
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <55B7ADFE.3060109@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com>
Message-ID: <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com>

> The next change puzzles me:
>
> -    if (!call->may_modify(tinst, phase)) {
> +    if (call->may_modify(tinst, phase)) {
> -      mem = call->in(TypeFunc::Memory);
> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>
> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>
> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().

What about:

static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);

instead of membar_for_arraycopy()

So ArrayCopyNode would have:

virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase);

and

static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);

that do the same thing, except the static method also looks for a graph pattern starting from a MemBar.

Roland.

> Add explicit check:
>   && strcmp(_name, "unsafe_arraycopy") != 0)
>
> Thanks,
> Vladimir
>
> On 7/28/15 7:05 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/
>>
>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all, and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>
>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates.
>>
>> Roland.

From aleksey.shipilev at oracle.com Wed Jul 29 14:01:00 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 29 Jul 2015 17:01:00 +0300
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B6063B.3070604@redhat.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com>
Message-ID: <55B8DC9C.7010003@oracle.com>

On 07/27/2015 01:21 PM, Andrew Haley wrote:
> On 27/07/15 10:13, Aleksey Shipilev wrote:
>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>
>> Andrew/Edward, are you OK with AArch64 part?
>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>
> I agree that it looks good.

So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and Andrew Haley. Still no Capital (R)eviewers.

Otherwise, I think we are good to go. I respinned the JPRT with open+closed sources, and it would seem the changes in closed sources are not required.

Please review and sponsor!
Thanks,
-Aleksey

From vladimir.x.ivanov at oracle.com Wed Jul 29 14:16:19 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 29 Jul 2015 17:16:19 +0300
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B8DC9C.7010003@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com>
Message-ID: <55B8E033.4060900@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 7/29/15 5:01 PM, Aleksey Shipilev wrote:
> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>
>>> Andrew/Edward, are you OK with AArch64 part?
>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>
>> I agree that it looks good.
>
> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
> Andrew Haley. Still no Capital (R)eviewers.
>
> Otherwise, I think we are good to go. I respinned the JPRT with
> open+closed sources, and it would seem the changes in closed sources are
> not required.
>
> Please review and sponsor!
>
> Thanks,
> -Aleksey

From dean.long at oracle.com Wed Jul 29 16:30:24 2015
From: dean.long at oracle.com (Dean Long)
Date: Wed, 29 Jul 2015 09:30:24 -0700
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55B8DC9C.7010003@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com>
Message-ID: <55B8FFA0.4070105@oracle.com>

On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
> Andrew Haley. Still no Capital (R)eviewers.
>
> Otherwise, I think we are good to go. I respinned the JPRT with
> open+closed sources, and it would seem the changes in closed sources are
> not required.

The changes to sparc and ppc may not be required anymore.

dl

> Please review and sponsor!
>
> Thanks,
> -Aleksey

From vladimir.kozlov at oracle.com Wed Jul 29 17:09:22 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 10:09:22 -0700
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
Message-ID: <55B908C2.1040301@oracle.com>

Okay.
Thanks,
Vladimir

On 7/29/15 3:51 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>
> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>
> Roland.

From john.r.rose at oracle.com Wed Jul 29 17:53:51 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 29 Jul 2015 10:53:51 -0700
Subject: RFR (S) 8131782: C1 Class.cast optimization breaks when Class is loaded from static final
In-Reply-To: <55B89629.8060102@oracle.com>
References: <55ACFD0F.30207@oracle.com> <55AE1972.4050106@oracle.com> <55B20A62.2090307@oracle.com> <55B29DD8.40008@oracle.com> <55B2E7D5.7060806@oracle.com> <55B5F12F.5050904@oracle.com> <55B89629.8060102@oracle.com>
Message-ID: <0AC6A98C-8773-4A8D-9EE8-6DC9D207A0CA@oracle.com>

It's possible that Kris's suggestion helped me (through some unconscious brain backplane) to find the same fix location. In any case, he called it first; thanks Kris, and keep up the good work!

-- John

On Jul 29, 2015, at 2:00 AM, Aleksey Shipilev wrote:
>
> Ah. That would save me half a day digging in HotSpot. Thanks should go
> to John Rose who suggested the factory fix, not the Class.cast intrinsic
> one.
>
> -Aleksey
>
> On 07/28/2015 11:43 PM, Krystal Mok wrote:
>> Hi Aleksey,
>>
>> Thanks for fixing this in OpenJDK!
>> I actually noticed the same issue a few weeks ago [1], but somehow I had
>> missed the reply that Roland sent me, so I didn't send out a request for
>> review for my version of the change.
>> But now that it's fixed, everything's all right ;-)
>>
>> Thanks,
>> Kris
>>
>> [1]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-June/018179.html
>>
>> On Mon, Jul 27, 2015 at 1:51 AM, Aleksey Shipilev wrote:
>>
>>     Thanks for pushing the change, Dean!
>>
>>     -Aleksey
>>
>>     On 07/25/2015 05:32 AM, John Rose wrote:
>>> You are fine.
>>> Aleksey is an Author for JDK 9: http://openjdk.java.net/census#shade
>>> Committers who sponsor changes are expected to use the correct Author
>>> (not themselves).
>>> It's a syntax error to push a non-Author changeset to an OpenJDK repo.
>>> BTW, jcheck does not consult the OJN census AFAIK.
>>>
>>> -- John
>>>
>>> On Jul 24, 2015, at 6:35 PM, Dean Long wrote:
>>>>
>>>> OK I will push it now. I did 'hg ci -u shade' so that Aleksey gets
>>>> credit for it, and jcheck isn't complaining. Does anyone know if JPRT
>>>> does more checks for Committer status? If so I'll have to redo it and
>>>> add a Contributed-by line.
From gerard.ziemski at oracle.com Wed Jul 29 18:34:35 2015
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Wed, 29 Jul 2015 13:34:35 -0500
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com>
Message-ID: <55B91CBB.2040800@oracle.com>

hi Roland,

You put

+  // With a client VM, -XX:+TieredCompilation causes TieredCompilation
+  // to be true here (the option is validated later) and
+  // min_number_of_compiler_threads to exceed CI_COMPILER_COUNT.
+  min_number_of_compiler_threads = MIN2(min_number_of_compiler_threads, CI_COMPILER_COUNT);

into the src/share/vm/runtime/commandLineFlagConstraintsCompiler.cpp file, but that's a constraint function - there should not be any code that sets anything there. Is there somewhere else we can put this code instead of the constraint function?

cheers

On 07/29/2015 05:51 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>
> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>
> Roland.

From roland.westrelin at oracle.com Wed Jul 29 18:48:44 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 20:48:44 +0200
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <55B91CBB.2040800@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com> <55B91CBB.2040800@oracle.com>
Message-ID: <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com>

Hi Gerard,

Thanks for looking at this.

> You put
>
> +  // With a client VM, -XX:+TieredCompilation causes TieredCompilation
> +  // to be true here (the option is validated later) and
> +  // min_number_of_compiler_threads to exceed CI_COMPILER_COUNT.
> +  min_number_of_compiler_threads = MIN2(min_number_of_compiler_threads, CI_COMPILER_COUNT);
>
> into the src/share/vm/runtime/commandLineFlagConstraintsCompiler.cpp file, but that's a constraint function - there should not be any code that sets anything there. Is there somewhere else we can put this code instead of the constraint function?

min_number_of_compiler_threads is a local variable, so it shouldn't be a problem, right?

Roland.

> cheers
>
> On 07/29/2015 05:51 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>>
>> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>> Roland.

From gerard.ziemski at oracle.com Wed Jul 29 18:50:00 2015
From: gerard.ziemski at oracle.com (gerard ziemski)
Date: Wed, 29 Jul 2015 13:50:00 -0500
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com> <55B91CBB.2040800@oracle.com> <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com>
Message-ID: <55B92058.3010207@oracle.com>

Right, looks good.

cheers

On 07/29/2015 01:48 PM, Roland Westrelin wrote:
> Hi Gerard,
>
> Thanks for looking at this.
>
>> You put
>> +  // With a client VM, -XX:+TieredCompilation causes TieredCompilation
>> +  // to be true here (the option is validated later) and
>> +  // min_number_of_compiler_threads to exceed CI_COMPILER_COUNT.
>> +  min_number_of_compiler_threads = MIN2(min_number_of_compiler_threads, CI_COMPILER_COUNT);
>> into the src/share/vm/runtime/commandLineFlagConstraintsCompiler.cpp file, but that's a constraint function - there should not be any code that sets anything there. Is there somewhere else we can put this code instead of the constraint function?
>
> min_number_of_compiler_threads is a local variable, so it shouldn't be a problem, right?
>
> Roland.
>
>> cheers
>>
>> On 07/29/2015 05:51 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8132525/webrev.00/
>>>
>>> My recent change to the validation of CICompilerCount triggers an assert failure: with -client -XX:+TieredCompilation, TieredCompilation is true when the CICompilerCount option is validated, and an erroneous value is used for the min number of threads. TieredCompilation is forced off later in the command line validation process.
>>>
>>> Roland.

From roland.westrelin at oracle.com Wed Jul 29 18:57:09 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Jul 2015 20:57:09 +0200
Subject: RFR(XS): 8132525: java -client -XX:+TieredCompilation -XX:CICompilerCount=1 -version asserts since 8130858
In-Reply-To: <55B92058.3010207@oracle.com>
References: <88D46D2A-1EB6-4507-A5C1-B19FF47555BE@oracle.com> <55B91CBB.2040800@oracle.com> <02EC85F7-226D-4D17-89F1-F59A2E6DB0FE@oracle.com> <55B92058.3010207@oracle.com>
Message-ID:

Thanks Gerard & Vladimir for the reviews.

Roland.

From vladimir.kozlov at oracle.com Thu Jul 30 00:48:39 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 17:48:39 -0700
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com>
Message-ID: <55B97467.3000404@oracle.com>

On 7/29/15 6:57 AM, Roland Westrelin wrote:
>> The next change puzzles me:
>>
>> -    if (!call->may_modify(tinst, phase)) {
>> +    if (call->may_modify(tinst, phase)) {
>> -      mem = call->in(TypeFunc::Memory);
>> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>>
>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>>
>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().
>
> What about:
>
> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>
> instead of membar_for_arraycopy()
>
> So ArrayCopyNode would have:
>
> virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase);
>
> and
>
> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>
> that do the same thing, except the static method also looks for a graph pattern starting from a MemBar.

Yes, it is better.

Thanks,
Vladimir

> Roland.
>
>> Add explicit check:
>>   && strcmp(_name, "unsafe_arraycopy") != 0)
>>
>> Thanks,
>> Vladimir
>>
>> On 7/28/15 7:05 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8130847/webrev.00/
>>>
>>> When an allocation which is the destination of an ArrayCopyNode is eliminated, field's values recorded at a safepoint (to reallocate the object) do not take the ArrayCopyNode into account at all, and the effect of the ArrayCopyNode is lost on a deoptimization. This fix records values from the source of the ArrayCopyNode, emitting new loads if necessary.
>>>
>>> I also use the opportunity to pin the loads generated in LoadNode::can_see_arraycopy_value() because they depend on all checks that validate the array copy and not only on the check that immediately dominates.
>>>
>>> Roland.

From vladimir.kozlov at oracle.com Thu Jul 30 02:08:56 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 19:08:56 -0700
Subject: Fwd: Tiered compilation and virtual call heuristics
In-Reply-To:
References:
Message-ID: <55B98738.3040502@oracle.com>

Hi Carsten,

The main issue here is that without Tiered, the Interpreter starts collecting profiling information only after 3300 invocations (InterpreterProfilePercentage). As a result, data from the first invocations is not recorded. On the other hand, with Tiered, C1 compilation (with profiling code) is triggered after 100 invocations. So you have a lot more data, as you observed.

If you can sacrifice some startup performance you can try to use CompileThresholdScaling to increase compilation thresholds to delay compilations. Or you can also try to increase Tier3InvocationThreshold and Tier3CompileThreshold to delay only C1 compilation. Here is the formula from simpleThresholdPolicy.inline.hpp:

  return (i >= Tier3InvocationThreshold * scale) ||
         (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale);

But if you have a real "flat" profile (all called methods are relatively warm) nothing will help you. If you have some methods which are relatively hot you can solve that by trying to call them at the beginning. For example, if you had count400(0) called first (or second) you will get a record for it in the MDO. And then you can try to lower TypeProfileMajorReceiverPercent to avoid the virtual call at least for one hot method (recorded in the MDO):

  product(intx, TypeProfileMajorReceiverPercent, 90,
          "% of major receiver type to all profiled receivers")

Regards,
Vladimir

On 7/22/15 10:37 AM, Carsten Varming wrote:
> Dear Hotspot compiler group,
>
> I have had a few issues with tiered compilation in JDK8 lately and was
> wondering if you have some comments or ideas for the given problem.
>
> Here is my problem as I currently understand it. Feel free to correct
> any misunderstandings I may have. With tiered compilation the heuristics
> for inlining virtual calls seem to degrade quite a bit. I think this is
> due to MethodData objects being created much earlier with tiered than
> without.
> This causes the tracking of the hottest target methods at a
> virtual call site to go awry, due to the limit (2) on the number of
> MethodData objects that can be associated with a bci in a method. It
> seems like the only virtual call targets tracked are the targets that
> are warm when C1 is invoked.
>
> The program ends up with all call-sites in
> scala.collection.IndexedSeqOptimized.slice using virtual dispatch with
> tiered, and bimorphic call sites without tiered. The end result with
> tiered is a tripling of the cpu required to run the program, and
> instruction pointers from the compiled slice method end up in 90% of all
> cpu samples (collected with perf at 4kHz).
>
> The problem is with a small application built in Scala on top of Netty.
> I have written a small sample program (see attached Main.java) to spare
> you the details (and to be able to give you code).
>
> When I run the sample program with tiered then the call to count ends up
> being a virtual call, due to Instance$3.count and Instance4.count being
> warm when C1 kicks in. Without tiered, Instance$1.count is the only hot
> method.
>
> I wonder if you guys have seen this problem in the wild or if I just
> happen to be unlucky. Increasing BciProfileWidth should help in my case,
> but it is not a product flag. Do you have any experience regarding the cost
> of increasing BciProfileWidth? Do you have any thoughts on throwing out
> MethodData objects for virtual call sites that turn out to be pretty cold?
>
> Thank you in advance for your thoughts,
> Carsten
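A stripped-down Java sketch of the warm-up shape Carsten describes (illustrative only; his attached Main.java is not reproduced here): two receivers that are hot only during warm-up occupy both profiled-receiver slots at the call site, so the receiver that dominates steady state never makes it into the MethodData and the call stays virtual:

    public class ProfilePollution {
        interface Counter { int count(); }
        static final class A implements Counter { public int count() { return 1; } }
        static final class B implements Counter { public int count() { return 2; } }
        static final class C implements Counter { public int count() { return 3; } }

        static int call(Counter c) {
            return c.count(); // virtual call site; only two receiver types are profiled
        }

        public static void main(String[] args) {
            int s = 0;
            // warm-up: A and B fill the two profiled-receiver slots
            for (int i = 0; i < 10_000; i++) s += call(new A()) + call(new B());
            // steady state: C is the hot receiver, but it was never recorded,
            // so the call is not devirtualized/inlined for C
            for (int i = 0; i < 1_000_000; i++) s += call(new C());
            System.out.println(s);
        }
    }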
From vladimir.kozlov at oracle.com Thu Jul 30 02:15:14 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2015 19:15:14 -0700
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
Message-ID: <55B988B2.5000409@oracle.com>

This looks good to me. You need a second reviewer to look on this since changes are big.

Thanks.
Vladimir

On 7/29/15 1:54 AM, Michael Haupt wrote:
> Hi Vladimir,
>
> thank you for your comments. I have uploaded a revised webrev to
> http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are
> inlined below.
>
>> On 28.07.2015 19:24, Vladimir Kozlov wrote:
>> Before looking to webrev, can you use whole word Node::related(),
>> dump_related(), dump_related_compact(), dump_compact()? "comp" could
>> be confused for "compiled". It is more typing in debugger but it is
>> more clear.
>
> Done.
>
>> Also from this->dump_rel() in your example I see that you dump a lot
>> more input nodes than I expect (only up to inputs of CmpU node).
>> But this->dump_rel_comp() produces correct set of nodes.
>
> The depth of output can be controlled with the method
> Node::dump_related(int d_in, int d_out); in my initial post I had not
> mentioned this method. The default output is also formatted in a way
> that makes clear where the current node (>) is, and where all the inputs
> (before) and outputs (after) are. Regarding the notion of "related
> nodes", YMMV.
>
> For additional illustration, I've added an implementation of related()
> for PhiNode.
>
>> It would be nice if you can avoid using macro:
>>
>> +#ifndef PRODUCT
>> +  REL_IN_DATA_OUT_1;
>> +#endif
>>
>> "Arithmetic nodes" are most common data nodes (vs control nodes
>> this->is_CFG() == true). May be instead of a specialized rel() method you
>> can use some flags checks in Node::rel() method.
>
> Done.
>
> Best,
>
> Michael

From michael.haupt at oracle.com Thu Jul 30 05:06:51 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Thu, 30 Jul 2015 07:06:51 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <55B988B2.5000409@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com> <55B988B2.5000409@oracle.com>
Message-ID: <585D977C-2C99-4F96-BC3A-FB3EE56304C9@oracle.com>

Hi Vladimir,

thank you. Anyone, please review ...

Best,

Michael

> On 30.07.2015 04:15, Vladimir Kozlov wrote:
>
> This looks good to me. You need a second reviewer to look on this since changes are big.
>
> Thanks.
> Vladimir
>
> On 7/29/15 1:54 AM, Michael Haupt wrote:
>> Hi Vladimir,
>>
>> thank you for your comments. I have uploaded a revised webrev to
>> http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are
>> inlined below.
>>
>>> On 28.07.2015 19:24, Vladimir Kozlov wrote:
>>> Before looking to webrev, can you use whole word Node::related(),
>>> dump_related(), dump_related_compact(), dump_compact()? "comp" could
>>> be confused for "compiled". It is more typing in debugger but it is
>>> more clear.
>>
>> Done.
>>
>>> Also from this->dump_rel() in your example I see that you dump a lot
>>> more input nodes than I expect (only up to inputs of CmpU node).
>>> But this->dump_rel_comp() produces correct set of nodes.
>>
>> The depth of output can be controlled with the method
>> Node::dump_related(int d_in, int d_out); in my initial post I had not
>> mentioned this method. The default output is also formatted in a way
>> that makes clear where the current node (>) is, and where all the inputs
>> (before) and outputs (after) are. Regarding the notion of "related
>> nodes", YMMV.
>>
>> For additional illustration, I've added an implementation of related()
>> for PhiNode.
>>
>>> It would be nice if you can avoid using macro:
>>>
>>> +#ifndef PRODUCT
>>> +  REL_IN_DATA_OUT_1;
>>> +#endif
>>>
>>> "Arithmetic nodes" are most common data nodes (vs control nodes
>>> this->is_CFG() == true). May be instead of a specialized rel() method you
>>> can use some flags checks in Node::rel() method.
>>
>> Done.
>>
>> Best,
>>
>> Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment
From vladimir.x.ivanov at oracle.com Thu Jul 30 09:33:45 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 30 Jul 2015 12:33:45 +0300
Subject: [9] RFR (M): 8011858: Use Compile::live_nodes() instead of Compile::unique() in appropriate places
In-Reply-To: <55A9AAB6.50505@oracle.com>
References: <55A9AAB6.50505@oracle.com>
Message-ID: <55B9EF79.1040907@oracle.com>

Looks good.

I'll sponsor the change.

Best regards,
Vladimir Ivanov

On 7/18/15 4:24 AM, Vladimir Kozlov wrote:
> Thank you, Vlad
>
> It looks good. We usually don't put bug id into comments. So your
> previous version on cr.openjdk is fine.
>
> Second reviewer should look on and sponsor it with you listed as
> contributor (I see you signed OCA already).
>
> Thanks,
> Vladimir
>
> On 7/17/15 3:47 PM, Vlad Ureche wrote:
>> Hi,
>>
>> Please review the following patch for JDK-8011858. Big thanks to
>> Vladimir Kozlov for his patient guidance while working on this!
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8011858
>>
>> Problem: Throughout C2, local stacks are used to prevent recursive
>> calls from blowing up the system stack. These are sized based on the
>> total number of nodes in the compilation run (e.g. C->unique()).
>> Instead, they should be sized based on the live node count
>> (C->live_nodes()).
>>
>> Now, with the increased difference between live_nodes (limited at
>> LiveNodeCountInliningCutoff, set to 40K) and unique nodes (which can go
>> up to 240K), it is important to not over-estimate the size of stacks.
>>
>> Solution: This patch mirrors a patch written by Vladimir Kozlov for
>> JDK8u. It replaces the initial sizes from C->unique() to
>> C->live_nodes(), preserving any shifts (divisions) and offsets. For
>> example, in the compile.cpp patch:
>>
>> - Node_Stack nstack(unique() >> 1);
>> + Node_Stack nstack(live_nodes() >> 1);
>>
>> There is an issue described at
>> https://bugs.openjdk.java.net/browse/JDK-8131702 where I took the
>> workaround from Vladimir's patch.
>>
>> Webrev: http://cr.openjdk.java.net/~kvn/8011858/webrev/ or
>> http://vladureche.ro/webrev/8011858 (updated, includes a link to bug
>> 8131702)
>>
>> Tests: Running jtreg with the compiler, runtime and gc tests on the
>> dev branch shows the same status before and after the patch: 808 tests
>> passed, 16 failed and 6 errors. What would be a stable point where all
>> tests are expected to pass, so I can test the patch there? Maybe jdk9?
>>
>> Thanks,
>> Vlad
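The sizing idea, restated as a generic sketch (plain Java, not HotSpot code; the names are mine): allocate scratch structures from the live count, which inlining caps at around 40K, rather than from the total-ever-allocated count, which can be several times larger:

    import java.util.ArrayDeque;

    final class WorklistSizing {
        // Mirrors "Node_Stack nstack(live_nodes() >> 1)": keep the original
        // shift, only swap the base quantity from totalAllocated to liveCount.
        static ArrayDeque<Integer> makeWorklist(int liveCount) {
            return new ArrayDeque<>(Math.max(1, liveCount >> 1));
        }

        public static void main(String[] args) {
            // liveCount = 40_000 vs. totalAllocated up to 240_000: sizing by
            // the former avoids over-reserving by up to 6x for one traversal.
            ArrayDeque<Integer> worklist = makeWorklist(40_000);
            System.out.println("worklist ready, initial capacity hint applied: " + worklist.isEmpty());
        }
    }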
From roland.westrelin at oracle.com Thu Jul 30 18:29:37 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 30 Jul 2015 20:29:37 +0200
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To: <55B97467.3000404@oracle.com>
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com> <55B97467.3000404@oracle.com>
Message-ID:

Updated webrev with Vladimir's comments:

http://cr.openjdk.java.net/~roland/8130847/webrev.01/

Roland.

> On Jul 30, 2015, at 2:48 AM, Vladimir Kozlov wrote:
>
> On 7/29/15 6:57 AM, Roland Westrelin wrote:
>>>> The next change puzzles me:
>>>>
>>>> -    if (!call->may_modify(tinst, phase)) {
>>>> +    if (call->may_modify(tinst, phase)) {
>>>> -      mem = call->in(TypeFunc::Memory);
>>>> +      assert(call->is_ArrayCopy(), "ArrayCopy is the only call node that doesn't make allocation escape");
>>>>
>>>> Why only ArrayCopy? I think it is most of calls. What set of tests you ran?
>>>>
>>>> Methods naming is confusing. membar_for_arraycopy() does not check for membar but for calls which can modify. handle_arraycopy() could be make_arraycopy_load().
>>>
>>> What about:
>>>
>>> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>>>
>>> instead of membar_for_arraycopy()
>>>
>>> So ArrayCopyNode would have:
>>>
>>> virtual bool may_modify(const TypeOopPtr *t_oop, PhaseTransform *phase);
>>>
>>> and
>>>
>>> static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase);
>>>
>>> that do the same thing, except the static method also looks for a graph pattern starting from a MemBar.
>>
>> Yes, it is better.
>>
>> Thanks,
>> Vladimir
>>
>>> Roland.
From vladimir.kozlov at oracle.com Thu Jul 30 18:34:34 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2015 11:34:34 -0700
Subject: RFR(M): 8130847: Cloned object's fields observed as null after C2 escape analysis
In-Reply-To:
References: <071D5DCC-9B77-4A86-9DA1-6C97BC99F496@oracle.com> <55B7ADFE.3060109@oracle.com> <9A22C312-E1B5-4851-82C2-CE5FBCF75869@oracle.com> <55B97467.3000404@oracle.com>
Message-ID: <55BA6E3A.2030007@oracle.com>

Looks good to me.

Thanks,
Vladimir

On 7/30/15 11:29 AM, Roland Westrelin wrote:
> Updated webrev with Vladimir's comments:
>
> http://cr.openjdk.java.net/~roland/8130847/webrev.01/
>
> Roland.

From vladimir.kozlov at oracle.com Thu Jul 30 19:19:19 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2015 12:19:19 -0700
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55B8B134.2050803@redhat.com>
References: <55B8B134.2050803@redhat.com>
Message-ID: <55BA78B7.7030300@oracle.com>

First, thank you for extensive comments - they help.

Second, does it really help? I don't see any numbers.

Next pattern (returning NULL or false) is repeated several times. Maybe make a separate function for it.

+  if (opcode != Op_MemBarRelease) {
+    if (opcode != Op_MemBarCPUOrder)
+      return NULL;
+    MemBarNode *parent = parent_membar(leading);
+    if (!parent || !parent->Opcode() == Op_MemBarRelease)
+      return NULL;
+  }

Can you replace retry_feed: with while(cond) ?

Next could be one line, Node *x = trailing->in(TypeFunc::Memory):

+  Node *x;
+  // the Mem feed to the membar should be a merge
+  x = trailing->in(TypeFunc::Memory);

Thanks,
Vladimir

On 7/29/15 3:55 AM, Andrew Dinn wrote:
> The following webrev is a follow-on to the fix posted for JDK-8078263.
> The earlier fix optimized *non-object* volatile puts/gets to use
> stlr/ldar and elide the associated leading/trailing dmb instructions.
> This one extends the previous optimization to cover volatile object puts.
>
> http://cr.openjdk.java.net/~adinn/8078743/webrev.03/
>
> The fix involves identifying certain Ideal Graph configurations in which
> generation of leading and trailing dmb can be avoided in favour of
> generating an stlr instruction. As a consequence, the fix is sensitive to
> the current GC configs and also to whether the value being written is
> null, potentially null, or known notnull. I have tested it using 5 GC
> configs:
>
> G1GC
> CMS+UseCondCardMark
> CMS-UseCondCardMark
> Parallel+UseCondCardMark
> Parallel-UseCondCardMark
>
> The last two configs are much of a muchness as regards what the patch
> code does, since with Parallel (or Serial) GC the patch code does not
> need to look at the code which does the card mark -- but I tested
> against both just to be sure.
>
> Testing involved
>
> i) eyeballing the code generated for normal and unsafe volatile puts
> with null, possibly null and notnull values
>
> ii) exercising a large program (build and run a sample project in
> NetBeans) and eyeballing the code generated for methods of ConcurrentHashMap
>
> iii) running the full jcstress suite
>
> Comments and reviews very welcome
>
> n.b. I have a 3rd patch queued which performs a similar optimization for
> CAS operations (drop the dmbs in favour of an ldaxr/stlxr pair).
> regards,
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From vladimir.kozlov at oracle.com Fri Jul 31 03:03:02 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2015 20:03:02 -0700
Subject: RFR (S) 8019968: Reference CAS induces GC store barrier even on failure
In-Reply-To: <55B8A390.4000203@oracle.com>
References: <55B895CB.4060105@oracle.com> <55B89BD6.7050209@redhat.com> <55B8A390.4000203@oracle.com>
Message-ID: <55BAE566.5020904@oracle.com>

I think the test is wrong. It should be:

  if_then(load_store, BoolTest::eq, oldval, PROB_STATIC_FREQUENT);

Thanks,
Vladimir

On 7/29/15 2:57 AM, Aleksey Shipilev wrote:
> On 07/29/2015 12:24 PM, Andrew Dinn wrote:
>> On 29/07/15 09:58, Aleksey Shipilev wrote:
>>> I would like to suggest a fix for:
>>> https://bugs.openjdk.java.net/browse/JDK-8019968
>>>
>>> In short, current reference CAS intrinsic blindly emits
>>> post_barrier, ignoring the CAS result. In some cases, notably
>>> contended CAS spin-loops, we fail the CAS a lot, and thus go for a
>>> post_barrier excessively. Instead, we can conditionalize on the
>>> result of the store itself, and put the post_barrier only on
>>> success path: http://cr.openjdk.java.net/~shade/8019968/webrev.01/
>>>
>>> More performance results here:
>>> http://cr.openjdk.java.net/~shade/8019968/notes.txt
>>
>> Nice! The code looks fine and your test results are very convincing.
>> I'll be interested to see how this looks on AArch64.
>
> Thanks Andrew!
>
> The change passes JPRT, so an AArch64 build is available. The benchmark JAR
> mentioned in the issue comments would run without intervention, taking
> around 40 minutes. You are very welcome to try, while Reviewers are
> taking a look. I can do that only next week.
>
>> That said, I am afraid you still need a Reviewer!
>
> That reminds me I haven't spelled out what testing was done:
>
> * JPRT on all open platforms
> * Targeted benchmarks
> * Eyeballing the generated x86 assembly
>
> Thanks,
> -Aleksey
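To unpack Vladimir's point, a plain-Java analogue of the semantics (not IdealKit code; the lock is only there to make the sketch correct): a CAS-style primitive hands back the value it loaded, and success means that loaded value equals the expected one, hence the comparison of load_store against oldval:

    final class CasSemantics {
        // The primitive returns the previously loaded value; the store
        // happened iff that value equals the expected one, so only then
        // is a GC post-barrier due.
        static int compareAndExchange(int[] cell, int expected, int newValue) {
            synchronized (cell) {
                int loaded = cell[0];          // the "load_store" result
                if (loaded == expected) {      // success test: loaded == expected
                    cell[0] = newValue;
                }
                return loaded;
            }
        }

        public static void main(String[] args) {
            int[] cell = {0};
            System.out.println(compareAndExchange(cell, 0, 7)); // 0: success
            System.out.println(compareAndExchange(cell, 0, 9)); // 7: failure
        }
    }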
From zoltan.majo at oracle.com Fri Jul 31 10:02:06 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 31 Jul 2015 12:02:06 +0200
Subject: [9] RFR(S): 8132457: Unify command-line flags controlling the usage of compiler intrinsics
Message-ID: <55BB479E.8000402@oracle.com>

Hi,

please review the following patch for JDK-8132457.

Bug: https://bugs.openjdk.java.net/browse/JDK-8132457

Problem: There are four cases when flags controlling intrinsics for C1 and C2 behave inconsistently:

1) The DisableIntrinsic flag is C2-specific.

2) The InlineNatives flag disables most but not all intrinsics. Some intrinsics (implemented by both C1 and C2) are turned off by -XX:-InlineNatives for C1 but are left on for C2.

3) The _getClass intrinsic (implemented by both C1 and C2) is turned off by -XX:-InlineClassNatives for C1 and is left unaffected by C2.

4) The _loadfence, _storefence, _fullfence, _compareAndSwapObject, _compareAndSwapLong, and _compareAndSwapInt intrinsics are turned off by -XX:-InlineUnsafeOps for C2 and are unaffected by C1.

Solution: Unify command-line flags controlling intrinsic processing. Processing of command-line flags is now done only in vmIntrinsics::is_disabled_by_flags and there is no compiler-specific flag processing.

The inconsistencies listed in the problem description were addressed the following way:

1) Extend the C1 compiler to consider the DisableIntrinsic flag when checking if an intrinsic is available.

2) -XX:-InlineNatives turns off most intrinsics but leaves on some intrinsics (the same set of intrinsics is left on for both C1 and C2).

3) -XX:-InlineClassNatives turns off the _getClass intrinsic for both C1 and C2.

4) -XX:-InlineUnsafeOps turns off the _loadfence, _storefence, _fullfence, _compareAndSwapObject, _compareAndSwapLong, and _compareAndSwapInt intrinsics for both C1 and C2.

Webrev: http://cr.openjdk.java.net/~zmajo/8132457/webrev.00/

Testing:
- JPRT run, testset hotspot, all tests pass;
- all JTREG tests in hotspot/test, all tests pass;
- local testing of DisableIntrinsic with both C1 and C2.

Thank you and best regards,


Zoltan
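As a quick way to see these flags in action (a sketch; the flag names come from the patch, the observation method is my suggestion), one can run a small probe with diagnostic options and check whether the callee is still reported as an intrinsic in the inlining output:

    // Run with, e.g.:
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
    //        -XX:DisableIntrinsic=_getClass GetClassProbe
    // and compare the output with and without DisableIntrinsic; with the
    // patch, C1 and C2 should both honor the flag.
    public class GetClassProbe {
        static Class<?> probe(Object o) {
            return o.getClass();   // _getClass intrinsic candidate
        }

        public static void main(String[] args) {
            Object[] objs = { "s", 42, new Object() };
            for (int i = 0; i < 100_000; i++) {
                probe(objs[i % objs.length]);
            }
        }
    }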
>> It cleans up the graph in some cases like this:
>>
>> static void test_after_5(int idx) {
>>     for (int i = 0; i < 1000; i++) {
>>         array[idx] = i;
>>         array[idx+1] = i;
>>         array[idx+2] = i;
>>         array[idx+3] = i;
>>         array[idx+4] = i;
>>         array[idx+5] = i;
>>     }
>> }
>>
>> all stores are sunk out of the loop, but that happens after iteration splitting and so there are multiple redundant copies of each store that are not collapsed.
>>
>> This said, we currently reorder the stores even if it's less aggressive than what I'm proposing. With program:
>>
>> st4
>> st3
>> st2
>> st1
>>
>> If st1, st3 and st4 are on one slice and st2 is on another, and if st1 and st3 store to the same address, we optimize st3 out:
>>
>> st4
>> st2
>> st1
>>
>> so st3=st1 may only be visible after st2.
>>
>> Also, the way I read the first table in this:
>>
>> http://gee.cs.oswego.edu/dl/jmm/cookbook.html
>>
>> it's allowed to reorder normal stores with normal stores
>>
>>>> so we need to change the memory input of st2 when we find st3 can be removed. In the code, at that point, this=st1, st = st3 and prev=st2.
>>>
>>> In this case the code should be:
>>>
>>> if (st->in(MemNode::Address)->eqv_uncast(address) &&
>>> ...
>>> } else {
>>>   prev = st;
>>> }
>>>
>>> to update 'prev' with 'st' only if 'st' is not removed.
>>
>> You're right.
>>
>>> Also, I think, st->in(MemNode::Memory) could be put in a local var since it is used several times in this code.
>>>
>>>>> You need to set improved = true since 'this' will not change. We also use 'make_progress' as the variable name in such cases.
>>>>
>>>> In the example above, if we remove st2, we modify this, right?
>>>
>>> We need to call Ideal() again if store inputs are changed. So if st2 is removed then the inputs of st1 are changed, so we need to rerun Ideal(). This allows us to avoid having your new loop in Ideal().
>>
>> Sorry, I don't understand this. Are you saying there's no need for a loop at all? Or are you saying that as soon as there's a similar store that is found we should return from Ideal, which will be called again to maybe find other similar stores?
>>
>>>>> We'll find a path from the head that doesn't go through the store and that exits the loop. What the comment doesn't say is that with example 2 below:
>>>>>
>>>>> for (int i = 0; i < 10; i++) {
>>>>>   if (some_condition) {
>>>>>     uncommon_trap();
>>>>>   }
>>>>>   array[idx] = 999;
>>>>> }
>>>>>
>>>>> my verification code would find the early exit as well.
>>>>>
>>>>> It's verification code only because if we have example 1 above, then we have a memory Phi to merge both branches of the if. So the pattern that we look for in PhaseIdealLoop::try_move_store_before_loop() won't match: the loop's memory Phi backedge won't be the store. If we have example 2 above, then the loop's memory Phi doesn't have a single memory use. So I don't think we need to check that the store post-dominates the loop head in product. That's my reasoning anyway, and the verification code is there to verify it.
>>>>
>>>> I missed the 'mem->in(LoopNode::LoopBackControl) == n' condition, which reduces the cases to only one store to this address in the loop - good.
>>>>
>>>> How do you check in a product VM that there are no other exits from a loop (your example 2)? Is it guarded by the mem->outcnt() == 1 check?
>>
>> Yes.
>>
>>>>>> Should you check phi == NULL instead of assert to make sure you have only one Phi node?
>>>>>
>>>>> Can there be more than one memory Phi for a particular slice that has in(0) == n_loop->_head?
>>>>> I would have expected that to be impossible.
>>>>
>>>> BOTTOM (all slices) Phi?
>>>
>>> Wouldn't there be a MergeMem between the store and the Phi then?
>>>
>>> For the record, the webrev:
>>>
>>> http://cr.openjdk.java.net/~roland/8080289/webrev.00/
>>>
>>> Roland.

From adinn at redhat.com Fri Jul 31 10:33:42 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 31 Jul 2015 11:33:42 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55BA78B7.7030300@oracle.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com>
Message-ID: <55BB4F06.7090603@redhat.com>

Hi Vladimir,

Thank you very much for the feedback.

On 30/07/15 20:19, Vladimir Kozlov wrote:
> First, thank you for extensive comments - they help.

They were a necessity for me as much as anyone else :-)

> Second, does it really help? I don't see any numbers.

Hmm, running on prejudice, maybe try science? Good idea!

I will obtain numbers.

> Next pattern (returning NULL or false) is repeated several times. Maybe
> make a separate function for it.
>
> + if (opcode != Op_MemBarRelease) {
> +   if (opcode != Op_MemBarCPUOrder)
> +     return NULL;
> +   MemBarNode *parent = parent_membar(leading);
> +   if (!parent || !parent->Opcode() == Op_MemBarRelease)
> +     return NULL;
> + }

Yes, that's a very good idea.

> Can you replace retry_feed: with while(cond) ?

Of course.

> Next could be one line Node *x = trailing
> + Node *x;
> + // the Mem feed to the membar should be a merge
> + x = trailing->in(TypeFunc::Memory);

Yes, I agree.

I'll post a revised webrev with the above fixed after I obtain some performance figures.

regards,

Andrew Dinn
-----------

From roland.westrelin at oracle.com Fri Jul 31 11:42:00 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Fri, 31 Jul 2015 13:42:00 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To: <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
Message-ID:

> thank you for your comments. I have uploaded a revised webrev to http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are inlined below.

That looks good to me.

In subnode.cpp the comment:
1439 //---------------------------------rel-----------------------------------------
doesn't match the function name.

Roland.

From michael.haupt at oracle.com Fri Jul 31 11:54:38 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Fri, 31 Jul 2015 13:54:38 +0200
Subject: RFR(M): 8004073: Implement C2 Ideal node specific dump() method
In-Reply-To:
References: <0FE82414-5869-480D-B56A-1F3677A569ED@oracle.com> <55B7BACE.9080100@oracle.com> <1AB40291-4B32-4B33-8621-183585854169@oracle.com>
Message-ID: <418CAD45-6BFA-43C1-8CCD-AD16CAA4227E@oracle.com>

Hi Roland,

> On 31.07.2015 at 13:42, Roland Westrelin wrote:
>> thank you for your comments. I have uploaded a revised webrev to http://cr.openjdk.java.net/~mhaupt/8004073/webrev.01; some replies are inlined below.
>
> That looks good to me.
>
> In subnode.cpp the comment:
> 1439 //---------------------------------rel-----------------------------------------
> doesn't match the function name.

thank you very much. This was the case also in several other places; I've fixed those occurrences.

Best,

Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co.
KG, Schiffbauergasse 14 | 14467 Potsdam, Germany

Oracle is committed to developing practices and products that help protect the environment

From aph at redhat.com Fri Jul 31 13:32:45 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 31 Jul 2015 14:32:45 +0100
Subject: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55BB4F06.7090603@redhat.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55BB4F06.7090603@redhat.com>
Message-ID: <55BB78FD.3050002@redhat.com>

Hi,

On 07/31/2015 11:33 AM, Andrew Dinn wrote:
> On 30/07/15 20:19, Vladimir Kozlov wrote:
>> First, thank you for extensive comments - they help.
>
> They were a necessity for me as much as anyone else :-)
>
>> Second, does it really help? I don't see any numbers.
>
> Hmm, running on prejudice, maybe try science? Good idea!
>
> I will obtain numbers.

That's not easy because AArch64 is a specification, not an implementation. Going the route of load acquire/store release may not help much on some chips, but conversations I've had with ARM architects tell me that they should be preferred. In particular, store release for volatiles means that we can avoid a full fence.

Current status: on one out-of-order implementation of AArch64 I see no difference between "stlr" and "dmb st; str; dmb ish". On another, this time an in-order processor, "stlr" is 40% faster. This is just the execution of a few instructions, like this:

.L3:
    ldr  w2, [x1]
    add  w2, w2, 1
    str  w2, [x1]
    stlr x4, [x3]
    subs x0, x0, #1
    bne  .L3

versus this:

.L3:
    ldr  w2, [x1]
    add  w2, w2, 1
    str  w2, [x1]
    dmb st; str x4, [x3]; dmb ish
    subs x0, x0, #1
    bne  .L3

The guidelines from ARM are that we should optimize for the simpler in-order processors; it won't help the out-of-order parts very much, but it won't hurt either.

Andrew.

From vladimir.kozlov at oracle.com Fri Jul 31 15:18:12 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2015 08:18:12 -0700
Subject: [9] RFR(S): 8132457: Unify command-line flags controlling the usage of compiler intrinsics
In-Reply-To: <55BB479E.8000402@oracle.com>
References: <55BB479E.8000402@oracle.com>
Message-ID: <55BB91B4.805@oracle.com>

Very nice cleanup. Thank you, Zoltan.

Vladimir

On 7/31/15 3:02 AM, Zoltán Majó wrote:
> Hi,
>
> please review the following patch for JDK-8132457.
> [...]
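Tying Andrew Haley's measured loops above back to Java source: the stlr is what the 8078743 patch emits for a volatile object store sitting next to a plain field update. A rough standalone equivalent is sketched below (class and field names are invented for illustration):

    // VolatileStoreLoop.java -- sketch of the kind of loop being timed above.
    public class VolatileStoreLoop {
        int plain;                // the ldr/add/str triple in the loops above
        volatile Object sentinel; // the stlr (or dmb st; str; dmb ish) store

        void run(Object v, long iters) {
            for (long i = 0; i < iters; i++) {
                plain++;          // plain load, increment, store
                sentinel = v;     // volatile object store
            }
        }

        public static void main(String[] args) {
            new VolatileStoreLoop().run(new Object(), 100_000_000L);
        }
    }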
From vladimir.kozlov at oracle.com Fri Jul 31 15:20:27 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2015 08:20:27 -0700
Subject: RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer
In-Reply-To:
References: <55789088.5050405@oracle.com> <9DE849F6-9E4A-4649-A174-793726D67FD5@oracle.com> <5580738D.9070900@oracle.com> <5581C465.7070803@oracle.com>
Message-ID: <55BB923B.1050900@oracle.com>

This looks better. Reviewed.

thanks,
Vladimir

On 7/31/15 3:20 AM, Roland Westrelin wrote:
> Here is a new webrev for this that takes Vladimir's comments into account:
>
> http://cr.openjdk.java.net/~roland/8080289/webrev.01/
>
> Roland.
> [...]
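As source-level context for the 8080289 change reviewed above: the optimization targets loops in which every iteration overwrites the same location, so only the final store is observable after the loop. A hedged sketch (the class, method, and counts are made up; the array-store shape mirrors Roland's test_after_5 earlier in the thread):

    // IntermediateWrites.java -- illustration only, not one of the actual tests.
    public class IntermediateWrites {
        static int[] array = new int[64];

        static void test(int idx) {
            for (int i = 0; i < 1000; i++) {
                array[idx] = i;  // every write but the last is dead ...
            }
            // ... so C2 may keep i in a register and sink a single store of
            // the final value below the loop.
        }

        public static void main(String[] args) {
            for (int k = 0; k < 20_000; k++) {  // warm up so the method gets compiled
                test(3);
            }
            System.out.println(array[3]);       // prints 999 either way
        }
    }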
From adinn at redhat.com Fri Jul 31 16:17:13 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 31 Jul 2015 17:17:13 +0100
Subject: [aarch64-port-dev ] [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55B72D38.2040705@oracle.com>
References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66DE9.2070000@oracle.com> <55B72D38.2040705@oracle.com>
Message-ID: <55BB9F89.3010203@redhat.com>

On 28/07/15 08:20, Tobias Hartmann wrote:
>> On 7/27/2015 7:36 AM, Tobias Hartmann wrote:
>>> Here is the new webrev:
>>> http://cr.openjdk.java.net/~thartmann/8130309/webrev.03/

I'm getting a problem building the latest AArch64 hs-comp because bailout is undefined:

: In member function 'virtual void ArrayCopyStub::emit_code(LIR_Assembler*)':
/home/adinn/openjdk/hs-comp/hotspot/src/cpu/aarch64/vm/c1_CodeStubs_aarch64.cpp:337:39: error: 'bailout' was not declared in this scope
   bailout("trampoline stub overflow");

I cannot find a declaration for bailout anywhere. Is this something which escaped from the lab too early, or is it just that the AArch64 part of the patch got omitted?

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From adinn at redhat.com Fri Jul 31 17:06:55 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 31 Jul 2015 18:06:55 +0100
Subject: [aarch64-port-dev ] [9] RFR(S): 8130309: need to bailout cleanly if CompiledStaticCall::emit_to_interp_stub fails when codecache is out of space
In-Reply-To: <55BB9F89.3010203@redhat.com>
References: <55AE4BC2.1090104@oracle.com> <55B0CEBE.4010100@oracle.com> <81FF1EDF-40C6-4BC6-A3C2-CB67571452C9@oracle.com> <55B22191.9070904@oracle.com> <77322805-2B38-4B85-98EF-9ABE45A31BDA@oracle.com> <55B641DB.1010608@oracle.com> <55B66DE9.2070000@oracle.com> <55B72D38.2040705@oracle.com> <55BB9F89.3010203@redhat.com>
Message-ID: <55BBAB2F.1080100@redhat.com>

On 31/07/15 17:17, Andrew Dinn wrote:
> I'm getting a problem building the latest AArch64 hs-comp because
> bailout is undefined:
> [...]
>    bailout("trampoline stub overflow");

Ok, the problem is that the call is happening inside ArrayCopyStub::emit_code(LIR_Assembler* ce), so it actually needs to be

  ce->bailout("trampoline stub overflow");

However, that won't work because bailout is private to LIR_Assembler.
So we also need a friend declaration in c1_LIRAssembler_aarch64.hpp.

There is also a problem with the change to MacroAssembler::trampoline_call:

pp:688:10: error: invalid conversion from 'unsigned int' to 'address {aka unsigned char*}' [-fpermissive]
   return start_offset;

I believe the return value probably ought to be pc() -- the value is not used as far as I can see, but it needs to be a non-NULL address to indicate that everything worked ok.

I will raise a JIRA for this and post a webrev asap.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland)

From ysr1729 at gmail.com Fri Jul 31 18:19:02 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 11:19:02 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
Message-ID:

Hello GC and Compiler teams!

One of our services that runs with several thousand threads recently noticed an increase in safepoint stop times, but not gc times, upon transitioning to JDK 8.

Further investigation revealed that most of the delta was related to the so-called pre-gc/vmop "cleanup" phase when various book-keeping activities are performed, and more specifically in the portion that walks java thread stacks single-threaded (!) and updates the hotness counters for the active nmethods. This code appears to be new to JDK 8 (in jdk 7 one would walk the stacks only during code cache sweeps).

I have two questions:
(1) has anyone else (typically, I'd expect applications with many hundreds or thousands of threads) noticed this regression?
(2) Can we do better, for example, by:
    (a) doing these updates by walking thread stacks in multiple worker threads in parallel, or best of all:
    (b) doing these updates when we walk the thread stacks during GC, and skipping this phase entirely for non-GC safepoints (with attendant loss in frequency of this update in low GC frequency scenarios).

It seems kind of silly to do GCs with many worker threads, but do these thread stack walks single-threaded when the work is embarrassingly parallel (one could predicate the parallelization on the measured stack sizes and thread population, if there was concern about the overhead of activating and deactivating the thread gangs for the work).

A followup question: Any guesses as to how code cache sweep/eviction quality might be compromised if one were to dispense with these hotness updates entirely (or at a much reduced frequency), as a temporary workaround to the performance problem?

Thoughts/Comments? In particular, has this issue been addressed perhaps in newer JVMs?

Thanks for any comments, feedback, pointers!
-- ramki

PS: for comparison, here's data with +TraceSafepointCleanup from JDK 7 (first, where this isn't done) vs JDK 8 (where this is done) with a program that has a few thousand threads:

JDK 7:
..
2827.308: [sweeping nmethods, 0.0000020 secs]
2828.679: [sweeping nmethods, 0.0000030 secs]
2829.984: [sweeping nmethods, 0.0000030 secs]
2830.956: [sweeping nmethods, 0.0000030 secs]
..

JDK 8:
..
7368.634: [mark nmethods, 0.0177030 secs]
7369.587: [mark nmethods, 0.0178305 secs]
7370.479: [mark nmethods, 0.0180260 secs]
7371.503: [mark nmethods, 0.0186494 secs]
..
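A reproducer along the lines Ramki describes might look like the following sketch (the thread count, stack depth, and timings are arbitrary; the TraceSafepointCleanupTime flag name is taken from the safepoint code quoted later in this thread):

    // ManyThreadsSafepointDemo.java -- hedged sketch of a reproducer.
    // Starts a few thousand mostly-idle threads with moderately deep stacks,
    // then forces periodic safepoints so the "mark nmethods" phase is visible.
    public class ManyThreadsSafepointDemo {
        static void deepStack(int depth) throws InterruptedException {
            if (depth > 0) {
                deepStack(depth - 1);
                return;
            }
            Thread.sleep(Long.MAX_VALUE);  // park with ~100 frames on the stack
        }

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 4000; i++) {   // may need -Xss256k on small machines
                Thread t = new Thread(() -> {
                    try {
                        deepStack(100);
                    } catch (InterruptedException ignored) {
                    }
                });
                t.setDaemon(true);
                t.start();
            }
            for (int i = 0; i < 100; i++) {    // each GC safepoint walks all the stacks
                System.gc();
                Thread.sleep(1000);
            }
        }
    }

Run with -XX:+TraceSafepointCleanupTime and compare the reported cleanup times on JDK 7 and JDK 8.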
From vitalyd at gmail.com Fri Jul 31 18:31:48 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 14:31:48 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:
Message-ID:

Ramki, are you running tiered compilation?

sent from my phone
On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:

> Hello GC and Compiler teams!
> [...]

From ysr1729 at gmail.com Fri Jul 31 18:33:14 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 11:33:14 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:
Message-ID:

Yes.

On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

> Ramki, are you running tiered compilation?
> [...]

From vladimir.kozlov at oracle.com Fri Jul 31 18:43:12 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2015 11:43:12 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References:
Message-ID: <55BBC1C0.3030709@oracle.com>

Hi Ramki,

Did you fill up the CodeCache? The sweeper starts scanning aggressively only with a full CodeCache:

  // Force stack scanning if there is only 10% free space in the code cache.
  // We force stack scanning only non-profiled code heap gets full, since critical
  // allocation go to the non-profiled heap and we must be make sure that there is
  // enough space.
  double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
  if (free_percent <= StartAggressiveSweepingAt) {
    do_stack_scanning();
  }

Vladimir

On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
> Yes.
> [...]

From ysr1729 at gmail.com Fri Jul 31 18:48:53 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 11:48:53 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <55BBC1C0.3030709@oracle.com>
References: <55BBC1C0.3030709@oracle.com>
Message-ID:

Hi Vladimir --

I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).

Look at this code path. Note that this is invoked at every safepoint (although it says "periodically" in the comment). In the mark_active_nmethods() method, there's a thread iteration in both branches of the if. I haven't checked to see which of the two was the culprit here, yet (if either).
// Various cleaning tasks that should be done periodically at safepoints
void SafepointSynchronize::do_cleanup_tasks() {
  ....
  {
    TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
    NMethodSweeper::mark_active_nmethods();
  }
  ..
}

void NMethodSweeper::mark_active_nmethods() {
  ...
  if (!sweep_in_progress()) {
    _seen = 0;
    _sweep_fractions_left = NmethodSweepFraction;
    _current = CodeCache::first_nmethod();
    _traversals += 1;
    _total_time_this_sweep = Tickspan();

    if (PrintMethodFlushing) {
      tty->print_cr("### Sweep: stack traversal %d", _traversals);
    }
    Threads::nmethods_do(&mark_activation_closure);

  } else {
    // Only set hotness counter
    Threads::nmethods_do(&set_hotness_closure);
  }

  OrderAccess::storestore();
}

On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> Hi Ramki,
>
> Did you fill up the CodeCache? The sweeper starts scanning aggressively only with a full CodeCache:
> [...]

From ysr1729 at gmail.com Fri Jul 31 19:07:18 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 12:07:18 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com>
Message-ID:

Hi Vladimir --

Here's a snapshot of the counters:

sun.ci.codeCacheCapacity=251658240
sun.ci.codeCacheMaxCapacity=251658240
sun.ci.codeCacheMethodsReclaimedNum=3450
sun.ci.codeCacheSweepsTotalNum=58
sun.ci.codeCacheSweepsTotalTimeMillis=1111
sun.ci.codeCacheUsed=35888704

Notice that the code cache usage is less than 35 MB of the 240 MB capacity, yet it seems we have had 58 sweeps already, and safepoint cleanup says:

[mark nmethods, 0.0165062 secs]

Even if the two closures do little or no work, the single-threaded walk over the deep stacks of a thousand threads will cost time for applications with many threads, and this is now done at each safepoint irrespective of the sweeper activity as far as I can tell. It seems as if this work should be somehow rolled up (via a suitable injection) into GC's thread walks that are done in parallel, rather than doing this in a pre-GC phase (unless I am missing some reason that the sequencing is necessary, which it doesn't seem to be here).

-- ramki

On Fri, Jul 31, 2015 at 11:48 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> Hi Vladimir --
>
> I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).
> [...]

From ysr1729 at gmail.com Fri Jul 31 19:08:50 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 12:08:50 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com>
Message-ID:

BTW, I think some of those are counters that we added (a patch for which is attached to the openjdk ticket I opened a few months ago, I think)...

-- ramki

On Fri, Jul 31, 2015 at 12:07 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> Hi Vladimir --
>
> Here's a snapshot of the counters:
> [...]
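As an aside on the counters above: the same usage number can also be read in-process through the standard memory pool API, which makes it easy to correlate occupancy with sweep activity over time (a sketch; on a JDK 8 VM without a segmented code cache the pool is named "Code Cache"):

    // CodeCachePoll.java -- sketch for watching code cache occupancy.
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class CodeCachePoll {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    if (pool.getName().startsWith("Code Cache")) {  // non-heap pool
                        System.out.printf("%s: used=%d max=%d%n",
                                pool.getName(),
                                pool.getUsage().getUsed(),
                                pool.getUsage().getMax());
                    }
                }
                Thread.sleep(5000);  // poll interval, arbitrary
            }
        }
    }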
In particular, has this issue been addressed >>>> perhaps in newer JVMs? >>>> >>>> Thanks for any comments, feedback, pointers! >>>> -- ramki >>>> >>>> PS: for comparison, here's data with +TraceSafepointCleanup from >>>> JDK 7 (first, where this isn't done) >>>> vs JDK 8 (where this is done) with a program that has a few >>>> thousands of threads: >>>> >>>> >>>> >>>> JDK 7: >>>> .. >>>> 2827.308: [sweeping nmethods, 0.0000020 secs] >>>> 2828.679: [sweeping nmethods, 0.0000030 secs] >>>> 2829.984: [sweeping nmethods, 0.0000030 secs] >>>> 2830.956: [sweeping nmethods, 0.0000030 secs] >>>> .. >>>> >>>> JDK 8: >>>> .. >>>> 7368.634: [mark nmethods, 0.0177030 secs] >>>> 7369.587: [mark nmethods, 0.0178305 secs] >>>> 7370.479: [mark nmethods, 0.0180260 secs] >>>> 7371.503: [mark nmethods, 0.0186494 secs] >>>> .. >>>> >>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.c.berg at intel.com Fri Jul 31 19:47:24 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 31 Jul 2015 19:47:24 +0000 Subject: 8132160 - support for AVX 512 call frames and stack management In-Reply-To: References: Message-ID: I have revised the webrev below with more EVEX based changes for 64-bit and 32-bit. Accordingly, I still need two reviewers. Vladimir, can you be first reviewer? Thanks, -Michael From: Berg, Michael C Sent: Wednesday, July 22, 2015 10:55 PM To: 'hotspot-compiler-dev at openjdk.java.net' Subject: RFR: 8132160 - support for AVX 512 call frames and stack management Hi Folks, I would like to contribute AVX 512 call frame and stack management changes. I need two reviewers to examine this patch and comment as needed: Bug-id: https://bugs.openjdk.java.net/browse/JDK-8132160 webrev: http://cr.openjdk.java.net/~mcberg/8132160/webrev.01/ These changes simplify frame management on 32-bit and 64-bit systems which support EVEX and extend more complete frame save and restore functionality as well as stack management for calls, traps and explicit exception paths. These changes also move CPUID queries into the assembler object state and add more state rules to a large class of instructions while simplifying their use. Also added is support for vectorizing double precision sqrt which is available through the math library. Many generated stubs and internal functions also now have predicated mask management for EVEX added. Thanks, Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Jul 31 21:28:35 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 31 Jul 2015 14:28:35 -0700 Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8) In-Reply-To: References: <55BBC1C0.3030709@oracle.com> Message-ID: <55BBE883.1080308@oracle.com> Got it. Yes, it is issue with thousands java threads. You are the first pointing this problem. File bug on compiler. We will look what we can do. Most likely we need parallelize this work. Method's hotness is used only for UseCodeCacheFlushing. You can try to guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch it off. We need mark_as_seen_on_stack so leave it. Thanks, Vladimir On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote: > Hi Vladimir -- > > I noticed the increase even with Initial and Reserved set to the default > of 240 MB, but actual usage much lower (less than a quarter). > > Look at this code path. 
Note that this is invoked at every safepoint > (although it says "periodically" in the comment). > In the mark_active_nmethods() method, there's a thread iteration in both > branches of the if. I haven't checked to > see which of the two was the culprit here, yet (if either). > > // Various cleaning tasks that should be done periodically at safepoints > > void SafepointSynchronize::do_cleanup_tasks() { > > .... > > { > > TraceTime t4("mark nmethods", TraceSafepointCleanupTime); > > NMethodSweeper::mark_active_nmethods(); > > } > > .. > > } > > > void NMethodSweeper::mark_active_nmethods() { > > ... > > if (!sweep_in_progress()) { > > _seen = 0; > > _sweep_fractions_left = NmethodSweepFraction; > > _current = CodeCache::first_nmethod(); > > _traversals += 1; > > _total_time_this_sweep = Tickspan(); > > > if (PrintMethodFlushing) { > > tty->print_cr("### Sweep: stack traversal %d", _traversals); > > } > > Threads::nmethods_do(&mark_activation_closure); > > > } else { > > // Only set hotness counter > > Threads::nmethods_do(&set_hotness_closure); > > } > > > OrderAccess::storestore(); > > } > > > On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov > > wrote: > > Hi Ramki, > > Did you fill up CodeCache? It start scanning aggressive only with > full CodeCache: > > // Force stack scanning if there is only 10% free space in the > code cache. > // We force stack scanning only non-profiled code heap gets full, > since critical > // allocation go to the non-profiled heap and we must be make > sure that there is > // enough space. > double free_percent = 1 / > CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100; > if (free_percent <= StartAggressiveSweepingAt) { > do_stack_scanning(); > } > > Vladimir > > On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote: > > > Yes. > > > On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich > > >> wrote: > > Ramki, are you running tiered compilation? > > sent from my phone > > On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" > > >> wrote: > > > Hello GC and Compiler teams! > > One of our services that runs with several thousand threads > recently noticed an increase > in safepoint stop times, but not gc times, upon > transitioning to > JDK 8. > > Further investigation revealed that most of the delta was > related to the so-called > pre-gc/vmop "cleanup" phase when various book-keeping > activities > are performed, > and more specifically in the portion that walks java thread > stacks single-threaded (!) > and updates the hotness counters for the active > nmethods. This > code appears to > be new to JDK 8 (in jdk 7 one would walk the stacks > only during > code cache sweeps). > > I have two questions: > (1) has anyone else (typically, I'd expect applications > with > many hundreds or thousands of threads) > noticed this regression? > (2) Can we do better, for example, by: > (a) doing these updates by walking thread stacks in > multiple worker threads in parallel, or best of all: > (b) doing these updates when we walk the thread > stacks > during GC, and skipping this phase entirely > for non-GC safepoints (with attendant loss in > frequency of this update in low GC frequency > scenarios). > > It seems kind of silly to do GC's with many multiple worker > threads, but do these thread stack > walks single-threaded when it is embarrasingly parallel > (one > could predicate the parallelization > based on the measured stack sizes and thread population, if > there was concern on the ovrhead of > activating and deactivating the thread gangs for the work). 
>
>                 A followup question: Any guesses as to how code cache
>                 sweep/eviction quality might be compromised if one were
>                 to dispense with these hotness updates entirely (or do
>                 them at a much reduced frequency), as a temporary
>                 workaround to the performance problem?
>
>                 Thoughts/Comments? In particular, has this issue been
>                 addressed perhaps in newer JVMs?
>
>                 Thanks for any comments, feedback, pointers!
>                 -- ramki
>
>                 PS: for comparison, here's data with
>                 +TraceSafepointCleanupTime from JDK 7 (first, where this
>                 isn't done) vs JDK 8 (where this is done) with a program
>                 that has a few thousand threads:
>
>                 JDK 7:
>                 ..
>                 2827.308: [sweeping nmethods, 0.0000020 secs]
>                 2828.679: [sweeping nmethods, 0.0000030 secs]
>                 2829.984: [sweeping nmethods, 0.0000030 secs]
>                 2830.956: [sweeping nmethods, 0.0000030 secs]
>                 ..
>
>                 JDK 8:
>                 ..
>                 7368.634: [mark nmethods, 0.0177030 secs]
>                 7369.587: [mark nmethods, 0.0178305 secs]
>                 7370.479: [mark nmethods, 0.0180260 secs]
>                 7371.503: [mark nmethods, 0.0186494 secs]
>                 ..

From ysr1729 at gmail.com  Fri Jul 31 22:02:06 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 15:02:06 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <55BBE883.1080308@oracle.com>
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID:

OK, will do and add you as a watcher; thanks, Vladimir! (I don't yet know
whether, with tiered and a necessarily bounded, if large, code cache,
flushing will in fact eventually become necessary, w.r.t. your suggested
temporary workaround.)

Have a good weekend!
-- ramki

On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov wrote:

> Got it. Yes, it is an issue with thousands of Java threads. You are the
> first to point out this problem. File a bug against the compiler; we
> will look at what we can do. Most likely we need to parallelize this
> work.
>
> A method's hotness is used only for UseCodeCacheFlushing. You can try
> to guard Threads::nmethods_do(&set_hotness_closure) with this flag and
> switch it off.
>
> We still need mark_as_seen_on_stack, so leave that in.
>
> Thanks,
> Vladimir
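
(For the archives: if I'm reading the JDK 8 sweeper.cpp right, the guard
suggested above would look roughly like the following -- an untested
sketch, not a reviewed patch; only the added UseCodeCacheFlushing test is
new, the surrounding lines are the existing code quoted earlier in this
thread:

  } else if (UseCodeCacheFlushing) {
    // Only walk the thread stacks to update nmethod hotness counters
    // when the sweeper can actually use them to flush cold methods;
    // running with -XX:-UseCodeCacheFlushing then skips this
    // single-threaded walk at safepoints.
    Threads::nmethods_do(&set_hotness_closure);
  }

combined with running with -XX:-UseCodeCacheFlushing until a parallel
version is available.)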

From ysr1729 at gmail.com  Fri Jul 31 22:04:28 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 31 Jul 2015 15:04:28 -0700
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To: <20150731234445.0000459c.ecki@zusammenkunft.net>
References: <55BBC1C0.3030709@oracle.com> <20150731234445.0000459c.ecki@zusammenkunft.net>
Message-ID:

Hi Bernd --

It doesn't seem coupled to GC; here's an example snapshot:

sun.ci.codeCacheCapacity=13041664
sun.ci.codeCacheMaxCapacity=50331648
sun.ci.codeCacheMethodsReclaimedNum=4281
sun.ci.codeCacheSweepsTotalNum=409
sun.ci.codeCacheSweepsTotalTimeMillis=1541
sun.ci.codeCacheUsed=10864704

sun.gc.collector.0.invocations=6319
sun.gc.collector.1.invocations=6

BTW, to a question of Vitaly's on this thread earlier: this code executes
irrespective of whether you are running tiered or not (the above is from a
run with tiered explicitly off -- the earlier numbers were with tiered
on -- yet note the excessive time spent in the stack walk as part of this
code in JDK 8):

[mark nmethods, 0.0171828 secs]

On a similar note (and regarding Vladimir's earlier suggestion of turning
off code cache flushing), and with the understanding that there were lots
of changes related to tiered compilation and code cache flushing between
7 and 8: turning on tiered in 7 leads to the occasional (rare) long
safepoint for the stack walks, but otherwise not.

And finally, the same issue must exist in 9 as well, albeit based on code
inspection only; we are not running with JDK 9 yet.

Have a good weekend!
-- ramki

On Fri, Jul 31, 2015 at 2:44 PM, Bernd Eckenfels wrote:

> On Fri, 31 Jul 2015 12:07:18 -0700, Srinivas Ramakrishna wrote:
>
> > sun.ci.codeCacheSweepsTotalNum=58
> ...
> > Notice that the code cache usage is less than 35 MB, for the 240 MB
> > capacity, yet it seems we have had 58 sweeps already
>
> I would also be interested in what causes this. Is this caused by
> System.gc maybe? (We do see sweeps and decreasing code cache usage on
> systems where no pressure should exist.)
>
> Regards,
> Bernd

From vitalyd at gmail.com  Fri Jul 31 22:08:14 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 18:08:14 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com> <55BBE883.1080308@oracle.com>
Message-ID:

Ramki, are you actually seeing better peak perf with tiered than C2? I
experimented with it on a real workload and it was a net loss for peak
perf (anywhere from 8-20% worse than C2, but also quite unstable); this
was with a very large code cache to play it safe, but no other tuning.
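
For concreteness, the comparison was essentially just flipping the tiered
flag with an oversized code cache, everything else at defaults (the exact
size below is from memory, so treat it as illustrative rather than what
we ran):

  java -XX:+TieredCompilation -XX:ReservedCodeCacheSize=512m ...
  java -XX:-TieredCompilation -XX:ReservedCodeCacheSize=512m ...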
sent from my phone

On Jul 31, 2015 6:02 PM, "Srinivas Ramakrishna" wrote:

> OK, will do and add you as a watcher; thanks, Vladimir! (I don't yet
> know whether, with tiered and a necessarily bounded, if large, code
> cache, flushing will in fact eventually become necessary, w.r.t. your
> suggested temporary workaround.)
>
> Have a good weekend!
> -- ramki

From vitalyd at gmail.com  Fri Jul 31 22:10:51 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 31 Jul 2015 18:10:51 -0400
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com> <20150731234445.0000459c.ecki@zusammenkunft.net>
Message-ID:

Yes, I misremembered -- I was thinking that turning off tiered disabled
the sweeps at safepoints; sorry for the noise.
sent from my phone

On Jul 31, 2015 6:04 PM, "Srinivas Ramakrishna" wrote:

> Hi Bernd --
>
> It doesn't seem coupled to GC; here's an example snapshot:
> ...

From ecki at zusammenkunft.net  Fri Jul 31 21:44:45 2015
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Fri, 31 Jul 2015 23:44:45 +0200
Subject: Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
In-Reply-To:
References: <55BBC1C0.3030709@oracle.com>
Message-ID: <20150731234445.0000459c.ecki@zusammenkunft.net>

On Fri, 31 Jul 2015 12:07:18 -0700, Srinivas Ramakrishna wrote:

> sun.ci.codeCacheSweepsTotalNum=58
...
> Notice that the code cache usage is less than 35 MB, for the 240 MB
> capacity, yet it seems we have had 58 sweeps already

I would also be interested in what causes this. Is this caused by
System.gc maybe? (We do see sweeps and decreasing code cache usage on
systems where no pressure should exist.)

Regards,
Bernd
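
P.S. For anyone who wants to pull the same numbers: the sun.ci.* values
quoted above are ordinary jvmstat perf counters, so with UsePerfData on
(the default) something like the following should print them for a
running VM. This is just a sketch -- <pid> stands for the target JVM's
process id:

  jcmd <pid> PerfCounter.print | grep sun.ci.codeCache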